Get a summary video from a text search
GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video Summarization
arXiv paper abstract https://arxiv.org/abs/2104.12465v1
arXiv PDF paper https://arxiv.org/pdf/2104.12465v1.pdf
... a text-based query is considered as one of the main drivers of video summary generation, as it is user-defined. ... The proposed model consists of a contextualized video summary controller, multi-modal attention mechanisms, an interactive attention network, and a video summary generator. Based on the evaluation of the existing multi-modal video summarization benchmark, experimental results show that the proposed model is effective with the increase of +5.88% in accuracy and +4.06% increase of F1-score, compared with the state-of-the-art method.
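The abstract reports gains in accuracy and F1-score. As a rough illustration of what those two numbers measure, here is a minimal sketch that assumes the summary is scored frame by frame against binary relevance labels (1 = frame belongs in the summary). That per-frame setup, the function name `accuracy_and_f1`, and the sample labels are assumptions made for illustration; the benchmark's actual evaluation protocol is not described in this post.

```python
def accuracy_and_f1(predicted, reference):
    """Return (accuracy, F1) for two equal-length lists of 0/1 frame labels.

    Assumption: each position is one video frame, 1 means "keep in summary".
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # kept and relevant
    fp = sum(1 for p, r in pairs if p and not r)      # kept but not relevant
    fn = sum(1 for p, r in pairs if not p and r)      # relevant but dropped
    tn = sum(1 for p, r in pairs if not p and not r)  # correctly dropped

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


# Hypothetical labels for a six-frame clip (illustration only).
predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 1, 1, 0]
accuracy, f1 = accuracy_and_f1(predicted, reference)
print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")
```

Accuracy counts every correctly labeled frame, including the ones correctly left out, while F1 only rewards frames that were correctly kept, so the two can move differently when most frames are irrelevant to the query.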
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website