Survey of video captioning for many events in a scene

morrislee
Nov 7, 2023

Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols

arXiv paper abstract https://arxiv.org/abs/2311.02538

arXiv PDF paper https://arxiv.org/pdf/2311.02538.pdf

... videos have interrelated events, dependencies, context, overlapping events, object-object interactions, domain specificity, and other semantics that are worth ... describing ... in natural language.

Owing to such a vast diversity, a single sentence can only correctly describe a portion of the video.

Dense Video Captioning (DVC) aims at detecting and describing different events in a given video.

... Dense Video Captioning is divided into three sub-tasks: (1) Video Feature Extraction (VFE), (2) Temporal Event Localization (TEL), and (3) Dense Caption Generation (DCG).
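To make the three sub-tasks concrete, here is a minimal toy sketch of how they might chain together. It is not a method from the survey: the function names, the `Event` class, the mean-intensity "feature", the threshold rule, and the template caption are all hypothetical stand-ins chosen only to show the data flow VFE -> TEL -> DCG.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Event:
    """One localized event (hypothetical structure, for illustration only)."""
    start_frame: int      # first frame of the event (inclusive)
    end_frame: int        # last frame of the event (exclusive)
    caption: str = ""     # filled in by the captioning stage


def extract_features(frames: Sequence[Sequence[float]]) -> List[float]:
    """VFE stand-in: reduce each frame to its mean pixel intensity."""
    return [sum(frame) / len(frame) for frame in frames]


def localize_events(features: List[float], threshold: float) -> List[Event]:
    """TEL stand-in: each contiguous run of frames above the threshold is an event."""
    events: List[Event] = []
    start = None
    for i, value in enumerate(features):
        if value > threshold and start is None:
            start = i                      # a run begins
        elif value <= threshold and start is not None:
            events.append(Event(start, i))  # a run ends
            start = None
    if start is not None:                  # run reaches the end of the video
        events.append(Event(start, len(features)))
    return events


def caption_events(events: List[Event], features: List[float]) -> List[Event]:
    """DCG stand-in: fill a fixed template instead of running a language model."""
    for event in events:
        segment = features[event.start_frame:event.end_frame]
        mean = sum(segment) / len(segment)
        event.caption = (
            f"activity over {len(segment)} frame(s), mean feature {mean:.2f}"
        )
    return events


def dense_caption(frames: Sequence[Sequence[float]], threshold: float = 0.5) -> List[Event]:
    """Chain the three stages: features -> event boundaries -> one caption per event."""
    features = extract_features(frames)
    events = localize_events(features, threshold)
    return caption_events(events, features)


# Toy "video": five frames of two pixels each.
toy_frames = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]]
toy_events = dense_caption(toy_frames)
for ev in toy_events:
    print(ev.start_frame, ev.end_frame, ev.caption)
```

The point of the sketch is the output shape: unlike single-sentence captioning, the result is a list of time-localized events, each with its own caption. Real systems would replace each stand-in with a learned model.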

This review aims to discuss all the studies that claim to perform DVC along with its sub-tasks and summarize their results.

... also discuss all the datasets that have been used for DVC ...

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

#ComputerVision #Captioning #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning
