Answer question about an image using text in scene to find external knowledge

morrislee
Sep 2, 2021

External Knowledge Augmented Text Visual Question Answering

arXiv paper abstract https://arxiv.org/abs/2108.09717v1

arXiv PDF paper https://arxiv.org/pdf/2108.09717v1.pdf

The open-ended question answering task of Text-VQA requires reading and reasoning about local, often previously unseen, scene-text content of an image to generate answers.

... propose the generalized use of external knowledge to augment our understanding of the said scene-text.

... extract, filter, and encode knowledge atop a standard multimodal transformer for vision language understanding tasks.
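The post does not say how the paper extracts, filters, or encodes knowledge, so the following is only a hypothetical toy sketch of what those three stages could look like, not the authors' method. Everything here is an assumption made for illustration. That includes the tiny `TOY_KB` dictionary, which stands in for a real knowledge source, the `STOPWORDS` list, the word-overlap relevance score, and serializing the kept facts as `[SEP]`-separated text appended to the question. Trying longer scene-text n-grams first is one simple way to let multiword names such as "new york" match as a single entity.

```python
import re
from dataclasses import dataclass

# Hypothetical toy knowledge base (entity -> short description).
# The paper's actual knowledge source is not described in the post.
TOY_KB = {
    "coca cola": "soft drink brand",
    "pepsi": "soft drink brand",
    "new york": "city in the united states",
}

# Illustrative stopword list so the overlap score ignores function words.
STOPWORDS = {"a", "an", "the", "is", "in", "of", "on", "to", "this", "what", "which"}


@dataclass
class Fact:
    entity: str
    description: str
    score: float = 0.0


def content_words(text: str) -> set[str]:
    """Lowercase alphabetic words, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def extract(ocr_tokens: list[str], kb: dict[str, str], max_ngram: int = 3) -> list[Fact]:
    """Extract: look up scene-text n-grams in the knowledge base.
    Longer n-grams are tried first so multiword names can match as one entity."""
    tokens = [t.lower() for t in ocr_tokens]
    facts: list[Fact] = []
    seen: set[str] = set()
    for n in range(max_ngram, 0, -1):
        for i in range(len(tokens) - n + 1):
            span = " ".join(tokens[i:i + n])
            if span in kb and span not in seen:
                seen.add(span)
                facts.append(Fact(span, kb[span]))
    return facts


def filter_facts(facts: list[Fact], question: str,
                 top_k: int = 2, min_score: float = 0.1) -> list[Fact]:
    """Filter: score each fact by the fraction of question content words that
    appear in its description (a hypothetical stand-in for a learned relevance
    model), drop low scorers, and keep the top_k best."""
    q_words = content_words(question)
    for f in facts:
        overlap = q_words & content_words(f.description)
        f.score = len(overlap) / len(q_words) if q_words else 0.0
    kept = [f for f in facts if f.score >= min_score]
    return sorted(kept, key=lambda f: f.score, reverse=True)[:top_k]


def encode(question: str, facts: list[Fact]) -> str:
    """Encode: serialize kept facts as extra text segments after the question,
    ready to be tokenized alongside the other multimodal transformer inputs."""
    segments = [question] + [f"{f.entity} : {f.description}" for f in facts]
    return " [SEP] ".join(segments)


if __name__ == "__main__":
    ocr = ["Welcome", "to", "New", "York"]
    question = "Which city is this?"
    kept = filter_facts(extract(ocr, TOY_KB), question)
    print(encode(question, kept))
    # prints: Which city is this? [SEP] new york : city in the united states
```

In this toy setup, an entity whose description shares no content words with the question (for example "pepsi" for a question about a city) scores 0 and is filtered out before encoding.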

Through empirical evidence, we demonstrate how knowledge can highlight instance-only cues and thus help deal with training data bias, improve answer entity type correctness, and detect multiword named entities.

... results comparable to the state-of-the-art on two publicly available datasets ...


Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Web site with my other posts by category https://morrislee1234.wixsite.com/website

#ComputerVision #VisualQuestionAnswering #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning

News to help your R&D in artificial intelligence, machine learning, robotics, computer vision, smart hardware
