Answer question about an image using text in scene to find external knowledge
External Knowledge Augmented Text Visual Question Answering
arXiv paper abstract https://arxiv.org/abs/2108.09717v1
arXiv PDF paper https://arxiv.org/pdf/2108.09717v1.pdf
The open-ended question answering task of Text-VQA requires reading and reasoning about local, often previously unseen, scene-text content of an image to generate answers.
... propose the generalized use of external knowledge to augment our understanding of the said scene-text.
... extract, filter, and encode knowledge atop a standard multimodal transformer for vision language understanding tasks.
Through empirical evidence, we demonstrate how knowledge can highlight instance-only cues and thus help deal with training data bias, improve answer entity type correctness, and detect multiword named entities.
... results comparable to the state-of-the-art on two publicly available datasets ...
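The "extract" and "filter" steps mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the toy knowledge base, the function names, the stopword list, and the word-overlap score are all assumptions made for illustration, and the "encode" step (feeding the kept knowledge into a multimodal transformer) is left out entirely. The sketch only shows how scene-text tokens from OCR might be matched against external knowledge, including multiword entities, and then narrowed down to the entries relevant to the question.

```python
import re
from dataclasses import dataclass

# Hypothetical toy knowledge base (assumption): scene-text mention -> description.
TOY_KB = {
    "coca cola": "Coca-Cola is a carbonated soft drink brand.",
    "nokia": "Nokia is a Finnish telecommunications company.",
}

# Small illustrative stopword list (assumption), used only for scoring.
STOPWORDS = {"a", "an", "is", "the", "this", "what", "of"}


@dataclass
class KnowledgeHit:
    mention: str
    description: str
    score: float


def _content_words(text):
    """Lowercase alphanumeric words with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def extract_candidates(ocr_tokens, kb, max_ngram=3):
    """Look up OCR n-grams in the knowledge base, longest n-grams first,
    so that a multiword entity wins over its individual words."""
    tokens = [t.lower() for t in ocr_tokens]
    used = set()  # token positions already covered by a longer match
    hits = []
    for n in range(max_ngram, 0, -1):
        for i in range(len(tokens) - n + 1):
            span = range(i, i + n)
            if any(j in used for j in span):
                continue
            mention = " ".join(tokens[i:i + n])
            if mention in kb:
                hits.append((mention, kb[mention]))
                used.update(span)
    return hits


def filter_by_question(hits, question, top_k=2):
    """Keep the knowledge entries whose description overlaps the question."""
    q_words = _content_words(question)
    scored = []
    for mention, description in hits:
        overlap = q_words & _content_words(description)
        score = len(overlap) / len(q_words) if q_words else 0.0
        if score > 0:
            scored.append(KnowledgeHit(mention, description, score))
    scored.sort(key=lambda h: h.score, reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    candidates = extract_candidates(["Coca", "Cola", "NOKIA"], TOY_KB)
    for hit in filter_by_question(candidates, "What brand is this drink?"):
        print(hit.mention, hit.score)
```

Matching the longest n-grams first is one simple way to keep "coca cola" together as a single entity instead of two unrelated tokens. A real system would replace the dictionary lookup with a query to an actual knowledge source and the overlap score with a learned relevance measure.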
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website