Answer questions about an image using a structured information graph with SA-VQA

morrislee
Jan 28, 2022

SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering

arXiv paper abstract https://arxiv.org/abs/2201.10654v1

arXiv PDF paper https://arxiv.org/pdf/2201.10654v1.pdf

Visual Question Answering (VQA) ... is challenging since it requires not only visual and textual understanding, but also the ability to align cross-modality representations.

Previous approaches ... employ entity-level alignments, such as the correlations between the visual regions and their semantic labels, or the interactions across question words and object features.
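To make "entity-level alignment" concrete, here is a minimal sketch of question words attending over object features with dot-product attention. It is not taken from the paper: the function name `entity_alignment`, the 2-d embeddings, and the region names are all made up for illustration. Note that each word is matched to each object independently; nothing here looks at how the objects relate to one another.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def entity_alignment(word_vecs, object_vecs):
    """For each question word, return attention weights over the objects.

    Hypothetical helper: plain dot-product scores followed by softmax.
    """
    weights = []
    for w in word_vecs:
        scores = [sum(a * b for a, b in zip(w, o)) for o in object_vecs]
        weights.append(softmax(scores))
    return weights

# Toy embeddings (assumed values, not real features).
words = {"dog": [1.0, 0.0], "ball": [0.0, 1.0]}
objects = {"region_dog": [2.0, 0.0], "region_ball": [0.0, 2.0]}

align = entity_alignment(list(words.values()), list(objects.values()))
for word, row in zip(words, align):
    print(word, [round(x, 3) for x in row])
```

Each row of `align` sums to 1, and the word "dog" puts most of its weight on `region_dog` simply because their toy vectors point the same way.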

These attempts aim to improve the cross-modality representations, while ignoring their internal relations.

... propose to apply structured alignments, which work with graph representations of visual and textual content.

... solve ... by first converting different modality entities into sequential nodes and the adjacency graph, then incorporating them for structured alignments.
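The idea of "sequential nodes plus an adjacency graph" can be sketched roughly as follows. This is an illustration under my own assumptions, not the authors' implementation: the helpers `graph_to_sequence` and `masked_attention`, the toy triples, and the 2-d features are hypothetical, and the sketch only uses subjects and objects as nodes. The graph is flattened into an ordered node list, and the adjacency matrix is then used as a mask so that each node can only attend to itself and its neighbors.

```python
import math

def graph_to_sequence(triples):
    """Turn (subject, relation, object) triples into a node list and adjacency matrix."""
    nodes, index = [], {}
    for s, _, o in triples:
        for name in (s, o):
            if name not in index:
                index[name] = len(nodes)
                nodes.append(name)
    n = len(nodes)
    # Self-loops so every node can always attend to itself.
    adj = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for s, _, o in triples:
        i, j = index[s], index[o]
        adj[i][j] = adj[j][i] = 1  # treat edges as undirected (an assumption)
    return nodes, adj

def masked_attention(features, adj):
    """Softmax attention where node i may only attend to nodes j with adj[i][j] == 1."""
    weights = []
    for i, q in enumerate(features):
        row = [0.0] * len(features)
        nbrs = [j for j, a in enumerate(adj[i]) if a]
        scores = [sum(x * y for x, y in zip(q, features[j])) for j in nbrs]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        for j, e in zip(nbrs, exps):
            row[j] = e / z
        weights.append(row)
    return weights

# Toy scene graph and made-up node features.
triples = [("dog", "chasing", "ball"), ("ball", "on", "grass")]
nodes, adj = graph_to_sequence(triples)
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
attn = masked_attention(features, adj)
```

Here `nodes` comes out as `["dog", "ball", "grass"]`. Because "dog" and "grass" share no edge, `attn[0][2]` is exactly 0, which is the structural constraint that plain entity-level attention lacks.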

... model, without any pretraining, outperforms the state-of-the-art methods on the GQA dataset and beats the non-pretrained state-of-the-art methods on the VQA-v2 dataset.

#ComputerVision #VisualQuestionAnswering #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning
