
News to help your R&D in artificial intelligence, machine learning, robotics, computer vision, smart hardware


morrislee

Answer questions about an image using a structured information graph with SA-VQA



SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering



Visual Question Answering (VQA) ... is challenging since it requires not only visual and textual understanding, but also the ability to align cross-modality representations.


Previous approaches ... employ entity-level alignments, such as the correlations between the visual regions and their semantic labels, or the interactions across question words and object features.
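To make "entity-level alignment" concrete, here is a minimal sketch that matches each question word to its most similar image region by cosine similarity. The embeddings, the region names, and the `align_entities` helper are all made up for illustration; this is not code from any of the previous approaches the paper refers to.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def align_entities(word_vecs, region_vecs):
    """Match each question word to its most similar image region.

    Every word is matched independently, so relations among words
    or among regions play no part in the result.
    """
    alignment = {}
    for word, wv in word_vecs.items():
        alignment[word] = max(region_vecs, key=lambda r: cosine(wv, region_vecs[r]))
    return alignment

# Hypothetical toy embeddings (assumed values, not real model features).
word_vecs = {"dog": [1.0, 0.0, 0.1], "frisbee": [0.0, 1.0, 0.1]}
region_vecs = {"region_0": [0.9, 0.1, 0.0], "region_1": [0.1, 0.8, 0.2]}

alignment = align_entities(word_vecs, region_vecs)
print(alignment)
```

Each word is scored against each region in isolation, which is the kind of alignment that leaves out the internal relations of each modality.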


These attempts aim to improve the cross-modality representations, while ignoring their internal relations.


... propose to apply structured alignments, which work with graph representations of visual and textual content.


... solve ... by first converting different modality entities into sequential nodes and the adjacency graph, then incorporating them for structured alignments.
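The conversion step can be sketched roughly as follows: a toy scene graph is flattened into an ordered node list plus an adjacency matrix, and the adjacency matrix then masks an attention step so each node only attends to its graph neighbours. This is a simplified illustration under my own assumptions, not the authors' implementation: the triples, the choice to treat relations as nodes, the one-hot features, and the function names `graph_to_sequence` and `masked_attention` are all hypothetical.

```python
import math

def graph_to_sequence(triples):
    """Turn (subject, relation, object) triples into a node list and adjacency matrix.

    Assumption of this sketch: relations become nodes too, with edges
    subject - relation - object, and repeated names share one node.
    """
    nodes, index, edges = [], {}, []

    def node_id(name):
        if name not in index:
            index[name] = len(nodes)
            nodes.append(name)
        return index[name]

    for subj, rel, obj in triples:
        s, r, o = node_id(subj), node_id(rel), node_id(obj)
        edges += [(s, r), (r, o)]

    n = len(nodes)
    # Self-loops on the diagonal so every node can attend to itself.
    adj = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1  # undirected edge
    return nodes, adj

def masked_attention(features, adj):
    """Scaled dot-product self-attention restricted to graph neighbours."""
    dim = len(features[0])
    out = []
    for i, q in enumerate(features):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) if adj[i][j] else None
            for j, k in enumerate(features)
        ]
        top = max(s for s in scores if s is not None)
        exps = [math.exp(s - top) if s is not None else 0.0 for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[d] for w, v in zip(weights, features)) for d in range(dim)])
    return out

# Hypothetical toy scene graph.
triples = [("dog", "catching", "frisbee"), ("dog", "on", "grass")]
nodes, adj = graph_to_sequence(triples)

# One-hot features (assumed) so each output row directly shows the attention weights.
features = [[1.0 if i == j else 0.0 for j in range(len(nodes))] for i in range(len(nodes))]
out = masked_attention(features, adj)
print(nodes)
print(adj)
```

With one-hot features, row `i` of `out` is just the attention distribution of node `i`, so non-neighbours show up as exact zeros. The same masking idea could in principle be applied across two graphs (visual and textual), but how SA-VQA does that is not spelled out in the excerpt above.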


... model, without any pretraining, outperforms the state-of-the-art methods on the GQA dataset, and beats the non-pretrained state-of-the-art methods on the VQA-v2 dataset.



If you enjoyed this post, please like and share it using the buttons at the bottom!


Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Web site with my other posts by category https://morrislee1234.wixsite.com/website

