Better document understanding without OCR using Donut transformer

morrislee
Dec 1, 2021
Donut: Document Understanding Transformer without OCR

arXiv paper abstract https://arxiv.org/abs/2111.15664

arXiv PDF paper https://arxiv.org/pdf/2111.15664.pdf

Understanding document images (e.g., invoices) has been an important research topic ...

... current Visual Document Understanding (VDU) systems have come to be designed based on OCR.

... suffer from critical problems induced by the OCR, e.g., (1) expensive computational costs and (2) performance degradation due to the OCR error propagation.
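
To make the second problem concrete, here is a toy sketch of OCR error propagation in a two-stage pipeline. It is not from the paper: `fake_ocr` and `extract_total` are hypothetical helpers written for this illustration, and the "OCR" step only simulates a misread by swapping characters. A single confused character is enough to break the downstream parser.

```python
import re

# Toy illustration only: a hand-written stand-in for an OCR engine (hypothetical).
def fake_ocr(image_text: str, misreads: dict) -> str:
    """Simulate OCR output by swapping every character listed in `misreads`."""
    return "".join(misreads.get(ch, ch) for ch in image_text)

def extract_total(ocr_text: str):
    """Downstream parser: return the amount after 'Total:' or None if not found."""
    match = re.search(r"Total:\s*(\d+\.\d{2})", ocr_text)
    return float(match.group(1)) if match else None

# Text that we pretend is printed on an invoice image.
invoice = "Invoice 42\nTotal: 100.50"

# Perfect OCR: the parser recovers the amount.
clean = extract_total(fake_ocr(invoice, {}))

# Imperfect OCR: digit zero is read as letter O, so the parser finds nothing.
noisy = extract_total(fake_ocr(invoice, {"0": "O"}))

print(clean)
print(noisy)
```

The parser itself is unchanged between the two runs. Only the upstream text differs, which is the kind of dependency an OCR-free, end-to-end model aims to avoid.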

... propose a novel VDU model that is end-to-end trainable without underpinning OCR framework.

... pre-train the model to mitigate the dependencies on large-scale real document images.

... achieves state-of-the-art performance on various document understanding tasks in public benchmark datasets ...

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Web site with my other posts by category https://morrislee1234.wixsite.com/website

#ComputerVision #OCR #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning

News to help your R&D in artificial intelligence, machine learning, robotics, computer vision, smart hardware
