Better depth from a monocular image by reasoning globally and locally with MonoViT
MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer
arXiv paper abstract https://arxiv.org/abs/2208.03543v1
arXiv PDF paper https://arxiv.org/pdf/2208.03543v1.pdf
Self-supervised monocular depth estimation is an attractive solution that does not require hard-to-source depth labels for training.
... However, their limited receptive field constrains existing network architectures to reason only locally, dampening the effectiveness of the self-supervised paradigm.
... propose MonoViT, a brand-new framework combining the global reasoning enabled by ViT models with the flexibility of self-supervised monocular depth estimation.
By combining plain convolutions with Transformer blocks, ... model can reason locally and globally, yielding depth prediction at a higher level of detail and accuracy, allowing MonoViT to achieve state-of-the-art performance on the established KITTI dataset ...
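To make the local-versus-global distinction concrete, here is a minimal toy sketch in plain Python. It is not MonoViT's architecture. It works on a 1D list of scalars, uses identity query/key/value projections, and the names `conv1d_local`, `self_attention_global`, and `hybrid_block` are hypothetical. Summing the two branches in `hybrid_block` is an illustrative assumption, not the fusion scheme from the paper. The sketch only shows that a convolution output depends on a small neighbourhood, while a self-attention output depends on every position.

```python
import math


def conv1d_local(x, kernel):
    """Same-padded 1D convolution: each output sees only len(kernel) neighbours."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(x):  # zero padding outside the signal
                acc += w * x[j]
        out.append(acc)
    return out


def self_attention_global(x):
    """Toy single-head self-attention on scalars with identity projections."""
    out = []
    for q in x:
        scores = [q * k for k in x]                  # query-key similarity
        m = max(scores)                              # subtract max for stability
        weights = [math.exp(s - m) for s in scores]  # unnormalised softmax
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out


def hybrid_block(x, kernel):
    """Assumed fusion: add the local and global features position by position."""
    local = conv1d_local(x, kernel)
    glob = self_attention_global(x)
    return [a + b for a, b in zip(local, glob)]


signal = [0.5, 0.1, -0.2, 0.3, 0.0, 0.4, -0.1, 0.2]
perturbed = signal[:-1] + [2.0]  # change only the last element
kernel = [0.25, 0.5, 0.25]

local_a = conv1d_local(signal, kernel)
local_b = conv1d_local(perturbed, kernel)
glob_a = self_attention_global(signal)
glob_b = self_attention_global(perturbed)

# The first conv output cannot see the far-away change; the attention output can.
print(local_a[0] == local_b[0])           # True
print(abs(glob_a[0] - glob_b[0]) > 1e-6)  # True
```

In this toy setting, changing the last element leaves the first convolution output untouched but shifts the first attention output, which is the receptive-field limitation the abstract refers to.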
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website