Get 3D object shape by attention on 2D latent features to learn 3D consistency with MVDiffusion++
MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction
arXiv paper abstract https://arxiv.org/abs/2402.12712
arXiv PDF paper https://arxiv.org/pdf/2402.12712.pdf
Project page https://mvdiffusion-plusplus.github.io
... presents a neural architecture MVDiffusion++ for 3D object reconstruction that synthesizes dense and high-resolution views of an object given one or a few images without camera poses.
MVDiffusion++ achieves superior flexibility and scalability with two surprisingly simple ideas:
1) A "pose-free architecture" where standard self-attention among 2D latent features learns 3D consistency across an arbitrary number of conditional and generation views without explicitly using camera pose information; and
2) A "view dropout strategy" that discards a substantial number of output views during training, which reduces the training-time memory footprint and enables dense and high-resolution view synthesis at test time.
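To make the two ideas concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that "self-attention among 2D latent features" can be read as concatenating the latent tokens of all views into one sequence, and it uses identity query/key/value projections instead of learned ones. The function names, the token sizes, and the numbers of views (8 output views, 2 kept) are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def joint_self_attention(views):
    """Self-attention over the tokens of all views at once.

    views: list of views, each a list of token vectors (lists of floats).
    Tokens from every view are flattened into one sequence, so each token
    attends to tokens of every other view. No camera pose is used anywhere.
    Assumption: Q = K = V = the tokens themselves (no learned projections).
    """
    tokens = [tok for view in views for tok in view]
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    mixed = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        mixed.append(
            [sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)]
        )
    # Split the flat sequence back into per-view token lists.
    result, start = [], 0
    for view in views:
        result.append(mixed[start:start + len(view)])
        start += len(view)
    return result


def drop_output_views(num_output_views, keep, rng):
    """View dropout: keep only a random subset of output views for one step."""
    return sorted(rng.sample(range(num_output_views), keep))


rng = random.Random(0)


def random_view(num_tokens=3, dim=4):
    """Hypothetical stand-in for the 2D latent features of one view."""
    return [[rng.random() for _ in range(dim)] for _ in range(num_tokens)]


cond_views = [random_view()]                      # one conditional image
kept = drop_output_views(8, keep=2, rng=rng)      # train on 2 of 8 output views
gen_views = [random_view() for _ in kept]
mixed = joint_self_attention(cond_views + gen_views)
print(kept, len(mixed), len(mixed[0]))
```

Because attention cost grows with the total number of tokens, keeping fewer output views per training step shortens the sequence. Nothing in the attention function depends on the number of views, so more views can be passed in at test time.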
... use the Objaverse for training and the Google Scanned Objects for evaluation with standard novel view synthesis and 3D reconstruction metrics, where MVDiffusion++ ... outperforms the ... state of the art.
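The excerpt does not list which metrics are used. As an illustration only, PSNR (peak signal-to-noise ratio) is one metric commonly reported for novel view synthesis. The sketch below assumes images are given as flat lists of pixel values in the range [0, 1]; the function name and the inputs are hypothetical.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    pred, target: flat lists of pixel values in [0, max_value].
    Higher is better; identical images give infinity.
    """
    if not pred or len(pred) != len(target):
        raise ValueError("images must be non-empty and the same size")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


rendered = [0.5, 0.5, 0.5, 0.5]   # hypothetical synthesized view
reference = [0.0, 0.0, 0.0, 0.0]  # hypothetical ground-truth view
score = psnr(rendered, reference)
print(round(score, 2))
```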
... also demonstrate a text-to-3D application example by combining MVDiffusion++ with a text-to-image generative model.
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website