Segmentation and depth using generalized cluster prediction with mask transformer with PolyMaX
PolyMaX: General Dense Prediction with Mask Transformer
arXiv paper abstract https://arxiv.org/abs/2311.05770
arXiv PDF paper https://arxiv.org/pdf/2311.05770.pdf
Dense prediction tasks, such as semantic segmentation, depth estimation, and surface normal prediction, can be ... formulated as per-pixel classification (discrete outputs) or regression (continuous outputs).
... a paradigm shift from per-pixel prediction to cluster-prediction with the emergence of transformer architectures, particularly mask transformers, which directly predict a label for a mask instead of a pixel.
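To make "predict a label for a mask instead of a pixel" concrete, here is a minimal sketch in plain Python. It is not code from the paper: the function name, the toy numbers, and the way the two outputs are combined (weighting each cluster's class distribution by how strongly a pixel belongs to that cluster) are assumptions chosen for illustration.

```python
def cluster_to_pixel_labels(mask_probs, class_probs):
    """Turn cluster-level predictions into per-pixel labels.

    mask_probs[k][p]:  how strongly pixel p belongs to cluster (mask) k.
    class_probs[k][c]: probability that cluster k carries class c.
    Both inputs are hypothetical model outputs, not real PolyMaX tensors.
    """
    num_clusters = len(mask_probs)
    num_pixels = len(mask_probs[0])
    num_classes = len(class_probs[0])
    labels = []
    for p in range(num_pixels):
        # Score of class c at pixel p: sum over clusters of
        # (pixel-to-cluster membership) * (cluster-to-class probability).
        scores = [
            sum(mask_probs[k][p] * class_probs[k][c] for k in range(num_clusters))
            for c in range(num_classes)
        ]
        labels.append(max(range(num_classes), key=scores.__getitem__))
    return labels


# Toy example: 2 clusters, 4 pixels, 3 classes.
mask_probs = [[0.9, 0.8, 0.1, 0.2],
              [0.1, 0.2, 0.9, 0.8]]
class_probs = [[0.05, 0.90, 0.05],   # cluster 0 is mostly class 1
               [0.80, 0.10, 0.10]]   # cluster 1 is mostly class 0
labels = cluster_to_pixel_labels(mask_probs, class_probs)
print(labels)
```

The point of the sketch is that the class decision is made once per cluster, and pixels inherit it through their mask membership.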
Despite this shift, methods based on the per-pixel prediction paradigm still dominate the benchmarks on the other dense prediction tasks that require continuous outputs, such as depth estimation and surface normal prediction.
Motivated by the success of DORN and AdaBins in depth estimation, achieved by discretizing the continuous output space, ... propose to generalize the cluster-prediction based method to general dense prediction tasks.
This allows us to unify dense prediction tasks with the mask transformer framework.
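The idea of discretizing a continuous output space can be sketched as follows. This is a rough illustration rather than the PolyMaX implementation: the bin centers, the logits, and the helper names are all made up, and reading out depth as a probability-weighted average of bin centers is just one common way to map a discretized prediction back to a continuous value.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def depth_from_bins(pixel_logits, bin_centers):
    """Continuous depth per pixel from a discretized prediction.

    pixel_logits[p][k]: hypothetical score of depth bin k at pixel p.
    bin_centers[k]:     representative depth (e.g. in meters) of bin k.
    """
    depths = []
    for logits in pixel_logits:
        probs = softmax(logits)
        # Expected depth: probability-weighted sum of the bin centers.
        depths.append(sum(p * c for p, c in zip(probs, bin_centers)))
    return depths


bin_centers = [1.0, 3.0, 5.0]            # assumed depth bins
pixel_logits = [[0.0, 0.0, 0.0],         # undecided pixel
                [10.0, 0.0, 0.0]]        # pixel confident in the nearest bin
depths = depth_from_bins(pixel_logits, bin_centers)
print(depths)
```

The undecided pixel lands at the average of the bin centers, while the confident pixel lands very close to its preferred bin, so a classification-style output still yields a smooth, continuous prediction.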
... resulting model PolyMaX demonstrates state-of-the-art performance on three benchmarks of the NYUD-v2 dataset ...