MobileViT: an accurate, light-weight, mobile-friendly vision transformer
Detect 133 points on face, hands, and feet in real-time by using neighborhood
Fast online action detection by storing past data that is relevant now
Better interactive image segmentation for training by using edge information
Flexible artifact removal in JPEG images
Non-uniform image sampling for better image segmentation
Better deblurring of motion-blurred images by using edges
Survey of advances in continual learning in computer vision
Replace people and objects in street scene images including proper shadows
Directly output text labels and coordinates of detected objects from an image
Reinforcement learning for learning multi-step tasks on new objects in images
3DETR transformer for 3D object detection
Real-time face distance and iris tracking on a mobile phone without a depth sensor
Better 3D pose estimates in video by dynamically learning joint relationships
Unsupervised learning of image classes from dynamic video stream
Real-time 3D hand reconstruction from a single monocular image
Get depth, regions, and layout from a panoramic image quickly and accurately with horizontal features
Image classification without normalization that is faster and better than with normalization
Image segmentation of objects and regions using transformers
Using an audio and vision transformer to count crowds