Replace people and objects in street scene images including proper shadows
From image directly output text labels and coordinates of detected objects
Reinforcement learning for learning multi-step tasks on new objects in images
3DETR transformer for 3D Object Detection
Real-time face distance and iris track on mobile phone without depth sensor
Better 3D pose estimates in video by dynamically learning joint relationships
Unsupervised learning of image classes from dynamic video stream
Real-time 3D hand reconstruction from a single monocular image
Get depth, regions, and layout from panoramic image quickly and accurately with horizontal features
Use satellite images to get 3D structure of buildings and roofs
Image classification without normalization that is faster and better than with normalization
Image segmentation of objects and regions using transformers
Using an audio and vision transformer to count crowds
Survey on improving efficiency of computer vision recogntion using deep learning
Efficiently identify people in image in one stage without separate detection step
Answer question about an image using text in scene to find external knowledge
Camera looking at blank wall can determine number of people and activity
Super-resolution video using non-neighboring frames without frame alignment
More accurate action recognition in videos by using the identities of people
Restore image by removing artifacts superimposed in an unknown manner