Improved object segmentation in video by using object descriptors instead of pixel matching
Answer questions about a scene using image and 3D information
Get 3D shape, pose, and relative depth of people from a single image despite occlusion
Improved super-resolution for display screens by using transformer designed for screen content
Better enhancement of dim images with edge-awareness using CSDNet
Robot grips new objects in new poses from 10 examples using neural descriptor fields
Recognize 3D objects when only trained on 2D image and text pairs with PointCLIP
Track people in images better by building 3D model from image and predicting appearance
Calibrate cameras from video with sub-pixel error using self-supervision
Restore faces blurred by air turbulence with prior knowledge from a GAN
Detect new objects better by teaching classifier not to ignore unlabeled objects
Correcting an image classifier prediction by using a single image
Segment objects in a video that are mentioned in a text query
Better document understanding without OCR using Donut transformer
Get centimeter depth image from smartphone using LiDAR and unsteadiness of hand
Classifying visual and audio events of various durations in videos with MM-Pyramid
Multi-label image classification using information on context, space, and meaning
Survey of panoptic image segmentation for objects and regions
Many types of computer vision tasks possible with new customizable vision foundation model, Florence
Correcting face distortion in wide-angle videos