Boost neural network performance with circular kernels instead of square ones
Transformer handles images, video, point clouds, and audio to understand the world
Object detection, segmentation, and pose estimation using one tracking framework
Get absolute distance of object in image regardless of type
Action recognition by turning videos into an image mosaic
Efficiently detect 3D objects in 2D range image with graph convolution kernels
Getting 3D shapes from a single image
Quickly identify object in many camera views using tree of neural nets
Answering questions about an image using outside knowledge
Get hand interactions in 3D in real-time from monocular video
Learning by comparing images, improved by understanding objects
Finding prominent objects in images without manually labeled training data
Training with modified data is like using 10 times more data
Get 3D pose and shape of people from monocular images
Survey of continual learning for image classification
Counting new objects in image using only a few examples
Transformer for video super-resolution
Faster segmentation of objects in video with new affinity formula
Real-time 3D face mesh from one camera on phones
Improved segmentation of objects and regions in images without object proposals