Train new object detector without bounding box annotations using captioned images
Re-identify people in new scenes more accurately by using multiple networks
Faster multi-person pose estimation by using object modeling
Better training data for driving by using simulator to guide realistic image synthesis
Survey of fine-grained image analysis using deep learning
Survey of computer vision using transformers
Match 3D points with 99% outliers quickly with VOCRA
Improved super-resolution for images by using flows
Detect image anomalies by using low-dimensional embeddings of patches
Survey of video anomaly detection using self-supervised deep learning
Get 3D scene geometry and segmentation from a single RGB image
Video segmentation with less error propagation by using a cyclic workflow
Scene segmentation 7.3 times faster with 3D transformer using patch attention
Improve vision transformer by using anti-aliasing
Train object detectors using images synthesized from real unmarked images
Survey of training object detectors with limited data or unlabeled data
Recognize actions without training by using scene context with object recognition
Get foreground in image without user marking borders using 100x smaller model
Better image captioning and question answering using weakly supervised training
Find person in image gallery using text queries by leveraging larger libraries