Detect unknown and known objects using CLIP, SAM, and GDINO with cooperative-foundational-models
Count objects in image using point prompts from object localization and CLIP for identification with PseCo
Segment scene with RGB-D images by efficiently fusing RGB and depth features with MIPANet
Detect objects in new domain by distilling more balanced source features with DUA-DA
Get object pose by pre-training on synthetic data and using no ground truth labels with RKHSPose
Super-resolve video using recurrent back-projection GAN with RBPGAN
Get 3D object shape from monocular RGB-D by learning surface and mapping it to each frame with DynamicSurf
Get 3D object and scene using new regularization term and quadratic layers with StEik
Predict segmentation and depth using generalized cluster prediction in mask transformer with PolyMaX
Get human pose using attention mechanism to expand receptive fields with SADI-NET
Detect 3D object from one image using features from 3D-aware diffusion with 3DiffTection
Explain a fine-grained image classification result by searching image for class with INTR
Survey of video captioning for many events in a scene
Segment 3D point clouds using foundation models for 2D vision with Dong
Get 3D scene from a few monocular images using CLIP with varying depth bins with Hu
Get 3D object shape with fine shape details using monocular images with SSR
Survey of computer vision backbones on many tasks
Segment objects in scene after training only on object types with SISeg
Segment scene unsupervised using the prior that close features have similar semantics with SmooSeg
Segment scene in new domain unsupervised by refining pseudo labels and predicting noisy labels with PRN