GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding Paper • 2503.10596 • Published Mar 13 • 18
Knowledge Mining with Scene Text for Fine-Grained Recognition Paper • 2203.14215 • Published Mar 27, 2022
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding Paper • 2412.13193 • Published Dec 17, 2024 • 1
Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation Paper • 2502.13145 • Published Feb 18 • 37
ControlAR: Controllable Image Generation with Autoregressive Models Paper • 2410.02705 • Published Oct 3, 2024 • 11
Deep High-Resolution Representation Learning for Visual Recognition Paper • 1908.07919 • Published Aug 20, 2019 • 2
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model Paper • 2406.20076 • Published Jun 28, 2024 • 10
YOLO-World: Real-Time Open-Vocabulary Object Detection Paper • 2401.17270 • Published Jan 30, 2024 • 42