MetricAnything: Scaling Metric Depth Pretraining with Noisy Heterogeneous Sources
Abstract
Metric Anything is a scalable pretraining framework for metric depth estimation that leverages diverse 3D data and sparse metric prompts to achieve state-of-the-art performance across multiple vision tasks.
Scaling has powered recent advances in vision foundation models, yet extending this paradigm to metric depth estimation remains challenging due to heterogeneous sensor noise, camera-dependent biases, and metric ambiguity in noisy cross-source 3D data. We introduce Metric Anything, a simple and scalable pretraining framework that learns metric depth from noisy, diverse 3D sources without manually engineered prompts, camera-specific modeling, or task-specific architectures. Central to our approach is the Sparse Metric Prompt, created by randomly masking depth maps, which serves as a universal interface that decouples spatial reasoning from sensor and camera biases. Using about 20M image-depth pairs spanning reconstructed, captured, and rendered 3D data across 10,000 camera models, we demonstrate, for the first time, a clear scaling trend in metric depth estimation. The pretrained model excels at prompt-driven tasks such as depth completion, depth super-resolution, and radar-camera fusion, while its distilled prompt-free student achieves state-of-the-art results on monocular depth estimation, camera intrinsics recovery, single- and multi-view metric 3D reconstruction, and VLA planning. We also show that using the pretrained ViT of Metric Anything as a visual encoder significantly boosts the spatial-intelligence capabilities of multimodal large language models. These results show that metric depth estimation can benefit from the same scaling laws that drive modern foundation models, establishing a new path toward scalable and efficient real-world metric perception. We open-source Metric Anything at http://metric-anything.github.io/metric-anything-io/ to support community research.
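The abstract describes the Sparse Metric Prompt as a depth map that has been randomly masked down to a small set of metric samples. As a rough illustration only, the sketch below builds such a prompt with NumPy; the function name, the keep_ratio parameter, and the zero-fill convention are assumptions for illustration, not details from the paper.

```python
import numpy as np

def make_sparse_metric_prompt(depth, keep_ratio=0.01, rng=None):
    """Hypothetical sketch: turn a dense metric depth map into a sparse
    metric prompt by randomly keeping a small fraction of valid pixels.
    The exact sampling strategy used in Metric Anything is not specified here."""
    rng = np.random.default_rng() if rng is None else rng
    valid = depth > 0                          # pixels with usable metric depth
    keep = rng.random(depth.shape) < keep_ratio
    mask = valid & keep                        # surviving sparse metric samples
    prompt = np.where(mask, depth, 0.0)        # zeros mark "no prompt" pixels
    return prompt, mask

# Example: a 480x640 metric depth map (meters) reduced to roughly 1% sparse prompts.
depth = np.random.uniform(0.5, 10.0, size=(480, 640)).astype(np.float32)
prompt, mask = make_sparse_metric_prompt(depth, keep_ratio=0.01)
```

Because the prompt carries only scattered metric anchors, the network must infer the remaining structure from the image, which is consistent with the abstract's claim that the prompt decouples spatial reasoning from sensor and camera biases.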
Community
Project Page: https://metric-anything.github.io/metric-anything-io/
Code: https://github.com/metric-anything/metric-anything
Similar papers recommended by the Semantic Scholar API (via Librarian Bot):
- Masked Depth Modeling for Spatial Perception (2026)
- DeFM: Learning Foundation Representations from Depth for Robotics (2026)
- InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields (2026)
- Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation (2025)
- Re-Depth Anything: Test-Time Depth Refinement via Self-Supervised Re-lighting (2025)
- Robust Single-shot Structured Light 3D Imaging via Neural Feature Decoding (2025)
- UDPNet: Unleashing Depth-based Priors for Robust Image Dehazing (2026)