---
license: apache-2.0
---

# AnyCalib: On-Manifold Learning for Model-Agnostic Single-View Camera Calibration

Javier Tirado-Garín    Javier Civera
I3A, University of Zaragoza

Camera calibration from a single perspective/edited/distorted image using a freely chosen camera model

[![Github](https://img.shields.io/badge/GitHub-100000?style=flat&logo=github&logoColor=white)](https://github.com/javrtg/AnyCalib) [![arXiv](https://img.shields.io/badge/arXiv-2503.12701-b31b1b?logo=arxiv&style=flat-square)](https://arxiv.org/abs/2503.12701)
## Usage (pretrained models)

The only requirements are Python (≥3.10) and PyTorch. The project can be installed in development mode with:

```shell
git clone https://github.com/javrtg/AnyCalib.git && cd AnyCalib
pip install -e .
```

Optionally, a compatible version of [`xformers`](https://github.com/facebookresearch/xformers) can be installed for better efficiency by running the following instead of `pip install -e .`:

```shell
pip install -e .[eff]
```

### Minimal usage example

```python
import numpy as np
import torch
from PIL import Image  # the library of choice to load images

from anycalib import AnyCalib

dev = torch.device("cuda")

# load input image and convert it to a (3, H, W) tensor with RGB values in [0, 1]
image = np.array(Image.open("path/to/image.jpg").convert("RGB"))
image = torch.tensor(image, dtype=torch.float32, device=dev).permute(2, 0, 1) / 255

# instantiate AnyCalib according to the desired model_id. Options:
# "anycalib_pinhole": model trained with *only* perspective (pinhole) images,
# "anycalib_gen": trained with perspective, distorted and strongly distorted images,
# "anycalib_dist": trained with distorted and strongly distorted images,
# "anycalib_edit": trained on edited (stretched and cropped) perspective images.
model = AnyCalib(model_id="anycalib_pinhole").to(dev)

# Alternatively, the weights can be loaded from the huggingface hub as follows:
# NOTE: huggingface_hub (https://pypi.org/project/huggingface-hub/) needs to be installed
# model = AnyCalib().from_pretrained(model_id=).to(dev)

# predict according to the desired camera model. Implemented camera models are detailed further below.
output = model.predict(image, cam_id="pinhole")
# output is a dictionary with the following key-value pairs:
# {
#     "intrinsics": (D,) tensor with the estimated intrinsics for the selected camera model,
#     "fov_field": (N, 2) tensor with the FoV field regressed by the network.
#         N≈320^2 (resolution close to the one seen during training),
#     "tangent_coords": alias for "fov_field",
#     "rays": (N, 3) tensor with the corresponding (via the exponential map) ray directions
#         in the camera frame (x right, y down, z forward),
#     "pred_size": (H, W) tuple with the image size used by the network. It can be used
#         e.g. for resizing the FoV/ray fields to the original image size.
# }
```

The weights of the selected `model_id`, if not already present, are downloaded automatically to:

* the torch hub cache directory (`torch.hub.get_dir()`) if `AnyCalib(model_id=)` is used, or
* the huggingface cache directory if `AnyCalib().from_pretrained(model_id=)` is used.

Additional configuration options are described in the docstring of `AnyCalib`, available via `help(AnyCalib)`:

```python
"""AnyCalib class.

Args for instantiation:
    model_id: one of {'anycalib_pinhole', 'anycalib_gen', 'anycalib_dist', 'anycalib_edit'}.
        The models differ in the type of images seen during training:
        * 'anycalib_pinhole': Perspective (pinhole) images,
        * 'anycalib_gen': General images, including perspective, distorted and
            strongly distorted images,
        * 'anycalib_dist': Distorted images using the Brown-Conrady camera model and
            strongly distorted images, using the EUCM camera model,
        * 'anycalib_edit': Trained on edited (stretched and cropped) perspective images.
        Default: 'anycalib_pinhole'.
    nonlin_opt_method: nonlinear optimization method: 'gauss_newton' or 'lev_mar'.
        Default: 'gauss_newton'
    nonlin_opt_conf: nonlinear optimization configuration. This config can be used to
        control the number of iterations and the space where the residuals are minimized.
        See the classes `GaussNewtonCalib` or `LevMarCalib` under anycalib/optim for
        details. Default: None.
    init_with_sac: use RANSAC instead of nonminimal fit for initializing the intrinsics.
        Default: False.
    fallback_to_sac: use RANSAC if nonminimal fit fails. Default: True.
    ransac_conf: RANSAC configuration. This config can be used to control e.g. the inlier
        threshold or the number of minimal samples to try. See the class `RANSAC` in
        anycalib/ransac.py for details. Default: None.
    rm_borders: border size of the dense FoV fields to ignore during fitting. Default: 0.
    sample_size: approximate number of 2D-3D correspondences to use for fitting the
        intrinsics. Negative value -> no subsampling. Default: -1.
"""
```
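The `pred_size` entry of the output dictionary can be used to bring the flattened ray field back to an image grid and resize it. The following is a minimal sketch, not part of AnyCalib: the helper `resize_ray_field` is hypothetical, and it assumes that `N == H * W` for `pred_size = (H, W)` and that the rays are stored in row-major order. Neither assumption is confirmed here, so check the shapes of the actual output before relying on it. Bilinear interpolation followed by renormalization is just one simple choice.

```python
import torch
import torch.nn.functional as F


def resize_ray_field(rays, pred_size, out_size):
    """Resize a flattened (N, 3) ray field to an (out_h, out_w, 3) grid.

    Hypothetical helper. Assumes N == H * W for pred_size = (H, W) and
    row-major ordering of the rays.
    """
    h, w = pred_size
    if rays.shape != (h * w, 3):
        raise ValueError(f"expected {(h * w, 3)}, got {tuple(rays.shape)}")
    # (N, 3) -> (1, 3, H, W), the layout expected by F.interpolate
    grid = rays.reshape(h, w, 3).permute(2, 0, 1)[None]
    resized = F.interpolate(grid, size=out_size, mode="bilinear", align_corners=False)
    # interpolated vectors are no longer unit length, so renormalize them
    resized = F.normalize(resized, dim=1)
    return resized[0].permute(1, 2, 0)  # (out_h, out_w, 3)


# synthetic stand-in for output["rays"] and output["pred_size"]
torch.manual_seed(0)
pred_size = (4, 6)
xy = torch.rand(pred_size[0] * pred_size[1], 2) - 0.5
rays = F.normalize(torch.cat([xy, torch.ones(len(xy), 1)], dim=-1), dim=-1)

full = resize_ray_field(rays, pred_size, out_size=(8, 12))
print(full.shape)
```

With a real prediction, `rays` and `pred_size` would come from `output["rays"]` and `output["pred_size"]`, and `out_size` would be the height and width of the original image.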
### Minimal batched example

AnyCalib can also be run on a batch of images, optionally with a different camera model for each image. For example:

```python
images = ...  # (B, 3, H, W)
# NOTE: if cam_ids is a list, then len(cam_ids) must be equal to B
cam_ids = ["pinhole", "radial:1", "kb:4"]  # different camera models for each image
# or, alternatively:
cam_ids = "pinhole"  # same camera model across images
output = model.predict(images, cam_id=cam_ids)
# corresponding batched output dictionary:
# {
#     "intrinsics": List[(D_i,) tensors] for each camera model "i",
#     "fov_field": (B, N, 2) tensor,
#     "tangent_coords": alias for "fov_field",
#     "rays": (B, N, 3) tensor,
#     "pred_size": (H, W).
# }
```

### Currently implemented camera models

* `cam_id` represents the camera model identifier(s) that can be used in the `predict` method.
* `D` corresponds to the number of intrinsics of the camera model. It determines the length of each `intrinsics` tensor in the output dictionary.

| `cam_id` | Description | `D` | Intrinsics |
|:--|:--|:-:|:--|
| `pinhole` | Pinhole camera model | 4 | $f_x,~f_y,~c_x,~c_y$ |
| `simple_pinhole` | `pinhole` with one focal length | 3 | $f,~c_x,~c_y$ |
| `radial:k` | Radial (Brown-Conrady) [[1]](#1) camera model with `k` $\in$ [1, 4] distortion coefficients | 4+`k` | $f_x,~f_y,~c_x,~c_y$, $k_1[,~k_2[,~k_3[,~k_4]]]$ |
| `simple_radial:k` | `radial:k` with one focal length | 3+`k` | $f,~c_x,~c_y$, $k_1[,~k_2[,~k_3[,~k_4]]]$ |
| `kb:k` | Kannala-Brandt [[2]](#2) camera model with `k` $\in$ [1, 4] distortion coefficients | 4+`k` | $f_x,~f_y,~c_x,~c_y$, $k_1[,~k_2[,~k_3[,~k_4]]]$ |
| `simple_kb:k` | `kb:k` with one focal length | 3+`k` | $f,~c_x,~c_y$, $k_1[,~k_2[,~k_3[,~k_4]]]$ |
| `ucm` | Unified Camera Model [[3]](#3) | 5 | $f_x,~f_y,~c_x,~c_y$, $k$ |
| `simple_ucm` | `ucm` with one focal length | 4 | $f,~c_x,~c_y$, $k$ |
| `eucm` | Enhanced Unified Camera Model [[4]](#4) | 6 | $f_x,~f_y,~c_x,~c_y$, $k_1,~k_2$ |
| `simple_eucm` | `eucm` with one focal length | 5 | $f,~c_x,~c_y$, $k_1,~k_2$ |
| `division:k` | Division camera model [[5]](#5) with `k` $\in$ [1, 4] distortion coefficients | 4+`k` | $f_x,~f_y,~c_x,~c_y$, $k_1[,~k_2[,~k_3[,~k_4]]]$ |
| `simple_division:k` | `division:k` with one focal length | 3+`k` | $f,~c_x,~c_y$, $k_1[,~k_2[,~k_3[,~k_4]]]$ |

In addition to the original works, we recommend the works of Usenko et al. [[6]](#6) and Lochman et al. [[7]](#7) for a comprehensive comparison of the different camera models.

## Evaluation

The evaluation and training code is built upon the [`siclib`](siclib) library from [GeoCalib](https://github.com/cvg/GeoCalib), which can be installed as:

```shell
pip install -e siclib
```

Running the evaluation commands will write the results to `outputs/results/`.

### LaMAR

Running the evaluation commands will download the dataset to `data/lamar2k`, which will take around 400 MB of disk space.

AnyCalib trained on $\mathrm{OP_{p}}$:

```shell
python -m siclib.eval.lamar2k_rays --conf anycalib_pretrained --tag anycalib_p --overwrite
```

AnyCalib trained on $\mathrm{OP_{g}}$:

```shell
python -m siclib.eval.lamar2k_rays --conf anycalib_pretrained --tag anycalib_g --overwrite model.model_id=anycalib_gen
```

### MegaDepth (pinhole)

Running the evaluation commands will download the dataset to `data/megadepth2k`, which will take around 2 GB of disk space.

AnyCalib trained on $\mathrm{OP_{p}}$:

```shell
python -m siclib.eval.megadepth2k_rays --conf anycalib_pretrained --tag anycalib_p --overwrite
```

AnyCalib trained on $\mathrm{OP_{g}}$:

```shell
python -m siclib.eval.megadepth2k_rays --conf anycalib_pretrained --tag anycalib_g --overwrite model.model_id=anycalib_gen
```

### TartanAir

Running the evaluation commands will download the dataset to `data/tartanair`, which will take around 1.7 GB of disk space.

AnyCalib trained on $\mathrm{OP_{p}}$:

```shell
python -m siclib.eval.tartanair_rays --conf anycalib_pretrained --tag anycalib_p --overwrite
```

AnyCalib trained on $\mathrm{OP_{g}}$:

```shell
python -m siclib.eval.tartanair_rays --conf anycalib_pretrained --tag anycalib_g --overwrite model.model_id=anycalib_gen
```

### Stanford2D3D

Running the evaluation commands will download the dataset to `data/stanford2d3d`, which will take around 844 MB of disk space.

AnyCalib trained on $\mathrm{OP_{p}}$:

```shell
python -m siclib.eval.stanford2d3d_rays --conf anycalib_pretrained --tag anycalib_p --overwrite
```

AnyCalib trained on $\mathrm{OP_{g}}$:

```shell
python -m siclib.eval.stanford2d3d_rays --conf anycalib_pretrained --tag anycalib_g --overwrite model.model_id=anycalib_gen
```

### MegaDepth (radial)

Running the evaluation commands will download the dataset to `data/megadepth2k-radial`, which will take around 1.4 GB of disk space.

AnyCalib trained on $\mathrm{OP_{g}}$:

```shell
python -m siclib.eval.megadepth2k_radial_rays --conf anycalib_pretrained --tag anycalib_g --overwrite model.model_id=anycalib_gen
```

### Mono

Running the evaluation commands will download the dataset to `data/monovo2k`, which will take around 445 MB of disk space.

AnyCalib trained on $\mathrm{OP_{d}}$:

```shell
python -m siclib.eval.monovo2k_rays --conf anycalib_pretrained --tag anycalib_d --overwrite model.model_id=anycalib_dist data.cam_id=ucm
```

AnyCalib trained on $\mathrm{OP_{g}}$:

```shell
python -m siclib.eval.monovo2k_rays --conf anycalib_pretrained --tag anycalib_g --overwrite model.model_id=anycalib_gen data.cam_id=ucm
```

### ScanNet++

To comply with the ScanNet++ license, we cannot directly share its data. Please download the ScanNet++ dataset following the [official instructions](https://kaldir.vc.in.tum.de/scannetpp/#:~:text=the%20data%20now.-,Download%20the%20data,-To%20download%20the) and indicate the path to the root of the dataset in the following evaluation command. The path needs to be provided only the first time the evaluation is run. On that first run, the command will automatically copy the evaluation images to `data/scannetpp2k`, which will take around 760 MB of disk space.

AnyCalib trained on $\mathrm{OP_{d}}$:

```shell
python -m siclib.eval.scannetpp2k_rays --conf anycalib_pretrained --tag anycalib_d --overwrite model.model_id=anycalib_dist scannetpp_root=
```

AnyCalib trained on $\mathrm{OP_{g}}$:

```shell
python -m siclib.eval.scannetpp2k_rays --conf anycalib_pretrained --tag anycalib_g --overwrite model.model_id=anycalib_gen scannetpp_root=
```

### LaMAR (edited)

Running the evaluation commands will download the dataset to `data/lamar2k_edit`, which will take around 224 MB of disk space.

AnyCalib trained following the WildCam [[8]](#8) training protocol:

```shell
python -m siclib.eval.lamar2k_rays --conf anycalib_pretrained --tag anycalib_e --overwrite model.model_id=anycalib_edit eval.eval_on_edit=True
```

### TartanAir (edited)

Running the evaluation commands will download the dataset to `data/tartanair_edit`, which will take around 488 MB of disk space.

AnyCalib trained following the WildCam [[8]](#8) training protocol:

```shell
python -m siclib.eval.tartanair_rays --conf anycalib_pretrained --tag anycalib_e --overwrite model.model_id=anycalib_edit eval.eval_on_edit=True
```

### Stanford2D3D (edited)

Running the evaluation commands will download the dataset to `data/stanford2d3d_edit`, which will take around 420 MB of disk space.

AnyCalib trained on $\mathrm{OP_{p}}$, following the WildCam [[8]](#8) training protocol:

```shell
python -m siclib.eval.stanford2d3d_rays --conf anycalib_pretrained --tag anycalib_e --overwrite model.model_id=anycalib_edit eval.eval_on_edit=True
```

## Extended OpenPano Dataset

We extend the OpenPano dataset from [GeoCalib](https://github.com/cvg/GeoCalib?tab=readme-ov-file#openpano-dataset) with panoramas that need not be aligned with the gravity direction.
This extended version consists of tonemapped panoramas from [The Laval Photometric Indoor HDR Dataset](http://hdrdb.com/indoor-hdr-photometric/), [PolyHaven](https://polyhaven.com/hdris), [HDRMaps](https://hdrmaps.com/freebies/free-hdris/), [AmbientCG](https://ambientcg.com/list?type=hdri&sort=popular) and [BlenderKit](https://www.blenderkit.com/asset-gallery?query=category_subtree:hdr).

Before sampling images from the panoramas, first download the Laval dataset following the instructions on the [corresponding project page](http://hdrdb.com/indoor-hdr-photometric/#:~:text=HDR%20Dataset.-,Download,-To%20obtain%20the) and place the panoramas in `data/indoorDatasetCalibrated`. Then, tonemap the HDR images using the following command:

```shell
python -m siclib.datasets.utils.tonemapping --hdr_dir data/indoorDatasetCalibrated --out_dir data/laval-tonemap
```

To download the rest of the panoramas and organize all the panoramas in their corresponding splits `data/openpano_v2/panoramas/{split}`, execute:

```shell
python -m siclib.datasets.utils.download_openpano --name openpano_v2 --laval_dir data/laval-tonemap
```

Alternatively, the panoramas from PolyHaven, HDRMaps, AmbientCG and BlenderKit can be downloaded manually from [here](https://drive.google.com/drive/folders/1HSXKNrleJKas4cRLd1C8SqR9J1nU1-Z_?usp=sharing).

Afterwards, the different training datasets mentioned in the paper ($\mathrm{OP_{p}}$, $\mathrm{OP_{g}}$, $\mathrm{OP_{r}}$ and $\mathrm{OP_{d}}$) can be created by running the following commands. We recommend running them with the flag `device=cuda`, as this significantly speeds up the creation of the datasets, but if no GPU is available, the flag can be omitted.

$\mathrm{OP_{p}}$ (will be stored under `data/openpano_v2/openpano_v2`):

```shell
python -m siclib.datasets.create_dataset_from_pano --config-name openpano_v2 device=cuda
```

$\mathrm{OP_{g}}$ (will be stored under `data/openpano_v2/openpano_v2_gen`):

```shell
python -m siclib.datasets.create_dataset_from_pano_rays --config-name openpano_v2_gen device=cuda
```

$\mathrm{OP_{r}}$ (will be stored under `data/openpano_v2/openpano_v2_radial`):

```shell
python -m siclib.datasets.create_dataset_from_pano_rays --config-name openpano_v2_radial device=cuda
```

$\mathrm{OP_{d}}$ (will be stored under `data/openpano_v2/openpano_v2_dist`):

```shell
python -m siclib.datasets.create_dataset_from_pano_rays --config-name openpano_v2_dist device=cuda
```

## Training

As with the evaluation, the training code is built upon the [`siclib`](siclib) library from [GeoCalib](https://github.com/cvg/GeoCalib). Here we adapt their instructions to AnyCalib. `siclib` can be installed by executing:

```shell
pip install -e siclib
```

Once at least one of the [extended OpenPano datasets](#Extended-OpenPano-Dataset) (`openpano_v2`) has been downloaded and prepared, AnyCalib can be trained with it.

For training with $\mathrm{OP_{p}}$ (default):

```shell
python -m siclib.train anycalib_op_p --conf anycalib --distributed
```

Feel free to use any other experiment name. By default, the checkpoints will be written to `outputs/training/`. The default batch size is 24, which requires at least one NVIDIA Tesla V100 GPU with 32 GB of VRAM. If only one GPU is used, the flag `--distributed` can be omitted.

Configurations are managed by [Hydra](https://hydra.cc/) and can be overwritten from the command line.
For example, for training with $\mathrm{OP_{g}}$:

```shell
python -m siclib.train anycalib_op_g --conf anycalib --distributed data.dataset_dir='data/openpano_v2/openpano_v2_gen'
```

For training with $\mathrm{OP_{d}}$:

```shell
python -m siclib.train anycalib_op_d --conf anycalib --distributed data.dataset_dir='data/openpano_v2/openpano_v2_dist'
```

For training with $\mathrm{OP_{r}}$:

```shell
python -m siclib.train anycalib_op_r --conf anycalib --distributed data.dataset_dir='data/openpano_v2/openpano_v2_radial'
```

For training with $\mathrm{OP_{p}}$ on edited (stretched and cropped) images, following the training protocol of WildCam [[8]](#8):

```shell
python -m siclib.train anycalib_op_e --conf anycalib --distributed \
    data.dataset_dir='data/openpano_v2/openpano_v2' \
    data.im_geom_transform.change_pixel_ar=true \
    data.im_geom_transform.crop=0.5
```

After training, the model can be evaluated using its experiment name:

```shell
python -m siclib.eval. --checkpoint --tag --conf anycalib
```

## Acknowledgements

Thanks to the authors of [GeoCalib](https://github.com/cvg/GeoCalib) for open-sourcing the comprehensive and easy-to-use [`siclib`](https://github.com/cvg/GeoCalib/tree/main/siclib), which we use as the base of our evaluation and training code.
Thanks to the authors of [The Laval Photometric Indoor HDR Dataset](http://hdrdb.com/indoor-hdr-photometric/) for allowing us to release the weights of AnyCalib under a permissive license.
Thanks also to the authors of [The Laval Photometric Indoor HDR Dataset](http://hdrdb.com/indoor-hdr-photometric/), [PolyHaven](https://polyhaven.com/hdris), [HDRMaps](https://hdrmaps.com/freebies/free-hdris/), [AmbientCG](https://ambientcg.com/list?type=hdri&sort=popular) and [BlenderKit](https://www.blenderkit.com/asset-gallery?query=category_subtree:hdr) for providing high-quality, freely available panoramas that made the training of AnyCalib possible.

## BibTeX citation

If you use any ideas from the paper or code from this repo, please consider citing:

```bibtex
@InProceedings{tirado2025anycalib,
  author={Javier Tirado-Gar{\'\i}n and Javier Civera},
  title={{AnyCalib: On-Manifold Learning for Model-Agnostic Single-View Camera Calibration}},
  booktitle={ICCV},
  year={2025}
}
```

## License

Code and weights are provided under the [Apache 2.0 license](LICENSE).

## References

[1] Close-Range Camera Calibration. D.C. Brown, 1971.

[2] A Generic Camera Model and Calibration Method for Conventional, Wide-Angle, and Fish-Eye Lenses. J. Kannala, S.S. Brandt, TPAMI, 2006.

[3] Single View Point Omnidirectional Camera Calibration from Planar Grids. C. Mei, P. Rives, ICRA, 2007.

[4] An Enhanced Unified Camera Model. B. Khomutenko, et al., IEEE RA-L, 2016.

[5] Simultaneous Linear Estimation of Multiple View Geometry and Lens Distortion. A.W. Fitzgibbon, CVPR, 2001.

[6] The Double Sphere Camera Model. V. Usenko, et al., 3DV, 2018.

[7] BabelCalib: A Universal Approach to Calibrating Central Cameras. Y. Lochman, et al., ICCV, 2021.

[8] Tame a Wild Camera: In-the-Wild Monocular Camera Calibration. S. Zhu, et al., NeurIPS, 2023.