MatchAttention: Matching the Relative Positions for High-Resolution Cross-View Matching

πŸ“„ Paper - 🌐 Project Page - πŸ€— Demo - πŸ’» Code

MatchAttention is a continuous and differentiable sliding-window attention mechanism that enables long-range connections, explicit matching, and linear complexity. Applied to stereo matching and optical flow, it achieves real-time, state-of-the-art performance.

Introduction

This paper proposes an attention mechanism, MatchAttention, that dynamically matches relative positions. The relative position determines the attention sampling center of the key-value pairs given a query. Continuous and differentiable sliding-window attention sampling is achieved by the proposed BilinearSoftmax. The relative positions are iteratively updated through residual connections across layers by embedding them into the feature channels. Since the relative position is exactly the learning target for cross-view matching, an efficient hierarchical cross-view decoder, MatchDecoder, is designed with MatchAttention as its core component. To handle cross-view occlusions, gated cross-MatchAttention and a consistency-constrained loss are proposed. These two components collectively mitigate the impact of occlusions in both forward and backward passes, allowing the model to focus more on learning matching relationships. When applied to stereo matching, MatchStereo-B ranks first in average error on the public Middlebury benchmark and requires only 29ms for KITTI-resolution inference. MatchStereo-T can process 4K UHD images in 0.1 seconds using only 3GB of GPU memory. The proposed models also achieve state-of-the-art performance on the KITTI 2012, KITTI 2015, ETH3D, and Spring flow datasets. The combination of high accuracy and low computational complexity makes real-time, high-resolution, and high-accuracy cross-view matching possible.
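
To make the mechanism above concrete, here is a minimal, hedged PyTorch sketch of sliding-window cross-attention around a learned per-query sampling center. The shapes, the window size, and the use of `grid_sample` with a window softmax as a stand-in for the paper's BilinearSoftmax are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' implementation): each query attends to a
# small window of keys/values bilinearly sampled around its own position
# plus a learned relative offset, so cost is linear in the number of pixels.
import torch
import torch.nn.functional as F

def match_attention(q, k, v, rel_pos, win=3):
    """q, k, v: (B, C, H, W); rel_pos: (B, 2, H, W) pixel offsets (x, y)."""
    B, C, H, W = q.shape
    dev, dt = q.device, q.dtype
    ys, xs = torch.meshgrid(torch.arange(H, device=dev, dtype=dt),
                            torch.arange(W, device=dev, dtype=dt),
                            indexing="ij")
    base = torch.stack((xs, ys), dim=0)           # (2, H, W), (x, y) order
    centers = base.unsqueeze(0) + rel_pos         # per-query sampling centers
    r = win // 2
    d = torch.arange(-r, r + 1, device=dev, dtype=dt)
    dy, dx = torch.meshgrid(d, d, indexing="ij")
    offsets = torch.stack((dx, dy), dim=-1).reshape(-1, 2)   # (win*win, 2)
    scores, values = [], []
    for o in offsets:                             # win*win fixed iterations
        pts = centers + o.view(1, 2, 1, 1)
        # Normalize to [-1, 1] for grid_sample; bilinear and differentiable.
        grid = torch.stack((2 * pts[:, 0] / (W - 1) - 1,
                            2 * pts[:, 1] / (H - 1) - 1), dim=-1)
        ks = F.grid_sample(k, grid, align_corners=True)
        values.append(F.grid_sample(v, grid, align_corners=True))
        scores.append((q * ks).sum(1, keepdim=True) / C ** 0.5)
    attn = torch.softmax(torch.cat(scores, dim=1), dim=1)    # over the window
    out = torch.zeros_like(q)
    for i, vs in enumerate(values):
        out = out + attn[:, i:i + 1] * vs
    return out
```

Because the window size is fixed, cost grows linearly with the number of pixels rather than quadratically, and since `rel_pos` enters only through differentiable sampling, it can be refined layer by layer via residual connections as described above.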

Key Features

  • Efficient and Scalable: Linear complexity enables high-resolution processing with low GPU memory usage; MatchStereo-T can process 4K UHD images in 0.1 seconds using only 3GB of GPU memory.
  • State-of-the-Art Performance: Ranks first in average error on the Middlebury stereo benchmark and achieves state-of-the-art results on the KITTI 2012, KITTI 2015, ETH3D, and Spring flow datasets.
  • Explainable Occlusion Handling: Gated cross-MatchAttention and a consistency-constrained loss effectively mitigate the impact of occlusions, allowing the model to focus on learning robust matching relationships (see the consistency-check sketch after this list).
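
As a hedged illustration of the idea behind the last feature, the sketch below implements a standard left-right consistency check of the kind a consistency-constrained loss can build on; the paper's exact formulation may differ, and the threshold is an illustrative choice.

```python
# Left-right consistency check (illustrative; not the paper's exact loss).
import torch
import torch.nn.functional as F

def lr_consistency_mask(disp_left, disp_right, thresh=1.0):
    """disp_*: (B, 1, H, W) disparity maps predicted for each view.
    Returns a (B, 1, H, W) mask that is 1 where the two views agree within
    `thresh` pixels (likely non-occluded) and 0 elsewhere."""
    B, _, H, W = disp_left.shape
    dev, dt = disp_left.device, disp_left.dtype
    gy, gx = torch.meshgrid(torch.arange(H, device=dev, dtype=dt),
                            torch.arange(W, device=dev, dtype=dt),
                            indexing="ij")
    # A left pixel at column x corresponds to column x - d(x) in the right view.
    x_r = gx.unsqueeze(0) - disp_left[:, 0]                      # (B, H, W)
    gy_b = gy.unsqueeze(0).expand(B, -1, -1)
    grid = torch.stack((2 * x_r / (W - 1) - 1,
                        2 * gy_b / (H - 1) - 1), dim=-1)         # (B, H, W, 2)
    # Bilinearly warp the right-view disparity back to the left view.
    disp_r_warped = F.grid_sample(disp_right, grid, align_corners=True)
    return ((disp_left - disp_r_warped).abs() < thresh).to(dt)
```

The resulting mask can down-weight the per-pixel matching loss in likely-occluded regions, e.g. `loss = (mask * (pred - gt).abs()).sum() / mask.sum().clamp(min=1.0)`.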

Model Weights

The following pre-trained model weights are available:

| Model | Params | Resolution | FLOPs | GPU Mem | Latency | Checkpoint |
|---|---|---|---|---|---|---|
| MatchStereo-T | 8.78M | 1536x1536 | 0.34T | 1.45G | 38ms | Hugging Face |
| MatchStereo-S | 25.2M | 1536x1536 | 0.98T | 1.73G | 45ms | Hugging Face |
| MatchStereo-B | 75.5M | 1536x1536 | 3.59T | 2.94G | 75ms | Hugging Face |
| MatchFlow-B | 75.5M | 1536x1536 | 3.60T | 3.22G | 77ms | Hugging Face |
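
One way to fetch a checkpoint programmatically is via `huggingface_hub`, as sketched below; the repo id and filename are assumptions inferred from this page and the CLI example in the next section, so check the repository listing for the exact names.

```python
import torch
from huggingface_hub import hf_hub_download

# Assumed repo id and filename (MatchStereo-T); verify against the repo.
ckpt_path = hf_hub_download(repo_id="Tingman/MatchAttention",
                            filename="matchstereo_tiny_fsd.pth")
# Newer PyTorch versions default to weights_only=True; pass
# weights_only=False if the checkpoint stores more than raw tensors.
state = torch.load(ckpt_path, map_location="cpu")
print(sorted(state)[:5] if isinstance(state, dict) else type(state))
```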

Sample Usage

To get started with inference using the provided scripts, follow the instructions on the GitHub repository. Here's an example for running stereo matching on custom images from the command line:

```bash
# Clone the repository and install dependencies first as per the GitHub README,
# then navigate to the MatchAttention directory.

# Run stereo matching on custom images
python run_img.py \
    --img0_dir images/left/ \
    --img1_dir images/right/ \
    --output_path outputs \
    --checkpoint_path checkpoints/matchstereo_tiny_fsd.pth \
    --no_compile
```
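
Assuming `run_img.py` pairs images from `--img0_dir` and `--img1_dir` by identical filenames (an assumption about the script, not documented here), a quick layout check can save a failed run:

```python
from pathlib import Path

# Hypothetical sanity check: verify left/right directories contain
# correspondingly named images before invoking the CLI above.
exts = {".png", ".jpg", ".jpeg"}
left = {p.name for p in Path("images/left").iterdir() if p.suffix.lower() in exts}
right = {p.name for p in Path("images/right").iterdir() if p.suffix.lower() in exts}
unpaired = left ^ right
print(f"{len(left & right)} stereo pairs ready" if not unpaired
      else f"Unpaired images: {sorted(unpaired)}")
```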

For other tasks like optical flow, testing on benchmarks, or running the local Gradio demo, refer to the GitHub repository.

Citation

If you find our work helpful, please cite our paper:

@article{yan2025matchattention,
  title={MatchAttention: Matching the Relative Positions for High-Resolution Cross-View Matching},
  author={Tingman Yan and Tao Liu and Xilian Yang and Qunfei Zhao and Zeyang Xia},
  journal={arXiv preprint arXiv:2510.14260},
  year={2025}
}

Acknowledgement

We would like to thank the authors of UniMatch, RAFT-Stereo, MetaFormer, and TransNeXt for their code release. Thanks to the author of FoundationStereo for the release of the FSD dataset.

Contact

Please reach out to Tingman Yan for questions.
