MambaMia: A State-Space-Model-Based Compression for Efficient Video Understanding in Large Multimodal Models
Paper: arXiv 2506.13564
This is the MambaMia Mini model based on lmsys/vicuna-7b-v1.5, designed for efficient long-form video understanding.
MambaMia is a hierarchical compression method, built on State Space Models, for efficient video understanding in Large Multimodal Models (LMMs). It targets two problems in long-video processing: high computational cost and redundant information.
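To make the cost concern concrete, here is a back-of-the-envelope sketch of how the visual token count grows with video length and how a compression step shrinks the sequence handed to the language model. It is not the MambaMia implementation. The function names, the frame count, the tokens-per-frame figure, and the compression ratio are all hypothetical values chosen for illustration and are not taken from the paper.

```python
def visual_token_count(num_frames: int, tokens_per_frame: int) -> int:
    """Total visual tokens if every frame's tokens are passed through unchanged."""
    return num_frames * tokens_per_frame


def compressed_token_count(num_frames: int, tokens_per_frame: int,
                           compression_ratio: int) -> int:
    """Tokens remaining after a (hypothetical) uniform compression step.

    Uses ceiling division so a partial group still yields one token.
    """
    total = visual_token_count(num_frames, tokens_per_frame)
    return -(-total // compression_ratio)


if __name__ == "__main__":
    # All three values below are assumptions for illustration only.
    frames = 256      # sampled frames from a long video
    per_frame = 576   # visual tokens produced per frame
    ratio = 24        # tokens merged into one by the compressor

    raw = visual_token_count(frames, per_frame)
    kept = compressed_token_count(frames, per_frame, ratio)
    print(f"uncompressed: {raw} tokens, compressed: {kept} tokens")
```

Under these assumed numbers the uncompressed sequence would exceed the context window of many 7B-scale language models, which is the kind of pressure a compression stage is meant to relieve.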
Key Features:
Please refer to the MambaMia repository for detailed usage instructions.
If you use this model, please cite:
@misc{kim2025mambamia,
  title={MambaMia: A State-Space-Model-Based Compression for Efficient Video Understanding in Large Multimodal Models},
  author={Geewook Kim and Minjoon Seo},
  year={2025},
  eprint={2506.13564},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.13564}
}
This model is released under the Llama 2 Community License Agreement, following the base model's license.
Base model: lmsys/vicuna-7b-v1.5