MambaMia-Vicuna-7B-Mini

This is the MambaMia Mini model based on lmsys/vicuna-7b-v1.5, designed for efficient long-form video understanding.

Model Description

MambaMia is a State-Space-Model-based hierarchical compression method for efficient video understanding in Large Multimodal Models (LMMs). It addresses the computational cost and information redundancy challenges in processing long videos.

Key Features:

  • Hierarchical compression using State-Space Models (SSM)
  • Gated attention mechanism
  • Learnable sampling strategy
  • Significantly lower memory usage and faster inference than comparable LMMs on long videos
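The gated SSM compression described above can be sketched as a toy recurrence. This is a minimal illustration, not the authors' implementation: `gated_ssm_compress`, the fixed per-frame `decay` gate, and the fixed-stride sampling are all simplifying assumptions (in the paper, both the gates and the sampling positions are learned).

```python
import numpy as np

def gated_ssm_compress(frames, decay, stride=4):
    """Toy sketch of SSM-style hierarchical compression:
    a gated scan over per-frame features, followed by strided
    sampling of hidden states as the compressed video tokens.
    `decay` stands in for a learned, input-dependent gate."""
    T, d = frames.shape
    h = np.zeros(d)
    hidden = np.empty((T, d))
    for t in range(T):
        a = decay[t]                       # gate value in (0, 1)
        h = a * h + (1.0 - a) * frames[t]  # gated recurrence
        hidden[t] = h
    # Keep every `stride`-th state: T frames -> T // stride tokens.
    return hidden[stride - 1::stride]

rng = np.random.default_rng(0)
frames = rng.normal(size=(16, 8))  # 16 frames, 8-dim features each
decay = np.full(16, 0.9)           # heavy carry-over between frames
tokens = gated_ssm_compress(frames, decay, stride=4)
print(tokens.shape)  # (4, 8): 16 frames compressed to 4 tokens
```

Each retained state summarizes the stride of frames preceding it, which is how the scan trades frame-level tokens for a much shorter compressed sequence.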

Usage

Please refer to the MambaMia repository for detailed usage instructions.
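As a minimal, hypothetical starting point, the checkpoint files can be fetched from the Hub with `huggingface_hub` (assumption: the repo id below matches this model page; actual inference requires the code in the MambaMia repository, and the checkpoint is not guaranteed to load through a plain `transformers` AutoModel).

```python
from huggingface_hub import snapshot_download

# Hypothetical sketch: download the checkpoint files locally.
# Inference itself should follow the MambaMia repository instructions.
repo_id = "gwkrsrch/MambaMia-Vicuna-7B-Mini"
# local_dir = snapshot_download(repo_id)  # uncomment to fetch the weights
print(repo_id)
```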

Citation

If you use this model, please cite:

@misc{kim2025mambamia,
  title={MambaMia: A State-Space-Model-Based Compression for Efficient Video Understanding in Large Multimodal Models},
  author={Geewook Kim and Minjoon Seo},
  year={2025},
  eprint={2506.13564},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.13564}
}

License

This model is released under the Llama 2 Community License Agreement, following the base model's license.
