Instructions to use openbmb/MiniCPM-V-4 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use openbmb/MiniCPM-V-4 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="openbmb/MiniCPM-V-4", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```
```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("openbmb/MiniCPM-V-4", trust_remote_code=True, dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use openbmb/MiniCPM-V-4 with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "openbmb/MiniCPM-V-4"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "openbmb/MiniCPM-V-4",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
Use Docker
docker model run hf.co/openbmb/MiniCPM-V-4
- SGLang
How to use openbmb/MiniCPM-V-4 with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "openbmb/MiniCPM-V-4" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "openbmb/MiniCPM-V-4",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "openbmb/MiniCPM-V-4" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "openbmb/MiniCPM-V-4",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
- Docker Model Runner
How to use openbmb/MiniCPM-V-4 with Docker Model Runner:
docker model run hf.co/openbmb/MiniCPM-V-4
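The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library, not an official client: the helper names `build_payload` and `describe_image` are hypothetical, it assumes a server started with one of the commands above is reachable (port 8000 for vLLM, port 30000 for SGLang, as in the examples), and it assumes the reply follows the usual OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request

MODEL_ID = "openbmb/MiniCPM-V-4"


def build_payload(image_url: str, prompt: str, model: str = MODEL_ID) -> dict:
    """Build the same chat-completions body that the curl examples send."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def describe_image(base_url: str, image_url: str, prompt: str) -> str:
    """POST the payload to an OpenAI-compatible server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(image_url, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: the server returns the standard OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server, so it is left commented out):
# print(describe_image(
#     "http://localhost:8000",
#     "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
#     "Describe this image in one sentence.",
# ))
```

Keeping payload construction separate from the network call makes it possible to inspect or log the request body without a server running.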
| pipeline_tag: image-text-to-text | |
| datasets: | |
| - openbmb/RLAIF-V-Dataset | |
| library_name: transformers | |
| language: | |
| - multilingual | |
| tags: | |
| - minicpm-v | |
| - vision | |
| - ocr | |
| - multi-image | |
| - video | |
| - custom_code | |
| license: apache-2.0 | |
| <h1>A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone</h1> | |
| [GitHub](https://github.com/OpenBMB/MiniCPM-o) | [Demo](http://211.93.21.133:8889/) | |
| ## MiniCPM-V 4.0 | |
| **MiniCPM-V 4.0** is the latest efficient model in the MiniCPM-V series. The model is built on SigLIP2-400M and MiniCPM4-3B, with a total of 4.1B parameters. It inherits the strong single-image, multi-image and video understanding performance of MiniCPM-V 2.6 with substantially improved efficiency. Notable features of MiniCPM-V 4.0 include: | |
| - 🔥 **Leading Visual Capability.** | |
| With only 4.1B parameters, MiniCPM-V 4.0 achieves an average score of 69.0 on OpenCompass, a comprehensive evaluation of 8 popular benchmarks, **outperforming GPT-4.1-mini-20250414, MiniCPM-V 2.6 (8.1B params, OpenCompass 65.2) and Qwen2.5-VL-3B-Instruct (3.8B params, OpenCompass 64.5)**. It also shows good performance in multi-image understanding and video understanding. | |
| - 🚀 **Superior Efficiency.** | |
| Designed for on-device deployment, MiniCPM-V 4.0 runs smoothly on end devices. For example, it delivers **a first-token delay of less than 2s and a decoding speed of more than 17 tokens/s on iPhone 16 Pro Max**, without heating problems. It also shows superior throughput under concurrent requests. | |
| - 💫 **Easy Usage.** | |
| MiniCPM-V 4.0 can be used in various ways, including **llama.cpp, Ollama, vLLM, SGLang, LLaMA-Factory and a local web demo**. We also open-source an iOS app that runs on iPhone and iPad. Get started with our well-structured [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook), which features detailed instructions and practical examples. | |
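To put the quoted on-device figures in perspective, here is a back-of-the-envelope sketch. It assumes a simplified model in which response time is the first-token delay plus the remaining tokens divided by the decoding rate; this is an illustration rather than a measurement, and the function name is hypothetical. Plugging in the bounds from the text (2 s and 17 tokens/s) gives an upper estimate.

```python
def estimate_response_seconds(num_tokens: int,
                              first_token_delay_s: float = 2.0,
                              decode_tokens_per_s: float = 17.0) -> float:
    """Rough time to produce `num_tokens` tokens.

    Assumption: time = first-token delay + (remaining tokens / decoding rate).
    """
    if num_tokens <= 0:
        return 0.0
    return first_token_delay_s + (num_tokens - 1) / decode_tokens_per_s


# Upper estimates for a few reply lengths, using the bounds quoted above.
for n in (50, 100, 200):
    print(f"{n} tokens: under about {estimate_response_seconds(n):.1f} s")
```

Under these assumptions a 100-token reply would take roughly 8 seconds or less on the quoted device.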
| ### Evaluation | |
| <details> | |
| <summary>Click to view single image results on OpenCompass. </summary> | |
| <div align="center"> | |
| <table style="margin: 0px auto;"> | |
| <thead> | |
| <tr> | |
| <th nowrap="nowrap" align="left">model</th> | |
| <th>Size</th> | |
| <th>OpenCompass</th> | |
| <th>OCRBench</th> | |
| <th>MathVista</th> | |
| <th>HallusionBench</th> | |
| <th>MMMU</th> | |
| <th>MMVet</th> | |
| <th>MMBench V1.1</th> | |
| <th>MMStar</th> | |
| <th>AI2D</th> | |
| </tr> | |
| </thead> | |
| <tbody align="center"> | |
| <tr> | |
| <td colspan="11" align="left"><strong>Proprietary</strong></td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">GPT-4v-20240409</td> | |
| <td>-</td> | |
| <td>63.5</td> | |
| <td>656</td> | |
| <td>55.2</td> | |
| <td>43.9</td> | |
| <td>61.7</td> | |
| <td>67.5</td> | |
| <td>79.8</td> | |
| <td>56.0</td> | |
| <td>78.6</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">Gemini-1.5-Pro</td> | |
| <td>-</td> | |
| <td>64.5</td> | |
| <td>754</td> | |
| <td>58.3</td> | |
| <td>45.6</td> | |
| <td>60.6</td> | |
| <td>64.0</td> | |
| <td>73.9</td> | |
| <td>59.1</td> | |
| <td>79.1</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">GPT-4.1-mini-20250414</td> | |
| <td>-</td> | |
| <td>68.9</td> | |
| <td>840</td> | |
| <td>70.9</td> | |
| <td>49.3</td> | |
| <td>55.0</td> | |
| <td>74.3</td> | |
| <td>80.9</td> | |
| <td>60.9</td> | |
| <td>76.0</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">Claude 3.5 Sonnet-20241022</td> | |
| <td>-</td> | |
| <td>70.6</td> | |
| <td>798</td> | |
| <td>65.3</td> | |
| <td>55.5</td> | |
| <td>66.4</td> | |
| <td>70.1</td> | |
| <td>81.7</td> | |
| <td>65.1</td> | |
| <td>81.2</td> | |
| </tr> | |
| <tr> | |
| <td colspan="11" align="left"><strong>Open-source</strong></td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">Qwen2.5-VL-3B-Instruct</td> | |
| <td>3.8B</td> | |
| <td>64.5</td> | |
| <td>828</td> | |
| <td>61.2</td> | |
| <td>46.6</td> | |
| <td>51.2</td> | |
| <td>60.0</td> | |
| <td>76.8</td> | |
| <td>56.3</td> | |
| <td>81.4</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">InternVL2.5-4B</td> | |
| <td>3.7B</td> | |
| <td>65.1</td> | |
| <td>820</td> | |
| <td>60.8</td> | |
| <td>46.6</td> | |
| <td>51.8</td> | |
| <td>61.5</td> | |
| <td>78.2</td> | |
| <td>58.7</td> | |
| <td>81.4</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">Qwen2.5-VL-7B-Instruct</td> | |
| <td>8.3B</td> | |
| <td>70.9</td> | |
| <td>888</td> | |
| <td>68.1</td> | |
| <td>51.9</td> | |
| <td>58.0</td> | |
| <td>69.7</td> | |
| <td>82.2</td> | |
| <td>64.1</td> | |
| <td>84.3</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">InternVL2.5-8B</td> | |
| <td>8.1B</td> | |
| <td>68.1</td> | |
| <td>821</td> | |
| <td>64.5</td> | |
| <td>49.0</td> | |
| <td>56.2</td> | |
| <td>62.8</td> | |
| <td>82.5</td> | |
| <td>63.2</td> | |
| <td>84.6</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">MiniCPM-V-2.6</td> | |
| <td>8.1B</td> | |
| <td>65.2</td> | |
| <td>852</td> | |
| <td>60.8</td> | |
| <td>48.1</td> | |
| <td>49.8</td> | |
| <td>60.0</td> | |
| <td>78.0</td> | |
| <td>57.5</td> | |
| <td>82.1</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">MiniCPM-o-2.6</td> | |
| <td>8.7B</td> | |
| <td>70.2</td> | |
| <td>889</td> | |
| <td>73.3</td> | |
| <td>51.1</td> | |
| <td>50.9</td> | |
| <td>67.2</td> | |
| <td>80.6</td> | |
| <td>63.3</td> | |
| <td>86.1</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">MiniCPM-V-4.0</td> | |
| <td>4.1B</td> | |
| <td>69.0</td> | |
| <td>894</td> | |
| <td>66.9</td> | |
| <td>50.8</td> | |
| <td>51.2</td> | |
| <td>68.0</td> | |
| <td>79.7</td> | |
| <td>62.8</td> | |
| <td>82.9</td> | |
| </tr> | |
| </tbody> | |
| </table> | |
| </div> | |
| </details> | |
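The OpenCompass column in the table above appears to be the mean of the eight benchmark columns once OCRBench is rescaled from its 0 to 1000 range to 0 to 100. That aggregation rule is inferred from the numbers rather than stated in the text; the sketch below checks it against three rows copied from the table. The function name is hypothetical.

```python
from statistics import mean

# Rows copied from the table, in column order: OCRBench, MathVista,
# HallusionBench, MMMU, MMVet, MMBench V1.1, MMStar, AI2D.
ROWS = {
    "MiniCPM-V-4.0": (894, 66.9, 50.8, 51.2, 68.0, 79.7, 62.8, 82.9),
    "GPT-4.1-mini-20250414": (840, 70.9, 49.3, 55.0, 74.3, 80.9, 60.9, 76.0),
    "Qwen2.5-VL-3B-Instruct": (828, 61.2, 46.6, 51.2, 60.0, 76.8, 56.3, 81.4),
}


def opencompass_average(scores):
    """Average the eight scores after rescaling OCRBench to a 0-100 range."""
    ocr_bench, *others = scores
    return round(mean([ocr_bench / 10, *others]), 1)


for name, scores in ROWS.items():
    print(name, opencompass_average(scores))
```

For these three rows the computed averages match the OpenCompass column (69.0, 68.9 and 64.5).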
| <details> | |
| <summary>Click to view single image results on ChartQA, MME, RealWorldQA, TextVQA, DocVQA, MathVision, DynaMath, WeMath, Object HalBench and MM Halbench. </summary> | |
| <div align="center"> | |
| <table style="margin: 0px auto;"> | |
| <thead> | |
| <tr> | |
| <th nowrap="nowrap" align="left">model</th> | |
| <th>Size</th> | |
| <th>ChartQA</th> | |
| <th>MME</th> | |
| <th>RealWorldQA</th> | |
| <th>TextVQA</th> | |
| <th>DocVQA</th> | |
| <th>MathVision</th> | |
| <th>DynaMath</th> | |
| <th>WeMath</th> | |
| <th colspan="2">Obj Hal</th> | |
| <th colspan="2">MM Hal</th> | |
| </tr> | |
| </thead> | |
| <tbody> | |
| <tr> | |
| <td></td> | |
| <td></td> | |
| <td></td> | |
| <td></td> | |
| <td></td> | |
| <td></td> | |
| <td></td> | |
| <td></td> | |
| <td></td> | |
| <td></td> | |
| <td>CHAIRs↓</td> | |
| <td>CHAIRi↓</td> | |
| <td nowrap="nowrap">score avg@3↑</td> | |
| <td nowrap="nowrap">hall rate avg@3↓</td> | |
| </tr> | |
| </tbody> | |
| <tbody align="center"> | |
| <tr> | |
| <td colspan="14" align="left"><strong>Proprietary</strong></td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">GPT-4v-20240409</td> | |
| <td>-</td> | |
| <td>78.5</td> | |
| <td>1927</td> | |
| <td>61.4</td> | |
| <td>78.0</td> | |
| <td>88.4</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">Gemini-1.5-Pro</td> | |
| <td>-</td> | |
| <td>87.2</td> | |
| <td>-</td> | |
| <td>67.5</td> | |
| <td>78.8</td> | |
| <td>93.1</td> | |
| <td>41.0</td> | |
| <td>31.5</td> | |
| <td>50.5</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">GPT-4.1-mini-20250414</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>45.3</td> | |
| <td>47.7</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">Claude 3.5 Sonnet-20241022</td> | |
| <td>-</td> | |
| <td>90.8</td> | |
| <td>-</td> | |
| <td>60.1</td> | |
| <td>74.1</td> | |
| <td>95.2</td> | |
| <td>35.6</td> | |
| <td>35.7</td> | |
| <td>44.0</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| </tr> | |
| <tr> | |
| <td colspan="14" align="left"><strong>Open-source</strong></td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">Qwen2.5-VL-3B-Instruct</td> | |
| <td>3.8B</td> | |
| <td>84.0</td> | |
| <td>2157</td> | |
| <td>65.4</td> | |
| <td>79.3</td> | |
| <td>93.9</td> | |
| <td>21.9</td> | |
| <td>13.2</td> | |
| <td>22.9</td> | |
| <td>18.3</td> | |
| <td>10.8</td> | |
| <td>3.9 </td> | |
| <td>33.3 </td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">InternVL2.5-4B</td> | |
| <td>3.7B</td> | |
| <td>84.0</td> | |
| <td>2338</td> | |
| <td>64.3</td> | |
| <td>76.8</td> | |
| <td>91.6</td> | |
| <td>18.4</td> | |
| <td>15.2</td> | |
| <td>21.2</td> | |
| <td>13.7</td> | |
| <td>8.7</td> | |
| <td>3.2 </td> | |
| <td>46.5 </td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">Qwen2.5-VL-7B-Instruct</td> | |
| <td>8.3B</td> | |
| <td>87.3</td> | |
| <td>2347</td> | |
| <td>68.5</td> | |
| <td>84.9</td> | |
| <td>95.7</td> | |
| <td>25.4</td> | |
| <td>21.8</td> | |
| <td>36.2</td> | |
| <td>13.3</td> | |
| <td>7.9</td> | |
| <td>4.1 </td> | |
| <td>31.6 </td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">InternVL2.5-8B</td> | |
| <td>8.1B</td> | |
| <td>84.8</td> | |
| <td>2344</td> | |
| <td>70.1</td> | |
| <td>79.1</td> | |
| <td>93.0</td> | |
| <td>17.0</td> | |
| <td>9.4</td> | |
| <td>23.5</td> | |
| <td>18.3</td> | |
| <td>11.6</td> | |
| <td>3.6 </td> | |
| <td>37.2</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">MiniCPM-V-2.6</td> | |
| <td>8.1B</td> | |
| <td>79.4</td> | |
| <td>2348</td> | |
| <td>65.0</td> | |
| <td>80.1</td> | |
| <td>90.8</td> | |
| <td>17.5</td> | |
| <td>9.0</td> | |
| <td>20.4</td> | |
| <td>7.3</td> | |
| <td>4.7</td> | |
| <td>4.0 </td> | |
| <td>29.9 </td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">MiniCPM-o-2.6</td> | |
| <td>8.7B</td> | |
| <td>86.9</td> | |
| <td>2372</td> | |
| <td>68.1</td> | |
| <td>82.0</td> | |
| <td>93.5</td> | |
| <td>21.7</td> | |
| <td>10.4</td> | |
| <td>25.2</td> | |
| <td>6.3</td> | |
| <td>3.4</td> | |
| <td>4.1 </td> | |
| <td>31.3 </td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">MiniCPM-V-4.0</td> | |
| <td>4.1B</td> | |
| <td>84.4</td> | |
| <td>2298</td> | |
| <td>68.5</td> | |
| <td>80.8</td> | |
| <td>92.9</td> | |
| <td>20.7</td> | |
| <td>14.2</td> | |
| <td>32.7</td> | |
| <td>6.3</td> | |
| <td>3.5</td> | |
| <td>4.1 </td> | |
| <td>29.2 </td> | |
| </tr> | |
| </tbody> | |
| </table> | |
| </div> | |
| </details> | |
| <details> | |
| <summary>Click to view multi-image and video understanding results on Mantis, Blink and Video-MME. </summary> | |
| <div align="center"> | |
| <table style="margin: 0px auto;"> | |
| <thead> | |
| <tr> | |
| <th nowrap="nowrap" align="left">model</th> | |
| <th>Size</th> | |
| <th>Mantis</th> | |
| <th>Blink</th> | |
| <th nowrap="nowrap" colspan="2" >Video-MME</th> | |
| </tr> | |
| </thead> | |
| <tbody> | |
| <tr> | |
| <td></td> | |
| <td></td> | |
| <td></td> | |
| <td></td> | |
| <td>w/o subs</td> | |
| <td>w/ subs</td> | |
| </tr> | |
| </tbody> | |
| <tbody align="center"> | |
| <tr> | |
| <td colspan="6" align="left"><strong>Proprietary</strong></td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">GPT-4v-20240409</td> | |
| <td>-</td> | |
| <td>62.7</td> | |
| <td>54.6</td> | |
| <td>59.9</td> | |
| <td>63.3</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">Gemini-1.5-Pro</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>59.1</td> | |
| <td>75.0</td> | |
| <td>81.3</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">GPT-4o-20240513</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>68.0</td> | |
| <td>71.9</td> | |
| <td>77.2</td> | |
| </tr> | |
| <tr> | |
| <td colspan="6" align="left"><strong>Open-source</strong></td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">Qwen2.5-VL-3B-Instruct</td> | |
| <td>3.8B</td> | |
| <td>-</td> | |
| <td>47.6</td> | |
| <td>61.5</td> | |
| <td>67.6</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">InternVL2.5-4B</td> | |
| <td>3.7B</td> | |
| <td>62.7</td> | |
| <td>50.8</td> | |
| <td>62.3</td> | |
| <td>63.6</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">Qwen2.5-VL-7B-Instruct</td> | |
| <td>8.3B</td> | |
| <td>-</td> | |
| <td>56.4</td> | |
| <td>65.1</td> | |
| <td>71.6</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">InternVL2.5-8B</td> | |
| <td>8.1B</td> | |
| <td>67.7</td> | |
| <td>54.8</td> | |
| <td>64.2</td> | |
| <td>66.9</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">MiniCPM-V-2.6</td> | |
| <td>8.1B</td> | |
| <td>69.1</td> | |
| <td>53.0</td> | |
| <td>60.9</td> | |
| <td>63.6</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">MiniCPM-o-2.6</td> | |
| <td>8.7B</td> | |
| <td>71.9</td> | |
| <td>56.7</td> | |
| <td>63.9</td> | |
| <td>69.6</td> | |
| </tr> | |
| <tr> | |
| <td nowrap="nowrap" align="left">MiniCPM-V-4.0</td> | |
| <td>4.1B</td> | |
| <td>71.4</td> | |
| <td>54.0</td> | |
| <td>61.2</td> | |
| <td>65.8</td> | |
| </tr> | |
| </tbody> | |
| </table> | |
| </div> | |
| </details> | |
| ### Examples | |
| <div style="display: flex; flex-direction: column; align-items: center;"> | |
| <img src="https://raw.githubusercontent.com/openbmb/MiniCPM-o/main/assets/minicpmv4/minicpm-v-4-case.png" alt="math" style="margin-bottom: 5px;"> | |
| </div> | |
| The demos below run locally on an iPhone 16 Pro Max using the [iOS demo](https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/demo/ios_demo/ios.md). | |
| <div align="center"> | |
| <img src="https://raw.githubusercontent.com/openbmb/MiniCPM-o/main/assets/minicpmv4/iphone_en.gif" width="45%" style="display: inline-block; margin: 0 10px;"/> | |
| <img src="https://raw.githubusercontent.com/openbmb/MiniCPM-o/main/assets/minicpmv4/iphone_en_information_extraction.gif" width="45%" style="display: inline-block; margin: 0 10px;"/> | |
| </div> | |
| <div align="center"> | |
| <img src="https://raw.githubusercontent.com/openbmb/MiniCPM-o/main/assets/minicpmv4/iphone_cn.gif" width="45%" style="display: inline-block; margin: 0 10px;"/> | |
| <img src="https://raw.githubusercontent.com/openbmb/MiniCPM-o/main/assets/minicpmv4/iphone_cn_funny_points.gif" width="45%" style="display: inline-block; margin: 0 10px;"/> | |
| </div> | |
| ## Usage | |
| ```python | |
| from PIL import Image | |
| import torch | |
| from transformers import AutoModel, AutoTokenizer | |
| model_path = 'openbmb/MiniCPM-V-4' | |
| model = AutoModel.from_pretrained(model_path, trust_remote_code=True, | |
| # use 'sdpa' or 'flash_attention_2'; do not use 'eager' | |
| attn_implementation='sdpa', torch_dtype=torch.bfloat16) | |
| model = model.eval().cuda() | |
| tokenizer = AutoTokenizer.from_pretrained( | |
| model_path, trust_remote_code=True) | |
| image = Image.open('./assets/single.png').convert('RGB') | |
| # First round chat | |
| question = "What is the landform in the picture?" | |
| msgs = [{'role': 'user', 'content': [image, question]}] | |
| answer = model.chat( | |
| msgs=msgs, | |
| image=image, | |
| tokenizer=tokenizer | |
| ) | |
| print(answer) | |
| # Second round chat, pass history context of multi-turn conversation | |
| msgs.append({"role": "assistant", "content": [answer]}) | |
| msgs.append({"role": "user", "content": [ | |
| "What should I pay attention to when traveling here?"]}) | |
| answer = model.chat( | |
| msgs=msgs, | |
| image=None, | |
| tokenizer=tokenizer | |
| ) | |
| print(answer) | |
| ``` | |
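The snippet above covers a single image, while the model is also described as handling multi-image and video input. Below is a minimal sketch of how such inputs might be prepared. It assumes, by analogy with the single-image call, that several PIL images can be placed in one `content` list, which this section does not confirm; the helper names `uniform_sample_indices` and `build_multi_image_msgs` are hypothetical.

```python
from typing import Any, List


def uniform_sample_indices(total_frames: int, num_samples: int) -> List[int]:
    """Pick `num_samples` frame indices spread evenly over a video."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        return list(range(total_frames))
    gap = total_frames / num_samples
    # Take the middle of each of the `num_samples` equal-width segments.
    return [int(i * gap + gap / 2) for i in range(num_samples)]


def build_multi_image_msgs(images: List[Any], question: str) -> List[dict]:
    """Put several images (e.g. PIL.Image objects) and a question in one user turn."""
    return [{"role": "user", "content": [*images, question]}]


# Indices of 4 frames sampled from a 100-frame clip.
print(uniform_sample_indices(total_frames=100, num_samples=4))
```

If the multi-image form is accepted, the resulting `msgs` would be passed as `model.chat(msgs=msgs, image=None, tokenizer=tokenizer)`, in the same way as the second-round call above.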
| ## License | |
| #### Model License | |
| * The MiniCPM-o/V model weights and code are open-sourced under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM-V/blob/main/LICENSE) license. | |
| * To help us better understand and support our users, we would appreciate it if you could fill out a brief, optional registration ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g). | |
| #### Statement | |
| * As an LMM, MiniCPM-V 4.0 generates content by learning from a large amount of multimodal corpora, but it cannot comprehend, express personal opinions, or make value judgments. Anything generated by MiniCPM-V 4.0 does not represent the views and positions of the model developers. | |
| * We will not be liable for any problems arising from the use of the MiniCPM-V models, including but not limited to data security issues, risk of public opinion, or any risks and problems arising from the misdirection, misuse, or dissemination of the model. | |
| ## Key Techniques and Other Multimodal Projects | |
| 👏 Welcome to explore key techniques of MiniCPM-V 2.6 and other multimodal projects of our team: | |
| [VisCPM](https://github.com/OpenBMB/VisCPM/tree/main) | [RLHF-V](https://github.com/RLHF-V/RLHF-V) | [LLaVA-UHD](https://github.com/thunlp/LLaVA-UHD) | [RLAIF-V](https://github.com/RLHF-V/RLAIF-V) | |
| ## Citation | |
| If you find our work helpful, please consider citing our papers 📝 and liking this project ❤️! | |
| ```bib | |
| @article{yao2024minicpm, | |
| title={MiniCPM-V: A GPT-4V Level MLLM on Your Phone}, | |
| author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Wang, Chongyi and Cui, Junbo and Zhu, Hongji and Cai, Tianchi and Li, Haoyu and Zhao, Weilin and He, Zhihui and others}, | |
| journal={Nature Communications}, | |
| volume={16}, | |
| pages={5509}, | |
| year={2025} | |
| } | |
| ``` |