---
library_name: transformers
license: bsd-3-clause
base_model:
- OpenGVLab/InternVL3-1B
tags:
- InternVL3
- InternVL3-1B
- Int8
- VLM
pipeline_tag: image-text-to-text
language:
- en
---

# InternVL3-1B

This version of InternVL3-1B has been converted to run on the Axera NPU using **w8a16** quantization.

This model has been optimized with the following LoRA: 

Compatible with Pulsar2 version: 4.1

## Conversion tool links

If you are interested in model conversion, you can try exporting the axmodel yourself from the original repository:
https://huggingface.co/OpenGVLab/InternVL3-1B

[How to convert an LLM from Hugging Face to axmodel](https://github.com/AXERA-TECH/InternVL3-2B.axera/tree/master/model_convert) 

[AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/ax-internvl) 

[AXera NPU AXCL LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/axcl-internvl)

## Supported platforms

- AX650
  - AX650N DEMO Board
  - [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
 
|Chip|Image encoder (448x448)|TTFT (time to first token)|Decode speed (w8a16)|
|--|--|--|--|
|AX650| 380 ms | 623 ms |30 tokens/sec|
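
To get a feel for what these figures mean for a whole reply, the sketch below adds the three stages together. It is a back-of-the-envelope estimate only: it assumes the image encoder time is not already included in the TTFT and that decoding runs at a steady rate, neither of which this README states. The function name `estimate_response_ms` is made up for this example.

```python
# Rough latency estimate built from the benchmark table above.
# Assumption (not confirmed by the README): image encoding, TTFT and decoding
# are separate stages whose durations simply add up.
IMAGE_ENCODE_MS = 380.0        # image encoder at 448x448 on AX650
TTFT_MS = 623.0                # time to first token
DECODE_TOKENS_PER_SEC = 30.0   # w8a16 decode speed


def estimate_response_ms(num_output_tokens: int) -> float:
    """Estimate the time in milliseconds to produce num_output_tokens tokens."""
    if num_output_tokens < 1:
        raise ValueError("num_output_tokens must be at least 1")
    # The first token arrives at TTFT; the rest are decoded at the steady rate.
    decode_ms = (num_output_tokens - 1) / DECODE_TOKENS_PER_SEC * 1000.0
    return IMAGE_ENCODE_MS + TTFT_MS + decode_ms


if __name__ == "__main__":
    for n in (1, 50, 100):
        print(f"{n:>4} tokens: ~{estimate_response_ms(n) / 1000.0:.2f} s")
```

Measured numbers on a real board will differ, so treat the result as an order-of-magnitude guide.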


## How to use

Download all files from this repository to the device.

```
root@ax650:/mnt/qtang/llm-test/internvl3-1b# tree -L 1
.
|-- gradio_demo.py
|-- internvl3_1b_ax650
|-- internvl3_tokenizer
|-- internvl3_tokenizer.py
|-- main_api_ax650
|-- main_api_axcl_x86
|-- main_ax650
|-- main_axcl_x86
|-- post_config.json
|-- run_internvl_3_1b_448_api_ax650.sh
|-- run_internvl_3_1b_448_api_axcl_x86.sh
|-- run_internvl_3_1b_448_ax650.sh
|-- run_internvl_3_1b_448_axcl_x86.sh
`-- ssd_car.jpg
```

#### Install transformers

```
pip install transformers==4.41.1
```
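
If you want to confirm that the pinned version is the one actually installed, a small check such as the following may help. It is a sketch that uses only the standard library; the helper names `installed_version` and `matches_pin` are made up for this example and are not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED = "4.41.1"  # version pinned by the install command above


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(found, required: str = REQUIRED) -> bool:
    """True only when the installed version equals the pinned one exactly."""
    return found == required


if __name__ == "__main__":
    found = installed_version("transformers")
    if matches_pin(found):
        print(f"transformers {found} matches the pinned version")
    else:
        print(f"transformers is {found!r}; expected {REQUIRED}")
```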

#### Start the Tokenizer service

```
root@ax650:/mnt/qtang/llm-test/internvl3-1b# python3 internvl3_tokenizer.py
None None 151645 <|im_end|> 151665 151667
context_len is  256
prompt is <|im_start|>system
你是书生·万象, 英文名是InternVL, 是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型.<|im_end|>
......
http://0.0.0.0:12345
```
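
Before launching the runtime, it can be useful to verify that the tokenizer service is accepting connections on port 12345, the port printed in the log above. The snippet below is a minimal sketch that only tests whether a TCP connection succeeds; it does not call any endpoint of the service, and it assumes you run it on the same device (hence `127.0.0.1`). The function name is made up for this example.

```python
import socket


def service_is_listening(host: str = "127.0.0.1", port: int = 12345,
                         timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    if service_is_listening():
        print("tokenizer service is reachable on port 12345")
    else:
        print("nothing is listening on port 12345; start internvl3_tokenizer.py first")
```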

#### Inference on an AX650 host, such as the M4N-Dock(爱芯派Pro) or the AX650 DEMO Board

- input text (Chinese for "Describe the image")

```
描述下图片
```

- input image

![](./ssd_car.jpg)

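Open another terminal and run `./run_internvl_3_1b_448_ax650.sh`. In the session below, the model replies in Chinese. In English, the reply lists a red double-decker bus carrying an advertisement that reads "THINGS GET MORE EXCITING WHEN YOU SAY YES", a smiling woman standing beside the bus, a black car parked at the roadside, a shop window, the walls and windows of some buildings, and a black lamp post.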

```
root@ax650:/mnt/qtang/llm-test/internvl3-1b# ./run_internvl_3_1b_448_ax650.sh
[I][                            Init][ 134]: LLM init start
[I][                            Init][  34]: connect http://0.0.0.0:12345 ok
bos_id: -1, eos_id: 151645
img_start_token: 151665
img_context_token: 151667
  3% | ██                                |   1 /  27 [0.01s<0.32s, 83.33 count/s] tokenizer init ok
[I][                            Init][  45]: LLaMaEmbedSelector use mmap
  7% | ███                               |   2 /  27 [0.01s<0.19s, 142.86 count/s] embed_selector init ok
100% | ████████████████████████████████ |  27 /  27 [6.92s<6.92s, 3.90 count/s] init post axmodel ok,remain_cmm(11068 MB)
[I][                            Init][ 226]: IMAGE_CONTEXT_TOKEN: 151667, IMAGE_START_TOKEN: 151665
[I][                            Init][ 251]: image encoder input nchw@float32
[I][                            Init][ 281]: image encoder output float32
[I][                            Init][ 291]: image_encoder_height : 448, image_encoder_width: 448
[I][                            Init][ 293]: max_token_len : 2047
[I][                            Init][ 296]: kv_cache_size : 128, kv_cache_num: 2047
[I][                            Init][ 304]: prefill_token_num : 128
[I][                            Init][ 308]: grp: 1, prefill_max_token_num : 1
[I][                            Init][ 308]: grp: 2, prefill_max_token_num : 128
[I][                            Init][ 308]: grp: 3, prefill_max_token_num : 256
[I][                            Init][ 308]: grp: 4, prefill_max_token_num : 384
[I][                            Init][ 308]: grp: 5, prefill_max_token_num : 512
[I][                            Init][ 308]: grp: 6, prefill_max_token_num : 640
[I][                            Init][ 308]: grp: 7, prefill_max_token_num : 768
[I][                            Init][ 308]: grp: 8, prefill_max_token_num : 896
[I][                            Init][ 308]: grp: 9, prefill_max_token_num : 1024
[I][                            Init][ 312]: prefill_max_token_num : 1024
[I][                     load_config][ 282]: load config:
{
    "enable_repetition_penalty": false,
    "enable_temperature": true,
    "enable_top_k_sampling": true,
    "enable_top_p_sampling": false,
    "penalty_window": 20,
    "repetition_penalty": 1.2,
    "temperature": 0.9,
    "top_k": 10,
    "top_p": 0.8
}

[I][                            Init][ 321]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> 描述下图片
image >> ssd_car.jpg
[I][                          Encode][ 415]: image encode time : 387.35 ms, size : 229376
[I][                          Encode][ 524]: idx:0 offset : 50 out_embed.size() : 279552
[I][                             Run][ 551]: input token num : 312, prefill_split_num : 3
[I][                             Run][ 566]: prefill grpid 4
[I][                             Run][ 593]: input_num_token:128
[I][                             Run][ 593]: input_num_token:128
[I][                             Run][ 593]: input_num_token:56
[I][                             Run][ 717]: ttft: 623.71 ms
图片中出现的物体包括:

1. 一辆红色的双层巴士,巴士上有一则广告,广告上写着“THINGS GET MORE EXCITING WHEN YOU SAY YES” (当你说“是”时,事情就更兴奋了)。
2. 一位微笑的女性站在巴士旁边。
3. 一辆黑色的汽车停在路边。
4. 一家商店的橱窗。
5. 一些建筑物的外墙和窗户。
6. 一根黑色的路灯杆。

这些是图片中实际存在的物体。

[N][                             Run][ 826]: hit eos,avg 28.78 token/s

prompt >> q

```
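
The `Run` lines in the log above show a 312-token input being prefilled as three chunks (128, 128 and 56 tokens) under `prefill grpid 4`. The sketch below reproduces those numbers from the values printed during init. It is only one reading of the log, not the runtime's actual code: the rule "pick the smallest group whose `prefill_max_token_num` covers the input" is an assumption that happens to match this run, and the function names are made up for this example.

```python
PREFILL_TOKEN_NUM = 128  # "prefill_token_num" from the init log
# "prefill_max_token_num" for grp 1..9, copied from the init log
GROUP_MAX_TOKENS = [1, 128, 256, 384, 512, 640, 768, 896, 1024]


def split_prefill(num_tokens: int, chunk: int = PREFILL_TOKEN_NUM) -> list:
    """Split the input into full chunks plus one shorter trailing chunk."""
    full, rest = divmod(num_tokens, chunk)
    return [chunk] * full + ([rest] if rest else [])


def pick_group(num_tokens: int, group_max=GROUP_MAX_TOKENS) -> int:
    """Return the 1-based id of the smallest group that can hold num_tokens.

    Assumed selection rule; it matches the log but is not confirmed by the source.
    """
    for grp_id, capacity in enumerate(group_max, start=1):
        if capacity >= num_tokens:
            return grp_id
    raise ValueError(f"{num_tokens} tokens exceed the largest prefill group")


if __name__ == "__main__":
    print(split_prefill(312))  # [128, 128, 56]
    print(pick_group(312))     # 4
```

Under this reading, inputs longer than 1024 tokens would not fit any prefill group listed in the log.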
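
The config printed by `load_config` in the log above enables temperature scaling and top-k sampling (`temperature` 0.9, `top_k` 10) and leaves top-p sampling and the repetition penalty disabled. As an illustration of what those two settings usually mean, here is a generic top-k sampler in plain Python. It is a textbook sketch, not the ax-llm implementation, and the order in which the runtime applies the two steps is an assumption.

```python
import math
import random


def sample_top_k(logits, top_k=10, temperature=0.9, rng=random):
    """Sample one token id from the top_k highest logits after temperature scaling."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep the indices of the top_k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature-scaled softmax weights, shifted by the max for numerical stability.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(top, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0, 3.0]  # made-up values for illustration
    print(sample_top_k(toy_logits, top_k=3, rng=random.Random(0)))
```

A temperature below 1 sharpens the distribution toward the highest logits, and a smaller `top_k` narrows the set of candidate tokens.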