Qwen2.5-VL-3B-Instruct
This version of Qwen2.5-VL-3B-Instruct-GPTQ-Int4 has been converted to run on the Axera NPU using w4a16 quantization.
Compatible with Pulsar2 version: 3.4
Conversion tool links:
For those who are interested in model conversion, you can try exporting the axmodel yourself from the original repo: https://huggingface.co/hfl/Qwen2.5-VL-3B-Instruct-GPTQ-Int4
Pulsar2 Link: How to Convert LLM from Huggingface to axmodel
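As a rough sketch, the conversion is driven by the `pulsar2 llm_build` sub-command. The flag names and values below are assumptions based on the Pulsar2 LLM documentation (the kv-cache and prefill lengths simply mirror the values reported in the run logs further down), so verify them against the docs for your Pulsar2 3.4 installation before use.

```
# Assumed Pulsar2 invocation; check sub-command and flag names against the
# Pulsar2 3.4 LLM documentation before running. Paths are placeholders.
pulsar2 llm_build \
  --input_path ./Qwen2.5-VL-3B-Instruct-GPTQ-Int4 \
  --output_path ./Qwen2.5-VL-3B-Instruct-AX650 \
  --kv_cache_len 1023 \
  --prefill_len 320 \
  --chip AX650
```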
Supported Platforms
- AX650
  - AX650N DEMO Board
  - M4N-Dock (爱芯派Pro)
  - M.2 Accelerator card
Image Process
| Chips | input size | image num | image encoder (ms) | ttft, 320 tokens (ms) | w4a16 (tokens/sec) | DDR (GiB) | Flash (GiB) |
|---|---|---|---|---|---|---|---|
| AX650 | – | 1 | – | – | – | – | – |
Video Process
| Chips | input size | image num | image encoder (ms) | ttft, 512 tokens (ms) | w4a16 (tokens/sec) | DDR (GiB) | Flash (GiB) |
|---|---|---|---|---|---|---|---|
| AX650 | 308x308 | 8 | – | – | – | – | – |
The DDR capacity refers to the CMM memory that needs to be consumed. Ensure that the CMM memory allocation on the development board is greater than this value.
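To check this before launching the demo, you can usually read the CMM statistics from the AXERA BSP's proc interface; the node path below is an assumption based on common AXERA SDK images and may differ on your board. The runtime also reports the remaining CMM at init time (see `remain_cmm(...)` in the logs below).

```
# Assumed proc node from typical AXERA SDK images; the exact path may differ on your BSP.
cat /proc/ax_proc/mem_cmm_info
```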
How to use
Download all files from this repository to the device
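For example, clone the repository on a host machine and copy it over to the board. Git LFS is needed for the large model files; the board IP and destination path below are placeholders (the path simply mirrors the one seen in the demo logs).

```
# Clone this repository on the host (Git LFS is required for the *.axmodel files).
git lfs install
git clone https://huggingface.co/AXERA-TECH/Qwen2.5-VL-3B-Instruct-GPTQ-Int4

# Copy everything to the board; <board-ip> and the target directory are placeholders.
scp -r Qwen2.5-VL-3B-Instruct-GPTQ-Int4 root@<board-ip>:/mnt/qtang/llm-test/qwen2.5-vl-3b
```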
If you are using the AX650 board
Demo Run
Image understanding demo
- input text: 描述下图片 (Describe the image)
- input image: image/ssd_car.jpg
root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# ./run_qwen2_5_vl_image.sh
[I][ Init][ 129]: LLM init start
bos_id: -1, eos_id: 151645
2% | █ | 1 / 40 [0.01s<0.24s, 166.67 count/s] tokenizer init ok
[I][ Init][ 26]: LLaMaEmbedSelector use mmap
100% | ████████████████████████████████ | 40 / 40 [38.23s<38.23s, 1.05 count/s] init vpm axmodel ok,remain_cmm(7600 MB)
[I][ Init][ 277]: max_token_len : 1023
[I][ Init][ 282]: kv_cache_size : 256, kv_cache_num: 1023
[I][ Init][ 290]: prefill_token_num : 320
[I][ Init][ 292]: vpm_height : 1024,vpm_width : 392
[I][ Init][ 301]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> who are you?
image >>
[I][ Run][ 638]: ttft: 2854.47 ms
I am a large language model created by Alibaba Cloud. I am called Qwen.
[N][ Run][ 779]: hit eos,avg 6.05 token/s
prompt >> 描述下图片
image >> image/ssd_car.jpg
[I][ Encode][ 416]: image encode time : 795.614014 ms, size : 524288
[I][ Run][ 638]: ttft: 2856.88 ms
这张图片展示了一条繁忙的城市街道。前景中,一名女子站在人行道上,她穿着黑色外套,面带微笑。她旁边是一辆红色的双层巴士,巴士上有一个广告,
上面写着“THINGS GET MORE EXITING WHEN YOU SAY ‘YES’”。巴士的车牌号是“L15”。巴士旁边停着一辆黑色的小型货车。背景中可以看到一些商店和行人,
街道两旁的建筑物是现代的玻璃幕墙建筑。整体氛围显得繁忙而充满活力。
[N][ Run][ 779]: hit eos,avg 5.96 token/s
Video understanding demo
Please pre-process the frames of the video file into 308x308 images before running the demo (see the example below).
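One way to do this on the host is with ffmpeg, dumping the video as numbered 308x308 JPEG frames; the input filename is a placeholder. The run log below reads every 8th frame (video/frame_0000.jpg, video/frame_0008.jpg, ...).

```
# Example pre-processing with ffmpeg; "input.mp4" is a placeholder filename.
# Every frame is scaled to 308x308 and written as video/frame_0000.jpg, frame_0001.jpg, ...
mkdir -p video
ffmpeg -i input.mp4 -vf "scale=308:308" -start_number 0 video/frame_%04d.jpg
```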
root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# ./run_qwen2_5_vl_video.sh
[I][ Init][ 129]: LLM init start
bos_id: -1, eos_id: 151645
2% | █ | 1 / 40 [0.00s<0.12s, 333.33 count/s] tokenizer init ok
[I][ Init][ 26]: LLaMaEmbedSelector use mmap
100% | ████████████████████████████████ | 40 / 40 [40.05s<40.05s, 1.00 count/s] init vpm axmodel ok,remain_cmm(7680 MB)
[I][ Init][ 277]: max_token_len : 1023
[I][ Init][ 282]: kv_cache_size : 256, kv_cache_num: 1023
[I][ Init][ 290]: prefill_token_num : 512
[I][ Init][ 292]: vpm_height : 484,vpm_width : 392
[I][ Init][ 301]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> 描述下视频
image >> video
video/frame_0000.jpg
video/frame_0008.jpg
video/frame_0016.jpg
video/frame_0024.jpg
video/frame_0032.jpg
video/frame_0040.jpg
video/frame_0048.jpg
video/frame_0056.jpg
[I][ Encode][ 416]: image encode time : 1487.557007 ms, size : 991232
[I][ Run][ 638]: ttft: 5488.29 ms
视频展示了两只松鼠在户外的场景。背景是模糊的山脉和蓝天,前景中有松鼠在互动。松鼠的毛色主要是棕色和白色,它们的爪子是橙色的。松鼠似乎在互相玩耍或争抢,它们的爪子和嘴巴都伸向对方。整个场景显得非常自然和生动。