qaihm-bot committed on
Commit 4259c9f · verified · 1 Parent(s): 1ddf90e

See https://github.com/quic/ai-hub-models/releases/v0.38.0 for changelog.

.gitattributes CHANGED
@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ DEPLOYMENT_MODEL_LICENSE.pdf filter=lfs diff=lfs merge=lfs -text
37
+ GPUNet_float.dlc filter=lfs diff=lfs merge=lfs -text
38
+ GPUNet_w8a16.dlc filter=lfs diff=lfs merge=lfs -text
39
+ GPUNet_w8a8.dlc filter=lfs diff=lfs merge=lfs -text
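The added rules route the new binaries through Git LFS. As a rough sketch, the rules visible in this hunk can be checked against the files added by this commit using Python's `fnmatch`. This is only an approximation: gitattributes matching has its own semantics, and the hunk shows only lines 33-39 of the file, so a file reported as unmatched here may well be covered by a rule outside the hunk. The helper name `matching_pattern` is hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns visible in this hunk of .gitattributes (the full file has more rules).
VISIBLE_LFS_PATTERNS = [
    "*.zip",
    "*.zst",
    "*tfevents*",
    "DEPLOYMENT_MODEL_LICENSE.pdf",
    "GPUNet_float.dlc",
    "GPUNet_w8a16.dlc",
    "GPUNet_w8a8.dlc",
]

# Files this commit adds as LFS pointers.
ADDED_FILES = [
    "DEPLOYMENT_MODEL_LICENSE.pdf",
    "GPUNet_float.dlc",
    "GPUNet_float.onnx.zip",
    "GPUNet_float.tflite",
    "GPUNet_w8a16.dlc",
    "GPUNet_w8a8.dlc",
    "GPUNet_w8a8.onnx.zip",
    "GPUNet_w8a8.tflite",
]


def matching_pattern(filename, patterns=VISIBLE_LFS_PATTERNS):
    """Return the first visible pattern that matches filename, or None."""
    for pattern in patterns:
        if fnmatchcase(filename, pattern):
            return pattern
    return None


for name in ADDED_FILES:
    print(f"{name}: {matching_pattern(name)}")
```

Under these assumptions the `.tflite` files match none of the visible rules, which suggests they rely on a rule elsewhere in `.gitattributes`.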
DEPLOYMENT_MODEL_LICENSE.pdf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4409f93b0e82531303b3e10f52f1fdfb56467a25f05b7441c6bbd8bb8a64b42c
3
+ size 109629
GPUNet_float.dlc ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:17066c2e9483a74b11fc8a8fc3f655e44ee0d7b992dc1213ad0aba655912cf6f
3
+ size 47562108
GPUNet_float.onnx.zip ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e10d156ec2d77240e21904c9c2923b99013af18a36aeba1ba05099d833b95a67
3
+ size 44302131
GPUNet_float.tflite ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bb4457a6ee3d8d2336fa2a3c4aaf0851d9ef8915df5193e9929099bec8a1ce24
3
+ size 47478880
GPUNet_w8a16.dlc ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b1401e2460af5f0cbf474714bcb602f6c8a0bb322f818cf5abc84d023586e6c4
3
+ size 13468924
GPUNet_w8a8.dlc ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ee1f52b3d7ed4a3277d093b6aad8f34b0eee5cd1e1b2d2fb5f3c283b45ac3e30
3
+ size 13468916
GPUNet_w8a8.onnx.zip ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:947ef9719180ce242de1e97a9b66fc1abc59bf97bb7b1bdd60c02c9b0c1bc333
3
+ size 22332441
GPUNet_w8a8.tflite ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:96acf239e41c3ec788337b535506979787545f1c2d8a696ac2af4e3154d5c36c
3
+ size 12578264
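Each file above is stored as a Git LFS pointer: a spec version, a SHA-256 object ID, and a byte size. A minimal sketch of how a downloaded blob could be checked against such a pointer is shown below. The function names `parse_lfs_pointer` and `verify_blob` are hypothetical, and the sketch assumes the `oid` always uses `sha256`, as it does in this commit.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_blob(data, pointer):
    """Check that raw bytes match the size and SHA-256 digest in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.sha256(data).hexdigest() == pointer["digest"]


# Pointer for GPUNet_float.tflite, copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:bb4457a6ee3d8d2336fa2a3c4aaf0851d9ef8915df5193e9929099bec8a1ce24
size 47478880
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
size_mib = pointer["size"] / 2**20
print(f"{pointer['algo']} {pointer['digest'][:12]}... {size_mib:.2f} MiB")
```

The 47,478,880-byte size works out to about 45.28 MiB, which appears to line up with the float model size quoted in the README's model stats.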
LICENSE ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ The license of the original trained model can be found at http://www.apache.org/licenses/LICENSE-2.0.
2
+ The license for the deployable model files (.tflite, .onnx, .dlc, .bin, etc.) can be found in DEPLOYMENT_MODEL_LICENSE.pdf.
README.md ADDED
@@ -0,0 +1,287 @@
1
+ ---
2
+ library_name: pytorch
3
+ license: other
4
+ tags:
5
+ - backbone
6
+ - android
7
+ pipeline_tag: image-classification
8
+
9
+ ---
10
+
11
+ ![](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/gpunet/web-assets/model_demo.png)
12
+
13
+ # GPUNet: Optimized for Mobile Deployment
14
+ ## ImageNet classifier and general-purpose backbone
15
+
16
+
17
+ GPUNet is a machine learning model that can classify images from the ImageNet dataset. It can also be used as a backbone for building more complex models for specific use cases.
18
+
19
+ This model is an implementation of GPUNet found [here](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/GPUNet).
20
+
21
+
22
+ This repository provides scripts to run GPUNet on Qualcomm® devices.
23
+ More details on model performance across various devices can be found
24
+ [here](https://aihub.qualcomm.com/models/gpunet).
25
+
26
+
27
+
28
+ ### Model Details
29
+
30
+ - **Model Type:** Image classification
31
+ - **Model Stats:**
32
+ - Model checkpoint: Imagenet
33
+ - Input resolution: 224x224
34
+ - Number of parameters: 10.49M
35
+ - Model size (float): 45.28MB
36
+ - Model size (w8a8): 21.3MB
37
+
38
+ | Model | Precision | Device | Chipset | Target Runtime | Inference Time (ms) | Peak Memory Range (MB) | Primary Compute Unit | Target Model |
39
+ |---|---|---|---|---|---|---|---|---|
40
+ | GPUNet | float | QCS8275 (Proxy) | Qualcomm® QCS8275 (Proxy) | TFLITE | 4.661 ms | 0 - 50 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.tflite) |
41
+ | GPUNet | float | QCS8275 (Proxy) | Qualcomm® QCS8275 (Proxy) | QNN_DLC | 4.579 ms | 0 - 22 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.dlc) |
42
+ | GPUNet | float | QCS8450 (Proxy) | Qualcomm® QCS8450 (Proxy) | TFLITE | 1.777 ms | 0 - 63 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.tflite) |
43
+ | GPUNet | float | QCS8450 (Proxy) | Qualcomm® QCS8450 (Proxy) | QNN_DLC | 2.242 ms | 1 - 33 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.dlc) |
44
+ | GPUNet | float | QCS8550 (Proxy) | Qualcomm® QCS8550 (Proxy) | TFLITE | 1.231 ms | 0 - 179 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.tflite) |
45
+ | GPUNet | float | QCS8550 (Proxy) | Qualcomm® QCS8550 (Proxy) | QNN_DLC | 1.242 ms | 0 - 70 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.dlc) |
46
+ | GPUNet | float | QCS8550 (Proxy) | Qualcomm® QCS8550 (Proxy) | ONNX | 1.212 ms | 0 - 112 MB | NPU | [GPUNet.onnx.zip](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.onnx.zip) |
47
+ | GPUNet | float | QCS9075 (Proxy) | Qualcomm® QCS9075 (Proxy) | TFLITE | 1.653 ms | 0 - 50 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.tflite) |
48
+ | GPUNet | float | QCS9075 (Proxy) | Qualcomm® QCS9075 (Proxy) | QNN_DLC | 1.702 ms | 0 - 22 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.dlc) |
49
+ | GPUNet | float | SA7255P ADP | Qualcomm® SA7255P | TFLITE | 4.661 ms | 0 - 50 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.tflite) |
50
+ | GPUNet | float | SA7255P ADP | Qualcomm® SA7255P | QNN_DLC | 4.579 ms | 0 - 22 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.dlc) |
51
+ | GPUNet | float | SA8255 (Proxy) | Qualcomm® SA8255P (Proxy) | TFLITE | 1.223 ms | 0 - 181 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.tflite) |
52
+ | GPUNet | float | SA8255 (Proxy) | Qualcomm® SA8255P (Proxy) | QNN_DLC | 1.244 ms | 0 - 83 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.dlc) |
53
+ | GPUNet | float | SA8295P ADP | Qualcomm® SA8295P | TFLITE | 2.212 ms | 0 - 57 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.tflite) |
54
+ | GPUNet | float | SA8295P ADP | Qualcomm® SA8295P | QNN_DLC | 2.207 ms | 0 - 29 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.dlc) |
55
+ | GPUNet | float | SA8650 (Proxy) | Qualcomm® SA8650P (Proxy) | TFLITE | 1.227 ms | 0 - 180 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.tflite) |
56
+ | GPUNet | float | SA8650 (Proxy) | Qualcomm® SA8650P (Proxy) | QNN_DLC | 1.242 ms | 0 - 84 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.dlc) |
57
+ | GPUNet | float | SA8775P ADP | Qualcomm® SA8775P | TFLITE | 1.653 ms | 0 - 50 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.tflite) |
58
+ | GPUNet | float | SA8775P ADP | Qualcomm® SA8775P | QNN_DLC | 1.702 ms | 0 - 22 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.dlc) |
59
+ | GPUNet | float | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 Mobile | TFLITE | 0.876 ms | 0 - 65 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.tflite) |
60
+ | GPUNet | float | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 Mobile | QNN_DLC | 0.889 ms | 1 - 34 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.dlc) |
61
+ | GPUNet | float | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 Mobile | ONNX | 0.879 ms | 0 - 35 MB | NPU | [GPUNet.onnx.zip](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.onnx.zip) |
62
+ | GPUNet | float | Samsung Galaxy S25 | Snapdragon® 8 Elite For Galaxy Mobile | TFLITE | 0.69 ms | 0 - 56 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.tflite) |
63
+ | GPUNet | float | Samsung Galaxy S25 | Snapdragon® 8 Elite For Galaxy Mobile | QNN_DLC | 0.696 ms | 0 - 28 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.dlc) |
64
+ | GPUNet | float | Samsung Galaxy S25 | Snapdragon® 8 Elite For Galaxy Mobile | ONNX | 0.712 ms | 0 - 28 MB | NPU | [GPUNet.onnx.zip](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.onnx.zip) |
65
+ | GPUNet | float | Snapdragon X Elite CRD | Snapdragon® X Elite | QNN_DLC | 1.332 ms | 112 - 112 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.dlc) |
66
+ | GPUNet | float | Snapdragon X Elite CRD | Snapdragon® X Elite | ONNX | 1.112 ms | 24 - 24 MB | NPU | [GPUNet.onnx.zip](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet.onnx.zip) |
67
+ | GPUNet | w8a16 | QCS8275 (Proxy) | Qualcomm® QCS8275 (Proxy) | QNN_DLC | 2.445 ms | 0 - 29 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a16.dlc) |
68
+ | GPUNet | w8a16 | QCS8450 (Proxy) | Qualcomm® QCS8450 (Proxy) | QNN_DLC | 1.493 ms | 0 - 41 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a16.dlc) |
69
+ | GPUNet | w8a16 | QCS8550 (Proxy) | Qualcomm® QCS8550 (Proxy) | QNN_DLC | 1.068 ms | 0 - 7 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a16.dlc) |
70
+ | GPUNet | w8a16 | QCS9075 (Proxy) | Qualcomm® QCS9075 (Proxy) | QNN_DLC | 1.297 ms | 0 - 29 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a16.dlc) |
71
+ | GPUNet | w8a16 | RB3 Gen 2 (Proxy) | Qualcomm® QCS6490 (Proxy) | QNN_DLC | 4.0 ms | 0 - 41 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a16.dlc) |
72
+ | GPUNet | w8a16 | SA7255P ADP | Qualcomm® SA7255P | QNN_DLC | 2.445 ms | 0 - 29 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a16.dlc) |
73
+ | GPUNet | w8a16 | SA8255 (Proxy) | Qualcomm® SA8255P (Proxy) | QNN_DLC | 1.076 ms | 0 - 58 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a16.dlc) |
74
+ | GPUNet | w8a16 | SA8295P ADP | Qualcomm® SA8295P | QNN_DLC | 1.642 ms | 0 - 36 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a16.dlc) |
75
+ | GPUNet | w8a16 | SA8650 (Proxy) | Qualcomm® SA8650P (Proxy) | QNN_DLC | 1.072 ms | 0 - 61 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a16.dlc) |
76
+ | GPUNet | w8a16 | SA8775P ADP | Qualcomm® SA8775P | QNN_DLC | 1.297 ms | 0 - 29 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a16.dlc) |
77
+ | GPUNet | w8a16 | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 Mobile | QNN_DLC | 0.755 ms | 0 - 37 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a16.dlc) |
78
+ | GPUNet | w8a16 | Samsung Galaxy S25 | Snapdragon® 8 Elite For Galaxy Mobile | QNN_DLC | 0.516 ms | 0 - 33 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a16.dlc) |
79
+ | GPUNet | w8a16 | Snapdragon X Elite CRD | Snapdragon® X Elite | QNN_DLC | 1.247 ms | 59 - 59 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a16.dlc) |
80
+ | GPUNet | w8a8 | QCS8275 (Proxy) | Qualcomm® QCS8275 (Proxy) | TFLITE | 1.107 ms | 0 - 29 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.tflite) |
81
+ | GPUNet | w8a8 | QCS8275 (Proxy) | Qualcomm® QCS8275 (Proxy) | QNN_DLC | 1.4 ms | 0 - 29 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.dlc) |
82
+ | GPUNet | w8a8 | QCS8450 (Proxy) | Qualcomm® QCS8450 (Proxy) | TFLITE | 0.572 ms | 0 - 43 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.tflite) |
83
+ | GPUNet | w8a8 | QCS8450 (Proxy) | Qualcomm® QCS8450 (Proxy) | QNN_DLC | 0.86 ms | 0 - 41 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.dlc) |
84
+ | GPUNet | w8a8 | QCS8550 (Proxy) | Qualcomm® QCS8550 (Proxy) | TFLITE | 0.434 ms | 0 - 61 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.tflite) |
85
+ | GPUNet | w8a8 | QCS8550 (Proxy) | Qualcomm® QCS8550 (Proxy) | QNN_DLC | 0.614 ms | 0 - 63 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.dlc) |
86
+ | GPUNet | w8a8 | QCS8550 (Proxy) | Qualcomm® QCS8550 (Proxy) | ONNX | 91.246 ms | 36 - 273 MB | NPU | [GPUNet.onnx.zip](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.onnx.zip) |
87
+ | GPUNet | w8a8 | QCS9075 (Proxy) | Qualcomm® QCS9075 (Proxy) | TFLITE | 0.64 ms | 0 - 29 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.tflite) |
88
+ | GPUNet | w8a8 | QCS9075 (Proxy) | Qualcomm® QCS9075 (Proxy) | QNN_DLC | 0.802 ms | 0 - 29 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.dlc) |
89
+ | GPUNet | w8a8 | RB3 Gen 2 (Proxy) | Qualcomm® QCS6490 (Proxy) | TFLITE | 1.594 ms | 0 - 38 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.tflite) |
90
+ | GPUNet | w8a8 | RB3 Gen 2 (Proxy) | Qualcomm® QCS6490 (Proxy) | QNN_DLC | 2.243 ms | 0 - 37 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.dlc) |
91
+ | GPUNet | w8a8 | RB3 Gen 2 (Proxy) | Qualcomm® QCS6490 (Proxy) | ONNX | 31.065 ms | 16 - 29 MB | CPU | [GPUNet.onnx.zip](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.onnx.zip) |
92
+ | GPUNet | w8a8 | RB5 (Proxy) | Qualcomm® QCS8250 (Proxy) | TFLITE | 7.896 ms | 0 - 3 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.tflite) |
93
+ | GPUNet | w8a8 | RB5 (Proxy) | Qualcomm® QCS8250 (Proxy) | ONNX | 32.992 ms | 15 - 30 MB | CPU | [GPUNet.onnx.zip](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.onnx.zip) |
94
+ | GPUNet | w8a8 | SA7255P ADP | Qualcomm® SA7255P | TFLITE | 1.107 ms | 0 - 29 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.tflite) |
95
+ | GPUNet | w8a8 | SA7255P ADP | Qualcomm® SA7255P | QNN_DLC | 1.4 ms | 0 - 29 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.dlc) |
96
+ | GPUNet | w8a8 | SA8255 (Proxy) | Qualcomm® SA8255P (Proxy) | TFLITE | 0.44 ms | 0 - 63 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.tflite) |
97
+ | GPUNet | w8a8 | SA8255 (Proxy) | Qualcomm® SA8255P (Proxy) | QNN_DLC | 0.634 ms | 0 - 61 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.dlc) |
98
+ | GPUNet | w8a8 | SA8295P ADP | Qualcomm® SA8295P | TFLITE | 0.851 ms | 0 - 35 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.tflite) |
99
+ | GPUNet | w8a8 | SA8295P ADP | Qualcomm® SA8295P | QNN_DLC | 1.042 ms | 0 - 35 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.dlc) |
100
+ | GPUNet | w8a8 | SA8650 (Proxy) | Qualcomm® SA8650P (Proxy) | TFLITE | 0.439 ms | 0 - 63 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.tflite) |
101
+ | GPUNet | w8a8 | SA8650 (Proxy) | Qualcomm® SA8650P (Proxy) | QNN_DLC | 0.618 ms | 0 - 62 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.dlc) |
102
+ | GPUNet | w8a8 | SA8775P ADP | Qualcomm® SA8775P | TFLITE | 0.64 ms | 0 - 29 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.tflite) |
103
+ | GPUNet | w8a8 | SA8775P ADP | Qualcomm® SA8775P | QNN_DLC | 0.802 ms | 0 - 29 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.dlc) |
104
+ | GPUNet | w8a8 | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 Mobile | TFLITE | 0.341 ms | 0 - 40 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.tflite) |
105
+ | GPUNet | w8a8 | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 Mobile | QNN_DLC | 0.468 ms | 0 - 43 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.dlc) |
106
+ | GPUNet | w8a8 | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 Mobile | ONNX | 68.17 ms | 27 - 1478 MB | NPU | [GPUNet.onnx.zip](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.onnx.zip) |
107
+ | GPUNet | w8a8 | Samsung Galaxy S25 | Snapdragon® 8 Elite For Galaxy Mobile | TFLITE | 0.27 ms | 0 - 31 MB | NPU | [GPUNet.tflite](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.tflite) |
108
+ | GPUNet | w8a8 | Samsung Galaxy S25 | Snapdragon® 8 Elite For Galaxy Mobile | QNN_DLC | 0.333 ms | 0 - 36 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.dlc) |
109
+ | GPUNet | w8a8 | Samsung Galaxy S25 | Snapdragon® 8 Elite For Galaxy Mobile | ONNX | 60.449 ms | 41 - 1347 MB | NPU | [GPUNet.onnx.zip](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.onnx.zip) |
110
+ | GPUNet | w8a8 | Snapdragon X Elite CRD | Snapdragon® X Elite | QNN_DLC | 0.721 ms | 64 - 64 MB | NPU | [GPUNet.dlc](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.dlc) |
111
+ | GPUNet | w8a8 | Snapdragon X Elite CRD | Snapdragon® X Elite | ONNX | 79.755 ms | 50 - 50 MB | NPU | [GPUNet.onnx.zip](https://huggingface.co/qualcomm/GPUNet/blob/main/GPUNet_w8a8.onnx.zip) |
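To make the table easier to read, the following sketch derives speedup and single-stream throughput from a few of the reported latencies. The numbers are copied from the Samsung Galaxy S25 rows above. The helper names are hypothetical, and the throughput figure assumes back-to-back, batch-1 inference with no other overhead, which the table itself does not claim.

```python
# Inference times (ms) copied from the Samsung Galaxy S25 rows of the table.
LATENCY_MS = {
    ("float", "TFLITE"): 0.69,
    ("float", "QNN_DLC"): 0.696,
    ("w8a16", "QNN_DLC"): 0.516,
    ("w8a8", "TFLITE"): 0.27,
    ("w8a8", "QNN_DLC"): 0.333,
}


def throughput_per_second(latency_ms):
    """Inferences per second if runs execute back to back (an assumption)."""
    return 1000.0 / latency_ms


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


tflite_gain = speedup(LATENCY_MS[("float", "TFLITE")], LATENCY_MS[("w8a8", "TFLITE")])
dlc_gain = speedup(LATENCY_MS[("float", "QNN_DLC")], LATENCY_MS[("w8a8", "QNN_DLC")])

print(f"TFLITE  float -> w8a8 speedup: {tflite_gain:.2f}x")
print(f"QNN_DLC float -> w8a8 speedup: {dlc_gain:.2f}x")
print(f"w8a8 TFLITE throughput: {throughput_per_second(0.27):.0f} inferences/s")
```

Comparisons like this only hold within one runtime: the w8a8 ONNX row for the same device reports 60.449 ms, far slower than the float ONNX row.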
112
+
113
+
114
+
115
+
116
+ ## Installation
117
+
118
+
119
+ Install the package via pip:
120
+ ```bash
121
+ pip install qai-hub-models
122
+ ```
123
+
124
+
125
+ ## Configure Qualcomm® AI Hub to run this model on a cloud-hosted device
126
+
127
+ Sign in to [Qualcomm® AI Hub](https://app.aihub.qualcomm.com/) with your
128
+ Qualcomm® ID. Once signed in, navigate to `Account -> Settings -> API Token`.
129
+
130
+ With this API token, you can configure your client to run models on
131
+ cloud-hosted devices.
132
+ ```bash
133
+ qai-hub configure --api_token API_TOKEN
134
+ ```
135
+ Navigate to [docs](https://app.aihub.qualcomm.com/docs/) for more information.
136
+
137
+
138
+
139
+ ## Demo off target
140
+
141
+ The package contains a simple end-to-end demo that downloads pre-trained
142
+ weights and runs this model on a sample input.
143
+
144
+ ```bash
145
+ python -m qai_hub_models.models.gpunet.demo
146
+ ```
147
+
148
+ The above demo runs a reference implementation of pre-processing, model
149
+ inference, and post-processing.
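For an image classifier, post-processing typically means turning raw scores into ranked class probabilities. The sketch below illustrates that idea in plain Python. It assumes the model emits one logit per class and is not the package's own implementation; `softmax` and `top_k` are illustrative helpers, and the four-element `logits` list is made-up sample data.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probabilities, k=5):
    """Return the k (class_index, probability) pairs with the highest probability."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up logits for a 4-class toy example (a real ImageNet head has 1000 classes).
logits = [0.1, 2.0, -1.0, 0.5]
probs = softmax(logits)
best = top_k(probs, k=2)

for class_index, probability in best:
    print(f"class {class_index}: {probability:.3f}")
```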
150
+
151
+ **NOTE**: If you are running in a Jupyter Notebook or Google Colab-like
152
+ environment, please add the following to your cell (instead of the above).
153
+ ```
154
+ %run -m qai_hub_models.models.gpunet.demo
155
+ ```
156
+
157
+
158
+ ### Run model on a cloud-hosted device
159
+
160
+ In addition to the demo, you can also run the model on a cloud-hosted Qualcomm®
161
+ device. This script does the following:
162
+ * Checks performance on a cloud-hosted device.
163
+ * Downloads compiled assets that can be deployed on-device for Android.
164
+ * Checks accuracy between PyTorch and on-device outputs.
165
+
166
+ ```bash
167
+ python -m qai_hub_models.models.gpunet.export
168
+ ```
169
+
170
+
171
+
172
+ ## How does this work?
173
+
174
+ This [export script](https://aihub.qualcomm.com/models/gpunet/qai_hub_models/models/GPUNet/export.py)
175
+ leverages [Qualcomm® AI Hub](https://aihub.qualcomm.com/) to optimize, validate, and deploy this model
176
+ on-device. Let's go through each step below in detail:
177
+
178
+ Step 1: **Compile model for on-device deployment**
179
+
180
+ To compile a PyTorch model for on-device deployment, we first trace the model
181
+ in memory using `torch.jit.trace` and then call the `submit_compile_job` API.
182
+
183
+ ```python
184
+ import torch
185
+
186
+ import qai_hub as hub
187
+ from qai_hub_models.models.gpunet import Model
188
+
189
+ # Load the model
190
+ torch_model = Model.from_pretrained()
191
+
192
+ # Device
193
+ device = hub.Device("Samsung Galaxy S25")
194
+
195
+ # Trace model
196
+ input_shape = torch_model.get_input_spec()
197
+ sample_inputs = torch_model.sample_inputs()
198
+
199
+ pt_model = torch.jit.trace(torch_model, [torch.tensor(data[0]) for _, data in sample_inputs.items()])
200
+
201
+ # Compile model on a specific device
202
+ compile_job = hub.submit_compile_job(
203
+ model=pt_model,
204
+ device=device,
205
+ input_specs=torch_model.get_input_spec(),
206
+ )
207
+
208
+ # Get target model to run on-device
209
+ target_model = compile_job.get_target_model()
210
+
211
+ ```
212
+
213
+
214
+ Step 2: **Performance profiling on cloud-hosted device**
215
+
216
+ After compiling the model in Step 1, you can profile it on-device using the
217
+ `target_model`. Note that this script runs the model on a device automatically
218
+ provisioned in the cloud. Once the job is submitted, you can navigate to a
219
+ provided job URL to view a variety of on-device performance metrics.
220
+ ```python
221
+ profile_job = hub.submit_profile_job(
222
+ model=target_model,
223
+ device=device,
224
+ )
225
+
226
+ ```
227
+
228
+ Step 3: **Verify on-device accuracy**
229
+
230
+ To verify the accuracy of the model on-device, you can run on-device inference
231
+ on sample input data on the same cloud-hosted device.
232
+ ```python
233
+ input_data = torch_model.sample_inputs()
234
+ inference_job = hub.submit_inference_job(
235
+ model=target_model,
236
+ device=device,
237
+ inputs=input_data,
238
+ )
239
+ on_device_output = inference_job.download_output_data()
240
+
241
+ ```
242
+ With the output of the model, you can compute metrics such as PSNR or relative error, or
243
+ spot-check the output against the expected output.
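As one way to make the comparison concrete, the sketch below computes PSNR and the maximum relative error between a reference output and an on-device output, both given as flat lists of floats. It is an illustration only: the function names are hypothetical, the sample values are made up, and PSNR conventions differ on what counts as the peak signal (here, the value range of the reference unless `data_range` is passed).

```python
import math


def psnr(reference, candidate, data_range=None):
    """Peak signal-to-noise ratio in dB between two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical outputs
    if data_range is None:
        # Assumption: use the reference's value range as the peak signal.
        data_range = max(reference) - min(reference)
    if data_range <= 0:
        raise ValueError("data_range must be positive")
    return 10 * math.log10(data_range**2 / mse)


def max_relative_error(reference, candidate, eps=1e-12):
    """Largest element-wise |r - c| / |r|, guarding against division by zero."""
    return max(abs(r - c) / max(abs(r), eps) for r, c in zip(reference, candidate))


# Made-up values standing in for PyTorch and on-device outputs.
reference = [1.0, 2.0, 3.0, 4.0]
on_device = [1.0, 2.0, 3.0, 4.1]

print(f"PSNR: {psnr(reference, on_device):.2f} dB")
print(f"Max relative error: {max_relative_error(reference, on_device):.4f}")
```

A higher PSNR and a smaller relative error both indicate closer agreement; what threshold is acceptable depends on the application.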
244
+
245
+ **Note**: This on-device profiling and inference requires access to Qualcomm®
246
+ AI Hub. [Sign up for access](https://myaccount.qualcomm.com/signup).
247
+
248
+
249
+
250
+
251
+ ## Deploying compiled model to Android
252
+
253
+
254
+ The models can be deployed using multiple runtimes:
255
+ - TensorFlow Lite (`.tflite` export): [This
256
+ tutorial](https://www.tensorflow.org/lite/android/quickstart) provides a
257
+ guide to deploy the .tflite model in an Android application.
258
+
259
+
260
+ - QNN (`.so` export): This [sample
261
+ app](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/sample_app.html)
262
+ provides instructions on how to use the `.so` shared library in an Android application.
263
+
264
+
265
+ ## View on Qualcomm® AI Hub
266
+ Get more details on GPUNet's performance across various devices [here](https://aihub.qualcomm.com/models/gpunet).
267
+ Explore all available models on [Qualcomm® AI Hub](https://aihub.qualcomm.com/).
268
+
269
+
270
+ ## License
271
+ * The license for the original implementation of GPUNet can be found
272
+ [here](http://www.apache.org/licenses/LICENSE-2.0).
273
+ * The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf)
274
+
275
+
276
+
277
+ ## References
278
+ * [GPUNet: Searching the Deployable Convolution Neural Networks for GPUs](https://arxiv.org/abs/2205.00841)
279
+ * [Source Model Implementation](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/GPUNet)
280
+
281
+
282
+
283
+ ## Community
284
+ * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI.
285
+ * For questions or feedback, please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
286
+
287
+
tool-versions.yaml ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ tool_versions:
2
+ onnx:
3
+ qairt: 2.37.1.250807093845_124904
4
+ onnx_runtime: 1.22.2