Update README.md

README.md CHANGED

---
language:
- en
pipeline_tag: text-generation
tags:
- instruct
- chat
---

## Requirements
We advise you to clone [`llama.cpp`](https://github.com/ggerganov/llama.cpp) and install it following the official guide. We follow the latest version of llama.cpp.
In the following demonstration, we assume that you are running commands under the `llama.cpp` repository.
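
As a concrete starting point, here is a minimal sketch of that setup; the official llama.cpp build guide is authoritative, and the exact commands depend on your platform and llama.cpp version:

```bash
# Clone llama.cpp and build it (see the official guide for platform-specific options).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
# With a CMake build, the llama-* binaries (llama-cli, llama-server, llama-gguf-split)
# are placed under build/bin/.
```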

## How to use
Cloning the repo may be inefficient, so you can manually download the GGUF file that you need or use `huggingface-cli` (`pip install huggingface_hub`) as shown below:
```shell
huggingface-cli download Qwen/Qwen2-72B-Instruct-GGUF qwen2-72b-instruct-q4_0.gguf --local-dir . --local-dir-use-symlinks False
```

However, for large files, we split them into multiple segments due to the 50 GB limit on a single uploaded file.
Specifically, the split files share a common prefix, with a suffix indicating the segment index. For example, the `q5_k_m` GGUF files are:

```
qwen2-72b-instruct-q5_k_m-00001-of-00002.gguf
qwen2-72b-instruct-q5_k_m-00002-of-00002.gguf
```

They share the prefix `qwen2-72b-instruct-q5_k_m`, but each has its own indexing suffix, e.g. `-00001-of-00002`.
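
Both segments need to be present locally before they can be merged; as a sketch, they can be fetched with the same `huggingface-cli` command used above, once per file:

```bash
# Download both q5_k_m segments into the current directory.
huggingface-cli download Qwen/Qwen2-72B-Instruct-GGUF qwen2-72b-instruct-q5_k_m-00001-of-00002.gguf --local-dir . --local-dir-use-symlinks False
huggingface-cli download Qwen/Qwen2-72B-Instruct-GGUF qwen2-72b-instruct-q5_k_m-00002-of-00002.gguf --local-dir . --local-dir-use-symlinks False
```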

To use the split GGUF files, you need to merge them first with the `llama-gguf-split` command, as shown below:

```bash
./llama-gguf-split --merge qwen2-72b-instruct-q5_k_m-00001-of-00002.gguf qwen2-72b-instruct-q5_k_m.gguf
```

With the upgrade of the llama.cpp APIs, `llama-gguf-split` is equivalent to the previous `gguf-split`.
As for the arguments of this command, the first is the path to the first split GGUF file and the second is the path to the merged output GGUF file.

To run Qwen2, you can use `llama-cli` (the previous `main`) or `llama-server` (the previous `server`).
We recommend using `llama-server`, as it is simple and compatible with the OpenAI API. For example:

```bash
./llama-server -m qwen2-72b-instruct-q4_0.gguf
```
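
The same command works for a merged split quantization; for instance, with the `q5_k_m` file produced by the merge step above:

```bash
./llama-server -m qwen2-72b-instruct-q5_k_m.gguf
```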

Then it is easy to access the deployed service with the OpenAI API:

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",  # "http://<Your api-server IP>:port"
    api_key="sk-no-key-required"
)

completion = client.chat.completions.create(
    model="qwen",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "tell me something about michael jordan"}
    ]
)
print(completion.choices[0].message.content)
```
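
If you prefer not to use the Python client, the same OpenAI-compatible endpoint can be queried directly, for example with `curl` (a sketch assuming the default `llama-server` port 8080 and its `/v1/chat/completions` route):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "tell me something about michael jordan"}
        ]
      }'
```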

If you choose to use `llama-cli`, note that `-cml` for the ChatML template has been removed; instead, you should use `--in-prefix` and `--in-suffix` to achieve the same effect.

```bash
./llama-cli -m qwen2-72b-instruct-q4_0.gguf -n 512 -co -i -if -f prompts/chat-with-qwen.txt --in-prefix "<|im_start|>user\n" --in-suffix "<|im_end|>\n<|im_start|>assistant\n"
```

## Citation

If you find our work helpful, feel free to give us a cite.

```bibtex
@article{qwen2,
  title={Qwen2 Technical Report},
  year={2024}
}
```