JustinLin610 committed on
Commit 302e622 · verified · 1 Parent(s): 244237f

Update README.md

Files changed (1)
  1. README.md +55 -7
README.md CHANGED
@@ -1,9 +1,9 @@
  ---
- license: apache-2.0
  language:
  - en
  pipeline_tag: text-generation
  tags:
  - chat
  ---
 
@@ -27,18 +27,66 @@ We pretrained the models with a large amount of data, and we post-trained the mo
 
 
  ## Requirements
- We advise you to clone [`llama.cpp`](https://github.com/ggerganov/llama.cpp) and install it following the official guide.
 
 
  ## How to use
  Cloning the repo may be inefficient, so you can manually download the GGUF file that you need or use `huggingface-cli` (`pip install huggingface_hub`) as shown below:
  ```shell
- huggingface-cli download Qwen/Qwen2-72B-Instruct-GGUF qwen2-72b-instruct-q8_0.gguf --local-dir . --local-dir-use-symlinks False
  ```
 
- We demonstrate how to use `llama.cpp` to run Qwen2:
- ```shell
- ./main -m qwen2-72b-instruct-q8_0.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
  ```
 
  ## Citation
@@ -50,4 +98,4 @@ If you find our work helpful, feel free to cite our work.
  title={Qwen2 Technical Report},
  year={2024}
  }
- ```
 
  ---
  language:
  - en
  pipeline_tag: text-generation
  tags:
+ - instruct
  - chat
  ---
 
 
 
 
  ## Requirements
+ We advise you to clone [`llama.cpp`](https://github.com/ggerganov/llama.cpp) and install it following the official guide. Our instructions track the latest version of llama.cpp.
+ In the following demonstration, we assume that you are running commands from the root of the `llama.cpp` repository, e.g., after a setup like the one sketched below.
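+
+ For instance, a minimal sketch of the setup (this assumes a plain `make` build; consult the official guide for platform-specific options such as GPU backends):
+
+ ```bash
+ # fetch the llama.cpp sources and build the default CPU targets
+ git clone https://github.com/ggerganov/llama.cpp
+ cd llama.cpp
+ make
+ ```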
 
 
  ## How to use
  Cloning the repo may be inefficient, so you can manually download the GGUF file that you need or use `huggingface-cli` (`pip install huggingface_hub`) as shown below:
  ```shell
+ huggingface-cli download Qwen/Qwen2-72B-Instruct-GGUF qwen2-72b-instruct-q4_0.gguf --local-dir . --local-dir-use-symlinks False
  ```
 
+ However, we split large files into multiple segments because a single uploaded file is limited to 50 GB.
+ The split files share a common prefix, with a suffix indicating the segment index. For example, the `q5_k_m` GGUF files are:
+
+ ```
+ qwen2-72b-instruct-q5_k_m-00001-of-00002.gguf
+ qwen2-72b-instruct-q5_k_m-00002-of-00002.gguf
+ ```
+
+ They share the prefix `qwen2-72b-instruct-q5_k_m`, and each carries its own index suffix, e.g., `-00001-of-00002`; download every segment you need, as sketched below.
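+
+ For example, a minimal sketch of fetching both `q5_k_m` segments, reusing the `huggingface-cli` command pattern from above:
+
+ ```shell
+ huggingface-cli download Qwen/Qwen2-72B-Instruct-GGUF qwen2-72b-instruct-q5_k_m-00001-of-00002.gguf --local-dir . --local-dir-use-symlinks False
+ huggingface-cli download Qwen/Qwen2-72B-Instruct-GGUF qwen2-72b-instruct-q5_k_m-00002-of-00002.gguf --local-dir . --local-dir-use-symlinks False
+ ```
+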
+ To use the split GGUF files, you need to merge them first with the `llama-gguf-split` command, as shown below:
+
+ ```bash
+ ./llama-gguf-split --merge qwen2-72b-instruct-q5_k_m-00001-of-00002.gguf qwen2-72b-instruct-q5_k_m.gguf
+ ```
+
+ Following the renaming of the llama.cpp binaries, `llama-gguf-split` is equivalent to the previous `gguf-split`.
+ The command takes the path to the first split GGUF file as its first argument and the path to the output GGUF file as its second; the remaining segments are located via their index suffixes.
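+ After merging, you should obtain a single `qwen2-72b-instruct-q5_k_m.gguf`, which can then be used just like the unsplit files below.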
+
+
+ To run Qwen2, you can use `llama-cli` (the previous `main`) or `llama-server` (the previous `server`).
+ We recommend using `llama-server`, as it is simple and compatible with the OpenAI API. For example:
+
+ ```bash
+ ./llama-server -m qwen2-72b-instruct-q4_0.gguf
+ ```
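+
+ By default, `llama-server` listens on `http://localhost:8080`, which is the address assumed in the examples below.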
+
+ Then it is easy to access the deployed service with the OpenAI API:
+
+ ```python
+ import openai
+
+ # point the client at the local llama-server endpoint; no real API key is required
+ client = openai.OpenAI(
+     base_url="http://localhost:8080/v1",  # "http://<Your api-server IP>:port"
+     api_key="sk-no-key-required"
+ )
+
+ completion = client.chat.completions.create(
+     model="qwen",
+     messages=[
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "tell me something about michael jordan"}
+     ]
+ )
+ print(completion.choices[0].message.content)
+ ```
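+
+ Alternatively, you can smoke-test the service with a plain `curl` request against the same OpenAI-compatible endpoint (a minimal sketch; the port and model name follow the Python example above):
+
+ ```bash
+ curl http://localhost:8080/v1/chat/completions \
+     -H "Content-Type: application/json" \
+     -d '{
+         "model": "qwen",
+         "messages": [
+             {"role": "system", "content": "You are a helpful assistant."},
+             {"role": "user", "content": "hello"}
+         ]
+     }'
+ ```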
+
+ If you choose to use `llama-cli`, note that `-cml`, which previously enabled the ChatML template, has been removed; instead, you should pass the ChatML tokens explicitly with `--in-prefix` and `--in-suffix`:
+
+ ```bash
+ ./llama-cli -m qwen2-72b-instruct-q4_0.gguf -n 512 -co -i -if -f prompts/chat-with-qwen.txt --in-prefix "<|im_start|>user\n" --in-suffix "<|im_end|>\n<|im_start|>assistant\n"
  ```
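+
+ For reference: `-n 512` caps the number of generated tokens, `-co` enables colored output (the previous `--color`), `-i` and `-if` run the session in interactive (input-first) mode, and `-f` reads the initial prompt from the given file.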
 
  ## Citation
 
  title={Qwen2 Technical Report},
  year={2024}
  }
+ ```