En-2863 committed on
Commit 15ccb52 · verified · 1 Parent(s): 17603d2

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +57 -3
  2. metadata.json +11 -0
README.md CHANGED
@@ -1,3 +1,57 @@
- ---
- license: apache-2.0
- ---
+ # JITServe QRF Length Predictor
+
+ This repository provides the **pretrained QRF (Quantile Regression Forest) length predictor**
+ used by **[JITServe (NSDI’26)](https://arxiv.org/abs/2504.20068)** to estimate conservative upper bounds on LLM output lengths.
+
+ This predictor is:
+ - **Not an LLM evaluation model**
+ - **Not fine-tuned during inference**
+ - A lightweight **offline-trained prediction model** used solely for scheduling decisions
+
+ It is released to ensure **full reproducibility** of the JITServe artifact.
+
+ ---
+
+ ## What Is Included
+
+ This repository contains two components that must be used together:
+
+ ```text
+ qrf_model/
+ ├── 0_qrf_lmsys_chat_llama3_8b.pkl
+ └── 0_qrf_lmsys_chat_qwen25_7b.pkl
+
+ qrf_vectorizer/
+ ├── 0_qrf_lmsys_chat_llama3_8b.pkl
+ └── 0_qrf_lmsys_chat_qwen25_7b.pkl
+ ```
+
+ ## Usage
+
+ These artifacts are consumed by JITServe at runtime.
+
+ Expected directory layout in the JITServe artifact:
+ ```
+ assets/qrf/
+ ├── qrf_model/
+ └── qrf_vectorizer/
+ ```
+
+ After downloading this repository, place its contents under the path above.
+
+ JITServe loads the predictor automatically during startup and does not require
+ any additional configuration by default.
+
+ ## Citation
+ If you use these artifacts, please consider citing our paper:
+ ```
+ @misc{zhang2025jitservesloawarellmserving,
+ title={JITServe: SLO-aware LLM Serving with Imprecise Request Information},
+ author={Wei Zhang and Zhiyu Wu and Yi Mu and Rui Ning and Banruo Liu and Nikhil Sarda and Myungjin Lee and Fan Lai},
+ year={2025},
+ eprint={2504.20068},
+ archivePrefix={arXiv},
+ primaryClass={cs.DC},
+ url={https://arxiv.org/abs/2504.20068},
+ }
+ ```
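
The README's Usage section asks for the two directories to be placed under `assets/qrf/`. The following is a minimal sketch of that step, not part of this commit. The function names `install_qrf_assets` and `unpaired_files` are hypothetical, and the rule that a model and its vectorizer share the same file name is an assumption inferred from the file listing; JITServe itself may resolve the pair differently.

```python
import shutil
from pathlib import Path

# Directory names taken from the README's file listing.
SUBDIRS = ("qrf_model", "qrf_vectorizer")


def install_qrf_assets(download_dir, jitserve_root):
    """Copy qrf_model/ and qrf_vectorizer/ under <jitserve_root>/assets/qrf/."""
    target = Path(jitserve_root) / "assets" / "qrf"
    for name in SUBDIRS:
        src = Path(download_dir) / name
        if not src.is_dir():
            raise FileNotFoundError(f"missing directory: {src}")
        # dirs_exist_ok (Python 3.8+) lets the copy be re-run safely.
        shutil.copytree(src, target / name, dirs_exist_ok=True)
    return target


def unpaired_files(qrf_dir):
    """Return .pkl names found in only one of the two directories.

    Assumes pairing by identical file name (an inference, not a documented rule).
    """
    qrf_dir = Path(qrf_dir)
    models = {p.name for p in (qrf_dir / "qrf_model").glob("*.pkl")}
    vectorizers = {p.name for p in (qrf_dir / "qrf_vectorizer").glob("*.pkl")}
    return sorted(models ^ vectorizers)
```

An empty list from `unpaired_files` suggests that each predictor has a matching vectorizer in place before JITServe starts.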
metadata.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "type": "quantile_regression_forest",
+ "task": "llm_output_length_upper_bound_prediction",
+ "training_trace": "lmsys-chat",
+ "models": ["llama-3.1-8b", "qwen2.5-7b"],
+ "quantile": 0.95,
+ "framework": "scikit-learn",
+ "serialization": "joblib/pickle",
+ "jitserve_version": "nsdi26",
+ "notes": "QRF predictor and vectorizer must be loaded together"
+ }
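
The `notes` field says the predictor and vectorizer must be loaded together. A minimal sketch of what that could look like outside JITServe follows. The helper `load_qrf_pair` is hypothetical, as is the assumption that the two files share a name, and the standard-library `pickle` module is used only because the metadata lists `joblib/pickle`. If the files were written with `joblib.dump`, `joblib.load` may be needed instead. The loaded objects are returned untouched because this commit does not show the predictor's inference API.

```python
import pickle
from pathlib import Path


def load_qrf_pair(qrf_dir, file_name):
    """Load a predictor and its vectorizer, assumed to share one file name."""
    qrf_dir = Path(qrf_dir)
    model_path = qrf_dir / "qrf_model" / file_name
    vectorizer_path = qrf_dir / "qrf_vectorizer" / file_name

    # Refuse to load one half without the other.
    missing = [str(p) for p in (model_path, vectorizer_path) if not p.is_file()]
    if missing:
        raise FileNotFoundError(
            "predictor and vectorizer must be loaded together; missing: "
            + ", ".join(missing)
        )

    # pickle can execute arbitrary code while loading:
    # only open files that come from a trusted source.
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    with open(vectorizer_path, "rb") as f:
        vectorizer = pickle.load(f)
    return model, vectorizer
```

For example, `load_qrf_pair("assets/qrf", "0_qrf_lmsys_chat_llama3_8b.pkl")` would return both objects, or fail early if either file is absent.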