En-2863
/

jitserve-qrf-length-predictor

quantile-regression


En-2863 committed 17 days ago

Commit 15ccb52 (verified)

1 Parent(s): 17603d2

Upload folder using huggingface_hub

Files changed (2)

README.md +57 -3
metadata.json +11 -0

README.md CHANGED

@@ -1,3 +1,57 @@
----
-license: apache-2.0
----

+# JITServe QRF Length Predictor
+This repository provides the **pretrained QRF (Quantile Regression Forest) length predictor**
+used by **[JITServe (NSDI’26)](https://arxiv.org/abs/2504.20068)** to estimate conservative upper bounds on LLM output lengths.
+This predictor is:
+- **Not an LLM evaluation model**
+- **Not fine-tuned during inference**
+- A lightweight **offline-trained prediction model** used solely for scheduling decisions
+It is released to ensure **full reproducibility** of the JITServe artifact.
+---
+## What Is Included
+This repository contains two components that must be used together:
+```text
+qrf_model/
+  ├── 0_qrf_lmsys_chat_llama3_8b.pkl
+  └── 0_qrf_lmsys_chat_qwen25_7b.pkl
+qrf_vectorizer/
+  ├── 0_qrf_lmsys_chat_llama3_8b.pkl
+  └── 0_qrf_lmsys_chat_qwen25_7b.pkl
+```
+## Usage
+These artifacts are consumed by JITServe at runtime.
+Expected directory layout in the JITServe artifact:
+```text
+assets/qrf/
+├── qrf_model/
+└── qrf_vectorizer/
+```
+After downloading this repository, place its contents under the path above.
+JITServe loads the predictor automatically during startup and does not require
+any additional configuration by default.
+## Citation
+If you use these artifacts, please consider citing our paper:
+```
+@misc{zhang2025jitservesloawarellmserving,
+      title={JITServe: SLO-aware LLM Serving with Imprecise Request Information},
+      author={Wei Zhang and Zhiyu Wu and Yi Mu and Rui Ning and Banruo Liu and Nikhil Sarda and Myungjin Lee and Fan Lai},
+      year={2025},
+      eprint={2504.20068},
+      archivePrefix={arXiv},
+      primaryClass={cs.DC},
+      url={https://arxiv.org/abs/2504.20068},
+}
+```
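
The placement step described under Usage (copying the downloaded contents to `assets/qrf/`) can be sketched as below. This is a minimal illustration, not part of the commit: the function name `install_qrf_assets` and its parameters `download_dir` and `artifact_root` are hypothetical, and it assumes the repository has already been downloaded to a local directory.

```python
import shutil
from pathlib import Path

# Subdirectory names taken from the README's "What Is Included" listing.
COMPONENTS = ("qrf_model", "qrf_vectorizer")


def install_qrf_assets(download_dir, artifact_root):
    """Copy qrf_model/ and qrf_vectorizer/ into <artifact_root>/assets/qrf/.

    Hypothetical helper: `download_dir` is a local copy of this repository,
    `artifact_root` is the root of a JITServe artifact checkout.
    """
    src = Path(download_dir)
    dest = Path(artifact_root) / "assets" / "qrf"

    # Both components are required, so fail before copying anything.
    missing = [name for name in COMPONENTS if not (src / name).is_dir()]
    if missing:
        raise FileNotFoundError(f"missing component directories: {missing}")

    dest.mkdir(parents=True, exist_ok=True)
    for name in COMPONENTS:
        # dirs_exist_ok (Python 3.8+) lets the copy be re-run safely.
        shutil.copytree(src / name, dest / name, dirs_exist_ok=True)
    return dest
```

Checking for both directories up front reflects the README's statement that the two components must be used together.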

metadata.json ADDED

@@ -0,0 +1,11 @@

+{
+  "type": "quantile_regression_forest",
+  "task": "llm_output_length_upper_bound_prediction",
+  "training_trace": "lmsys-chat",
+  "models": ["llama-3.1-8b", "qwen2.5-7b"],
+  "quantile": 0.95,
+  "framework": "scikit-learn",
+  "serialization": "joblib/pickle",
+  "jitserve_version": "nsdi26",
+  "notes": "QRF predictor and vectorizer must be loaded together"
+}
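
The `notes` field says the predictor and vectorizer must be loaded together. A pre-flight check for that requirement might look like the sketch below. It is not part of the commit: the helper `paired_predictor_files` is hypothetical, and pairing by identical filename is an assumption inferred from the README's directory listing, where both directories hold the same two filenames.

```python
from pathlib import Path


def paired_predictor_files(qrf_root):
    """Return {filename: (model_path, vectorizer_path)} for matching .pkl files.

    Hypothetical helper. Assumes a model and its vectorizer share a filename
    under qrf_model/ and qrf_vectorizer/ respectively.
    """
    root = Path(qrf_root)
    models = {p.name: p for p in (root / "qrf_model").glob("*.pkl")}
    vectorizers = {p.name: p for p in (root / "qrf_vectorizer").glob("*.pkl")}

    # Any file present on only one side cannot be loaded as a pair.
    unpaired = sorted(set(models) ^ set(vectorizers))
    if unpaired:
        raise ValueError(f"files without a counterpart: {unpaired}")

    return {name: (models[name], vectorizers[name]) for name in sorted(models)}
```

Deserialization is deliberately left out. Per `metadata.json` the files use joblib/pickle, so they should only be loaded from a trusted copy of this repository.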