Update README.md
---

This is [allenai/Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) quantized to W8A8 with [LLM Compressor](https://github.com/vllm-project/llm-compressor) using SmoothQuant. The model is compatible with vLLM (tested with v0.11.2 on an RTX 4090).

How the models perform (token efficiency, accuracy per domain, ...) and how to use them:

[Quantizing Olmo 3: Most Efficient and Accurate Formats](https://kaitchup.substack.com/p/quantizing-olmo-3-most-efficient)

- **Developed by:** [The Kaitchup](https://kaitchup.substack.com/)
- **License:** Apache 2.0
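Since the card states vLLM compatibility but shows no usage, here is a minimal sketch of serving the model with vLLM's OpenAI-compatible server. The repo id below is a placeholder, not taken from this card: substitute the actual Hugging Face repo name of this quantized model.

```shell
# Serve the W8A8 model with vLLM (requires a GPU; tested by the author
# on an RTX 4090 with vLLM v0.11.2).
# NOTE: the model id is a placeholder -- replace it with this repo's name.
vllm serve kaitchup/Olmo-3-7B-Think-W8A8

# From another terminal, query the OpenAI-compatible endpoint:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "kaitchup/Olmo-3-7B-Think-W8A8",
        "messages": [{"role": "user", "content": "Explain SmoothQuant briefly."}]
      }'
```

Offline batch inference via `vllm.LLM(model=...)` works the same way; only the model id changes.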