Sarashina2-7B BitsAndBytes 4-bit Quantized

This is a 4-bit quantized version of sbintuitions/sarashina2-7b using BitsAndBytes (NF4).

Model Description

Base Model: sarashina2-7b (7B parameters)
Quantization Method: BitsAndBytes (bitsandbytes library)
Quantization Type: NF4 (4-bit NormalFloat)
Double Quantization: Enabled
Compute dtype: bfloat16
Original Size: ~14.6 GB
Quantized Size: ~4-5 GB
Memory Reduction: ~66-73% (computed from the sizes above)
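
The reduction figure follows directly from the two sizes listed above, so it can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: `memory_reduction` is a hypothetical helper, and the 4.0 GB and 5.0 GB inputs are simply the endpoints of the "~4-5 GB" estimate, not measured values.

```python
def memory_reduction(original_gb: float, quantized_gb: float) -> float:
    """Return the percentage of memory saved relative to the original size."""
    if original_gb <= 0:
        raise ValueError("original_gb must be positive")
    return (1.0 - quantized_gb / original_gb) * 100.0


ORIGINAL_GB = 14.6                # FP16 size listed in this card
QUANTIZED_RANGE_GB = (4.0, 5.0)   # assumed endpoints of the "~4-5 GB" estimate

for size_gb in QUANTIZED_RANGE_GB:
    saved = memory_reduction(ORIGINAL_GB, size_gb)
    print(f"{size_gb:.1f} GB quantized -> {saved:.1f}% smaller than FP16")
```

With these inputs the saving works out to roughly 66% at 5 GB and roughly 73% at 4 GB. The actual figure depends on the real checkpoint size.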

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "ronantakizawa/sarashina2-7b-4bit-bnb"

# Configure BitsAndBytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Generate text
prompt = "おはようございます、今日の天気は"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
    top_p=0.95
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Installation

pip install torch transformers accelerate bitsandbytes

Requirements

CUDA GPU: BitsAndBytes requires CUDA (not compatible with CPU)
GPU Memory: ~5-6 GB VRAM recommended
Python: 3.8+
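
Before loading the model, it can help to confirm that the machine meets the requirements above. The following is a minimal sketch, not part of the original card: `check_environment` is a hypothetical helper, and its 6.0 GB default threshold is an assumption taken from the upper end of the "~5-6 GB" recommendation. It only uses standard PyTorch CUDA queries and returns early if PyTorch or CUDA is missing.

```python
def check_environment(min_vram_gb: float = 6.0) -> dict:
    """Report whether the local setup looks suitable for 4-bit loading."""
    report = {
        "torch_installed": False,
        "cuda_available": False,
        "bf16_supported": False,
        "vram_gb": None,
        "enough_vram": False,
    }
    try:
        import torch
    except ImportError:
        return report  # PyTorch is missing; nothing else can be checked

    report["torch_installed"] = True
    if not torch.cuda.is_available():
        return report  # no CUDA device visible to PyTorch

    report["cuda_available"] = True
    # bfloat16 is the compute dtype used in the usage example
    report["bf16_supported"] = torch.cuda.is_bf16_supported()
    total_bytes = torch.cuda.get_device_properties(0).total_memory
    report["vram_gb"] = round(total_bytes / 1024**3, 1)
    report["enough_vram"] = report["vram_gb"] >= min_vram_gb
    return report


report = check_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `bf16_supported` is false, changing `bnb_4bit_compute_dtype` to `torch.float16` is a common fallback, though that configuration has not been verified for this model.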

Performance

Memory Usage: Reduced by ~66-73% compared to FP16 (~14.6 GB down to ~4-5 GB)
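Inference Speed: Comparable to FP16 on modern GPUs (no benchmark figures are reported for this model)
Quality: Minimal accuracy loss expected with NF4 quantization (no evaluation results are reported for this model)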

Advantages of BitsAndBytes

✅ No calibration required - quantizes on model load
✅ Easy to use - a single configuration object (BitsAndBytesConfig)
✅ Widely compatible - works with most Hugging Face models
✅ Double quantization - additional memory savings
✅ NF4 quantization - optimized for neural network weights
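
To make the double quantization saving concrete, the sketch below estimates the storage cost per quantized weight. It is a back-of-the-envelope calculation, not code from bitsandbytes: `bits_per_param` is a hypothetical helper, and its defaults (blocks of 64 weights, 32-bit scaling constants, 8-bit second-level constants in blocks of 256) are assumptions based on the scheme described in the QLoRA paper. Layers that are not quantized, such as embeddings, are not counted.

```python
def bits_per_param(
    weight_bits: int = 4,
    block_size: int = 64,
    scale_bits: int = 32,
    double_quant: bool = False,
    dq_scale_bits: int = 8,
    dq_block_size: int = 256,
) -> float:
    """Estimate bits stored per weight under blockwise quantization."""
    if not double_quant:
        # One full-precision scaling constant per block of weights
        overhead = scale_bits / block_size
    else:
        # Scaling constants are themselves quantized to dq_scale_bits,
        # with one full-precision constant per dq_block_size of them
        overhead = dq_scale_bits / block_size
        overhead += scale_bits / (block_size * dq_block_size)
    return weight_bits + overhead


plain = bits_per_param(double_quant=False)
nested = bits_per_param(double_quant=True)
print(f"without double quantization: {plain:.3f} bits/param")
print(f"with double quantization:    {nested:.3f} bits/param")
print(f"saving: {plain - nested:.3f} bits/param")
```

Under these assumptions, double quantization trims the per-weight overhead from 0.5 bits to about 0.127 bits, which is a small but real saving at the 7B scale.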

Limitations

Requires CUDA GPU (no CPU support)
May have slight quality degradation compared to full precision
Cannot export to ONNX or other formats

License

MIT License (inherited from base model)

Citation

@misc{sarashina2-7b-bnb,
  author = {Ronan Takizawa},
  title = {Sarashina2-7B BitsAndBytes 4-bit Quantized},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ronantakizawa/sarashina2-7b-4bit-bnb}}
}

Base Model Citation

Please refer to the original model card for the base model citation.
