---
base_model: unsloth/meta-llama-3.1-8b-bnb-4bit
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:unsloth/meta-llama-3.1-8b-bnb-4bit
- lora
- sft
- transformers
- trl
- unsloth
license: apache-2.0
datasets:
- sixfingerdev/turkish-qa-multi-dialog-dataset
language:
- tr
- en
- zh
---

# SixFinger-8B Adapter for LLaMA 3.1 8B

This repository contains a **LoRA adapter** for the SixFinger-8B model. The adapter adds fine-tuned response behavior on top of the base model `unsloth/llama-3.1-8b-bnb-4bit` without modifying the base weights.

---

## Overview

- **Base Model:** unsloth/llama-3.1-8b-bnb-4bit
- **Adapter Type:** LoRA
- **Quantization:** 4-bit (via bitsandbytes)
- **Purpose:** Enhanced response generation for Turkish/English mixed datasets.
- **Compatibility:** Use with Hugging Face Transformers and the PEFT library.

---

## Installation

Install the required dependencies:

```bash
pip install transformers accelerate bitsandbytes peft
```

Ensure you have a GPU with sufficient VRAM for 4-bit inference.

---

## Loading the Model

1. **Load the base model**

   ```python
   from transformers import AutoModelForCausalLM

   base_model = AutoModelForCausalLM.from_pretrained(
       "unsloth/llama-3.1-8b-bnb-4bit",
       device_map="auto"
   )
   ```

2. **Load the adapter**

   ```python
   from peft import PeftModel

   model = PeftModel.from_pretrained(
       base_model,
       "sixfingerdev/SixFinger-8B"
   )
   ```

3. **Load the tokenizer**

   ```python
   from transformers import AutoTokenizer

   tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3.1-8b-bnb-4bit")
   ```

---

## Example Usage

Generate text using the adapter:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Base model
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/llama-3.1-8b-bnb-4bit",
    device_map="auto"
)

# LoRA adapter
model = PeftModel.from_pretrained(base_model, "sixfingerdev/SixFinger-8B")

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3.1-8b-bnb-4bit")

# Example text generation (Turkish prompt: "Question: What is artificial intelligence?\nAnswer:")
prompt = "Soru: Yapay zeka nedir?\nCevap:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Notes

- The adapter does **not** modify the base model; it only applies LoRA weights on top.
- 4-bit quantization significantly reduces VRAM usage. Ensure your GPU supports **bitsandbytes 4-bit operations**.
- If needed, you can merge the adapter into the base model for easier deployment (see the sketch at the end of this card).

---

## References

- [PEFT (Parameter-Efficient Fine-Tuning)](https://huggingface.co/docs/peft/index)
- [Transformers 4-bit Quantization](https://huggingface.co/docs/transformers/main/en/main_classes/quantization)

---

## License

The adapter and its usage are provided under the terms specified in this repository. Ensure compliance with the **base model license** (Meta's LLaMA).
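
---

## Merging the Adapter (Optional)

A minimal sketch of the merge step mentioned in the Notes. It assumes you load a non-quantized (fp16) copy of the base model so the LoRA deltas can be folded into full-precision weights; the fp16 repo id `unsloth/Meta-Llama-3.1-8B` and the output path `./sixfinger-8b-merged` are illustrative, not part of this repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load a non-quantized (fp16) copy of the base model so the LoRA weights
# can be merged into full-precision tensors. The repo id below is an
# assumption -- substitute the full-precision Llama 3.1 8B checkpoint you use.
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Meta-Llama-3.1-8B",  # assumed fp16 base, not the 4-bit bnb repo
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the LoRA adapter, then fold its weights into the base model.
model = PeftModel.from_pretrained(base_model, "sixfingerdev/SixFinger-8B")
merged_model = model.merge_and_unload()

# Save the standalone merged model (output path is illustrative).
merged_model.save_pretrained("./sixfinger-8b-merged")
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3.1-8b-bnb-4bit")
tokenizer.save_pretrained("./sixfinger-8b-merged")
```

The merged checkpoint no longer requires PEFT at inference time; you can load it directly with `AutoModelForCausalLM.from_pretrained`, and re-quantize it for deployment if VRAM is a concern.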