---
base_model: unsloth/meta-llama-3.1-8b-bnb-4bit
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:unsloth/meta-llama-3.1-8b-bnb-4bit
- lora
- sft
- transformers
- trl
- unsloth
license: apache-2.0
datasets:
- sixfingerdev/turkish-qa-multi-dialog-dataset
language:
- tr
- en
- zh
---

# SixFinger-8B Adapter for LLaMA 3.1 8B

This repository contains a **LoRA adapter** for the SixFinger-8B model. The adapter adds fine-tuned response behavior on top of the base model `unsloth/llama-3.1-8b-bnb-4bit` without modifying the base weights.

---

## Overview

- **Base Model:** unsloth/llama-3.1-8b-bnb-4bit
- **Adapter Type:** LoRA
- **Quantization:** 4-bit (via bitsandbytes)
- **Purpose:** Enhanced response generation for Turkish/English mixed datasets.
- **Compatibility:** Use with Hugging Face Transformers and the PEFT library.

---

## Installation

Install the required dependencies:

```bash
pip install transformers accelerate bitsandbytes peft
```

Ensure you have a GPU with sufficient VRAM for 4-bit inference.

---

## Loading the Model

1. **Load the base model**

   ```python
   from transformers import AutoModelForCausalLM

   base_model = AutoModelForCausalLM.from_pretrained(
       "unsloth/llama-3.1-8b-bnb-4bit",
       device_map="auto"
   )
   ```

2. **Load the adapter**

   ```python
   from peft import PeftModel

   model = PeftModel.from_pretrained(
       base_model,
       "sixfingerdev/SixFinger-8B"
   )
   ```

3. **Load the tokenizer**

   ```python
   from transformers import AutoTokenizer

   tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3.1-8b-bnb-4bit")
   ```

---

## Example Usage

Generate text using the adapter:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Base model
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/llama-3.1-8b-bnb-4bit",
    device_map="auto"
)

# LoRA adapter
model = PeftModel.from_pretrained(base_model, "sixfingerdev/SixFinger-8B")

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3.1-8b-bnb-4bit")

# Example text generation (Turkish prompt: "Question: What is artificial intelligence?\nAnswer:")
prompt = "Soru: Yapay zeka nedir?\nCevap:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Notes

- The adapter does **not** modify the base model; it only applies LoRA weights on top.
- 4-bit quantization significantly reduces VRAM usage. Ensure your GPU supports **bitsandbytes 4-bit operations**.
- If needed, you can merge the adapter into the base model for easier deployment (see the sketch at the end of this card).

---

## References

- [PEFT (Parameter-Efficient Fine-Tuning)](https://huggingface.co/docs/peft/index)
- [Transformers 4-bit Quantization](https://huggingface.co/docs/transformers/main/en/main_classes/quantization)

---

## License

The adapter and its usage are provided under the terms specified in this repository. Ensure compliance with the **base model license** (Meta's LLaMA).
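
---

## Merging the Adapter (Optional)

A minimal sketch of the merge step mentioned in the Notes. It assumes you load a non-quantized (fp16) copy of the base model so the LoRA deltas can be folded into full-precision weights; the fp16 repo id `unsloth/Meta-Llama-3.1-8B` and the output path `./sixfinger-8b-merged` are illustrative, not part of this repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load a non-quantized (fp16) copy of the base model so the LoRA weights
# can be merged into full-precision tensors. The repo id below is an
# assumption -- substitute the full-precision Llama 3.1 8B checkpoint you use.
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Meta-Llama-3.1-8B",  # assumed fp16 base, not the 4-bit bnb repo
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the LoRA adapter, then fold its weights into the base model.
model = PeftModel.from_pretrained(base_model, "sixfingerdev/SixFinger-8B")
merged_model = model.merge_and_unload()

# Save the standalone merged model (output path is illustrative).
merged_model.save_pretrained("./sixfinger-8b-merged")
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3.1-8b-bnb-4bit")
tokenizer.save_pretrained("./sixfinger-8b-merged")
```

The merged checkpoint no longer requires PEFT at inference time; you can load it directly with `AutoModelForCausalLM.from_pretrained`, and re-quantize it for deployment if VRAM is a concern.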