---
license: llama3.1
language:
- en
metrics:
- accuracy
- bertscore
- bleu
- bleurt
pipeline_tag: text-generation
datasets:
- alphaoumardev/it-support-level-1-qa
base_model:
- meta-llama/Llama-3.1-8B-Instruct
tags:
- llama
- meta
- instruction-tuned
- causal-lm
- transformers
- huggingface
- llama3.1
---

# Model Card for meta-llama/Llama-3.1-8B (Instruction-Tuned)

This is a multilingual, instruction-tuned, autoregressive LLM developed by Meta that performs well at chat, reasoning, coding, and long-context tasks.

## Model Details

### Model Description

Llama 3.1 8B is part of Meta's Llama 3.1 collection, released July 23, 2024, which includes 8B, 70B, and 405B parameter models. It was pretrained on ~15 trillion tokens of multilingual text and code and supports a context window of 128K tokens. Instruction tuning used supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to optimize the model for assistive tasks.

- **Developed by:** Meta AI
- **Model type:** Decoder-only transformer (autoregressive)
- **Input/output modality:** Multilingual text and code
- **Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, Thai (plus broader multilingual coverage)
- **Context window:** 128,000 tokens
- **Knowledge cutoff:** December 2023
- **License:** Llama 3.1 Community License (custom commercial license)
- **Finetuned from:** Base pretrained Llama 3.1 8B

### Model Sources

- **Repository:** https://huggingface.co/meta-llama/Llama-3.1-8B
- **Paper:** "Introducing Llama 3" blog post by Meta AI, April 18, 2024; updated for Llama 3.1 on July 23, 2024
- **Demo:** Available via the `transformers` pipeline, or hosted on Meta AI and WhatsApp

## Uses

### Direct Use

Suited to multilingual chatbots, reasoning assistants, code generation, summarization, data synthesis, and long-context tasks (document analysis, RAG).

### Downstream Use

Can be fine-tuned for domain-specific applications such as RAG, summarization, topic-controlled dialogue, coding agents, and multimodal reasoning pipelines.

### Out-of-Scope Use

Not designed for image, audio, or video understanding or generation. Avoid uses disallowed by the license (e.g., producing illicit or unsafe instructions).

## Bias, Risks, and Limitations

- May produce biased or unsafe content and hallucinated outputs, reflecting biases in the training data.
- Misuse of the long context window can lead to unexpected behavior.
- Not safe for sensitive, legal, or medical advice without additional guardrails.

### Recommendations

Use with moderation filters, human oversight, prompt safety checks, and evaluation for bias and safety in the target domain.

## How to Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Can you help me configure my account?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
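Because this card describes the instruction-tuned variant, chat-style prompting through the tokenizer's chat template generally gives better results than raw text completion. Below is a minimal sketch, assuming the `meta-llama/Llama-3.1-8B-Instruct` checkpoint (listed as the base model above), a GPU with bfloat16 support, and the `accelerate` package for `device_map="auto"`; the system prompt is only an illustrative example.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Chat-style generation sketch: the Instruct checkpoint ships with a chat
# template, so prompts are passed as a list of role/content messages.
model_id = "meta-llama/Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a concise IT support assistant."},
    {"role": "user", "content": "Can you help me configure my account?"},
]

# apply_chat_template inserts the Llama 3.1 special tokens and role headers.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

`add_generation_prompt=True` appends the assistant header tokens so the model starts a reply instead of continuing the user turn.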
## Training Details

### Training Data

Pretrained on a cleaned corpus of ~15 trillion publicly available tokens (multilingual text and code). Instruction tuning used public instruction datasets plus ~25M synthetically generated examples for SFT/RLHF ([Collabnix][1], [Lifewire][2], [Hugging Face][3]).

### Training Procedure

- **Preprocessing:** Public web, code, and instruction data filtered with Meta's quality and safety classifiers.
- **Hyperparameters:** Referenced in the local repo; a mix of SFT and RLHF; context length up to 128K tokens.

#### Speeds, Sizes, Times

- Pretraining: ~15 trillion tokens; ~1.46M GPU hours for the 8B model ([Collabnix][1]).
- Checkpoint size: ~8B parameters; roughly 16 GB of weights in bfloat16/fp16 and about 32 GB in fp32 (a back-of-the-envelope check appears at the end of this card).

## Evaluation

### Testing Data & Metrics

Benchmarked on multilingual tasks (MMLU, coding, reasoning), outperforming many open and closed models of similar size ([Hugging Face][3]).

- Instruction-tuned 8B: ~69.4% MMLU; ~280 ms time-to-first-token; ~193 tokens/sec throughput ([Hugging Face][3]).

### Results Summary

| Metric                   | Value             |
| ------------------------ | ----------------- |
| MMLU (instruction-tuned) | ~69.4%            |
| Perplexity (The Pile)    | ~8.28 (fp16)      |
| Throughput               | ~192.9 tokens/sec |
| Time-to-first-token      | ~0.28 s           |

## Environmental Impact

- **Pretraining compute:** ~1.46M GPU hours (H100s) for the 8B model; ~15T tokens.
- **Estimated CO₂e emissions:** Use the ML CO₂ Impact calculator for deployment-specific estimates.

## Technical Specifications

### Architecture

- Decoder-only Transformer with SwiGLU activations, rotary positional embeddings (RoPE), RMSNorm, and Grouped-Query Attention (GQA); 32 layers, ~8B parameters ([arXiv][4], [Prompthub][5], [Collabnix][1], [Wikipedia][6]).

### Compute Infrastructure

- Pretrained on Meta's large GPU clusters, likely H100-based.

### Software

- Implemented in PyTorch and Hugging Face Transformers (v4.43+) ([Hugging Face][3]).

## Citation

```bibtex
@misc{meta2024llama3,
  title        = {Introducing Llama 3},
  author       = {{Meta AI}},
  howpublished = {\url{https://ai.meta.com/blog/meta-llama-3/}},
  year         = {2024},
  note         = {Version 3.1 released July 23, 2024}
}
```

[1]: https://collabnix.com/llama-3-1-405b-70b-8b-with-multilinguality-and-long-context/ "Llama 3.1 - 405B, 70B & 8B with Multilinguality and Long Context"
[2]: https://www.lifewire.com/llama-2-vs-llama-3-8714445 "Llama 3 vs. Llama 2: Why the Newest Model Leaves Its Predecessor in the Dust"
[3]: https://huggingface.co/meta-llama/Llama-3.1-8B "meta-llama/Llama-3.1-8B - Hugging Face"
[4]: https://arxiv.org/abs/2404.18988 "Markovian Transformers for Informative Language Modeling"
[5]: https://www.prompthub.us/models/llama-3-1-8b "Llama 3.1 8B Model Card - PromptHub"
[6]: https://en.wikipedia.org/wiki/Llama_%28language_model%29 "Llama (language model)"
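As a rough sanity check on the checkpoint sizes quoted under Speeds, Sizes, Times, the weight-only footprint can be estimated from the parameter count alone. The snippet below is a back-of-the-envelope sketch; the parameter count is approximate, and optimizer states, activations, and the KV cache are not included.

```python
# Rough weight-only size estimates for an ~8B-parameter checkpoint.
# Runtime memory also needs room for activations and the KV cache,
# which are not counted here.
n_params = 8.0e9  # approximate parameter count of Llama 3.1 8B

bytes_per_param = {"fp32": 4, "bf16/fp16": 2, "int8": 1, "int4": 0.5}

for dtype, nbytes in bytes_per_param.items():
    gb = n_params * nbytes / 1e9  # decimal gigabytes
    print(f"{dtype:>9}: ~{gb:.0f} GB of weights")

# Approximate output:
#      fp32: ~32 GB of weights
# bf16/fp16: ~16 GB of weights
#      int8: ~8 GB of weights
#      int4: ~4 GB of weights
```

This matches the ~16 GB bfloat16/fp16 figure quoted above and is why quantized int8/int4 variants are popular for single-GPU inference.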