---
license: llama3.1
language:
- en
metrics:
- accuracy
- bertscore
- bleu
- bleurt
pipeline_tag: text-generation
datasets:
- alphaoumardev/it-support-level-1-qa
base_model:
- meta-llama/Llama-3.1-8B-Instruct
tags:
- llama
- meta
- instruction-tuned
- causal-lm
- transformers
- huggingface
- llama3.1
---

# Model Card for meta-llama/Llama-3.1-8B (Instruction-Tuned)

This is a multilingual, instruction-tuned, autoregressive LLM developed by Meta that performs well at chat, reasoning, coding, and long-context tasks.

## Model Details

### Model Description

Llama 3.1 8B is part of Meta's Llama 3.1 collection, released July 23, 2024, which includes 8B, 70B, and 405B parameter models. It was pretrained on ~15 trillion tokens of multilingual text and code and supports a context window of 128K tokens. Instruction tuning used supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to optimize the model for assistive tasks.

- **Developed by:** Meta AI
- **Model type:** Decoder-only transformer (autoregressive)
- **Input/output modality:** Multilingual text and code
- **Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, Thai (plus broader multilingual coverage)
- **Context window:** 128,000 tokens
- **Knowledge cutoff:** December 2023
- **License:** Llama 3.1 Community License (custom commercial license)
- **Finetuned from:** Base pretrained Llama 3.1 8B

### Model Sources

- **Repository:** https://huggingface.co/meta-llama/Llama-3.1-8B
- **Paper:** "Introducing Llama 3" blog post by Meta AI, April 18, 2024; updated for Llama 3.1 on July 23, 2024
- **Demo:** Available via the `transformers` pipeline, or hosted on Meta AI and WhatsApp

## Uses

### Direct Use

Suited to multilingual chatbots, reasoning assistants, code generation, summarization, data synthesis, and long-context tasks (document analysis, RAG).

### Downstream Use

Can be fine-tuned for domain-specific applications such as RAG, summarization, topic-controlled dialogue, coding agents, and multimodal reasoning pipelines.

### Out-of-Scope Use

Not designed for image, audio, or video understanding or generation. Avoid uses disallowed by the license (e.g., producing illicit or unsafe instructions).

## Bias, Risks, and Limitations

- May produce biased or unsafe content and hallucinated outputs, reflecting biases in the training data.
- Misuse of the long context window can lead to unexpected behavior.
- Not safe for sensitive, legal, or medical advice without additional guardrails.

### Recommendations

Use with moderation filters, human oversight, prompt safety checks, and evaluation for bias and safety in the target domain.

## How to Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Can you help me configure my account?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
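Because this card describes the instruction-tuned variant, chat-style prompting through the tokenizer's chat template generally gives better results than raw text completion. Below is a minimal sketch, assuming the `meta-llama/Llama-3.1-8B-Instruct` checkpoint (listed as the base model above), a GPU with bfloat16 support, and the `accelerate` package for `device_map="auto"`; the system prompt is only an illustrative example.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Chat-style generation sketch: the Instruct checkpoint ships with a chat
# template, so prompts are passed as a list of role/content messages.
model_id = "meta-llama/Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a concise IT support assistant."},
    {"role": "user", "content": "Can you help me configure my account?"},
]

# apply_chat_template inserts the Llama 3.1 special tokens and role headers.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

`add_generation_prompt=True` appends the assistant header tokens so the model starts a reply instead of continuing the user turn.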
## Training Details

### Training Data

Pretrained on a cleaned corpus of ~15 trillion publicly available tokens (multilingual text and code). Instruction tuning used public instruction datasets plus ~25M synthetically generated examples for SFT/RLHF ([Collabnix][1], [Lifewire][2], [Hugging Face][3]).

### Training Procedure

- **Preprocessing:** Public web, code, and instruction data filtered with Meta's quality and safety classifiers.
- **Hyperparameters:** Referenced in the local repo; a mix of SFT and RLHF; context length up to 128K tokens.

#### Speeds, Sizes, Times

- Pretraining: ~15 trillion tokens; ~1.46M GPU hours for the 8B model ([Collabnix][1]).
- Checkpoint size: ~8B parameters; roughly 16 GB of weights in bfloat16/fp16 and about 32 GB in fp32 (a back-of-the-envelope check appears at the end of this card).

## Evaluation

### Testing Data & Metrics

Benchmarked on multilingual tasks (MMLU, coding, reasoning), outperforming many open and closed models of similar size ([Hugging Face][3]).

- Instruction-tuned 8B: ~69.4% MMLU; ~280 ms time-to-first-token; ~193 tokens/sec throughput ([Hugging Face][3]).

### Results Summary

| Metric                   | Value             |
| ------------------------ | ----------------- |
| MMLU (instruction-tuned) | ~69.4%            |
| Perplexity (The Pile)    | ~8.28 (fp16)      |
| Throughput               | ~192.9 tokens/sec |
| Time-to-first-token      | ~0.28 s           |

## Environmental Impact

- **Pretraining compute:** ~1.46M GPU hours (H100s) for the 8B model; ~15T tokens.
- **Estimated CO₂e emissions:** Use the ML CO₂ Impact calculator for deployment-specific estimates.

## Technical Specifications

### Architecture

- Decoder-only Transformer with SwiGLU activations, rotary positional embeddings (RoPE), RMSNorm, and Grouped-Query Attention (GQA); 32 layers, ~8B parameters ([arXiv][4], [Prompthub][5], [Collabnix][1], [Wikipedia][6]).

### Compute Infrastructure

- Pretrained on Meta's large GPU clusters, likely H100-based.

### Software

- Implemented in PyTorch and Hugging Face Transformers (v4.43+) ([Hugging Face][3]).

## Citation

```bibtex
@misc{meta2024llama3,
  title        = {Introducing Llama 3},
  author       = {{Meta AI}},
  howpublished = {\url{https://ai.meta.com/blog/meta-llama-3/}},
  year         = {2024},
  note         = {Version 3.1 released July 23, 2024}
}
```

[1]: https://collabnix.com/llama-3-1-405b-70b-8b-with-multilinguality-and-long-context/ "Llama 3.1 - 405B, 70B & 8B with Multilinguality and Long Context"
[2]: https://www.lifewire.com/llama-2-vs-llama-3-8714445 "Llama 3 vs. Llama 2: Why the Newest Model Leaves Its Predecessor in the Dust"
[3]: https://huggingface.co/meta-llama/Llama-3.1-8B "meta-llama/Llama-3.1-8B - Hugging Face"
[4]: https://arxiv.org/abs/2404.18988 "Markovian Transformers for Informative Language Modeling"
[5]: https://www.prompthub.us/models/llama-3-1-8b "Llama 3.1 8B Model Card - PromptHub"
[6]: https://en.wikipedia.org/wiki/Llama_%28language_model%29 "Llama (language model)"
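As a rough sanity check on the checkpoint sizes quoted under Speeds, Sizes, Times, the weight-only footprint can be estimated from the parameter count alone. The snippet below is a back-of-the-envelope sketch; the parameter count is approximate, and optimizer states, activations, and the KV cache are not included.

```python
# Rough weight-only size estimates for an ~8B-parameter checkpoint.
# Runtime memory also needs room for activations and the KV cache,
# which are not counted here.
n_params = 8.0e9  # approximate parameter count of Llama 3.1 8B

bytes_per_param = {"fp32": 4, "bf16/fp16": 2, "int8": 1, "int4": 0.5}

for dtype, nbytes in bytes_per_param.items():
    gb = n_params * nbytes / 1e9  # decimal gigabytes
    print(f"{dtype:>9}: ~{gb:.0f} GB of weights")

# Approximate output:
#      fp32: ~32 GB of weights
# bf16/fp16: ~16 GB of weights
#      int8: ~8 GB of weights
#      int4: ~4 GB of weights
```

This matches the ~16 GB bfloat16/fp16 figure quoted above and is why quantized int8/int4 variants are popular for single-GPU inference.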