🤗 Datasets, check!
Well, that was quite a tour through the 🤗 Datasets library — congratulations on making it this far! With the knowledge that you’ve gained from this chapter, you should be able to:
- Load datasets from anywhere, be it the Hugging Face Hub, your laptop, or a remote server at your company.
- Wrangle your data using a mix of the `Dataset.map()` and `Dataset.filter()` functions (sketched below).
- Quickly switch between data formats like Pandas and NumPy using `Dataset.set_format()`.
- Create your very own dataset and push it to the Hugging Face Hub.
- Embed your documents using a Transformer model and build a semantic search engine using FAISS (see the second sketch below).
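As a quick recap, here is a minimal sketch that chains the loading, wrangling, and formatting steps together. The dataset name (`"imdb"`), the length threshold, and the commented-out repo id are illustrative values, not part of the chapter itself — swap in your own data:

```python
from datasets import load_dataset

# Load a dataset from the Hub (local CSV/JSON files work too, via
# load_dataset("csv", data_files=...) and friends).
dataset = load_dataset("imdb", split="train")

# Wrangle the data: add a word-count column with map(), then keep only
# the shorter reviews with filter().
dataset = dataset.map(lambda x: {"length": len(x["text"].split())})
dataset = dataset.filter(lambda x: x["length"] < 200)

# Switch the output format to Pandas for quick analysis...
dataset.set_format("pandas")
df = dataset[:]
print(df["length"].describe())

# ...and back to the default format before sharing it.
dataset.reset_format()
# dataset.push_to_hub("your-username/short-imdb-reviews")  # requires `huggingface-cli login`
```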
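And here is a condensed version of the semantic search pipeline. It assumes `faiss-cpu` and `sentence-transformers` are installed, uses the `sentence-transformers` library as a shortcut for the manual tokenizer-plus-pooling embedding code from the FAISS section, and the toy corpus is made up for illustration:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer

# A tiny toy corpus; in practice this would be your own documents.
corpus = Dataset.from_dict(
    {"text": [
        "How do I load a CSV file?",
        "What is a FAISS index?",
        "Switching between Pandas and NumPy formats",
    ]}
)

# Embed every document with a sentence-embedding model.
model = SentenceTransformer("sentence-transformers/multi-qa-mpnet-base-dot-v1")
corpus = corpus.map(lambda x: {"embeddings": model.encode(x["text"])})

# Build a FAISS index over the embedding column and query it.
corpus.add_faiss_index(column="embeddings")
question_embedding = model.encode("loading data from CSV")
scores, samples = corpus.get_nearest_examples("embeddings", question_embedding, k=2)
print(samples["text"])
```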
In Chapter 7, we’ll put all of this to good use as we take a deep dive into the core NLP tasks that Transformer models are great for. Before jumping ahead, though, put your knowledge of 🤗 Datasets to the test with a quick quiz!