Alpaca Style Datasets Collection Datasets which follow the Alpaca Style format based on having 'instruction', 'input', and 'output' columns • 5367 items • Updated Apr 5 • 4
Probably function calling datasets Collection Created using the https://huggingface.co/spaces/librarian-bots/dataset-column-search-api Space. • 39 items • Updated Jul 17, 2024 • 39
SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated May 5 • 240
Article Wikipedia's Treasure Trove: Advancing Machine Learning with Diverse Data Jun 3, 2024 • 13
ZeroGPU Spaces Collection ZeroGPU Spaces made by the community • 17 items • Updated Jun 6, 2024 • 246
On the Societal Impact of Open Foundation Models Paper • 2403.07918 • Published Feb 27, 2024 • 17
haiku Collection This is a collection of synthetic datasets built to help open language models write better haikus through the use of DPO • 3 items • Updated Jun 21, 2024 • 6
Image dataset Collection 10 datasets showcase how to configure and load image datasets • 10 items • Updated Aug 2, 2024 • 8