RWKV World v3 Corpus
RWKV World v3.0 Dataset for training RWKV-7 Goose World v3 models
Viewer • Updated • 61.6M • 73.5k • 988
Note: v2. Originally: https://huggingface.co/datasets/olm/wikipedia
cerebras/SlimPajama-627B
Preview • Updated • 42.7k • 509
Note: v2. We removed the CC and C4 components of SlimPajama from the corpus for World v3.
allenai/peS2o
Updated • 2.85k • 184
Note: v2
NortheasternUniversity/big_patent
Viewer • Updated • 2.68M • 5.03k • 63
Note: v2
pile-of-law/pile-of-law
Updated • 3.66k • 261
Note: v2
bigcode/starcoderdata
Viewer • Updated • 207M • 9.52k • 467
Note: v2. For StarCoder, we now include all datasets, rather than only those with at least 10 stars.
oscar-corpus/OSCAR-2301
Updated • 3.53k • 171
Note: v2
wecover/OPUS_TED2020
Viewer • Updated • 79.3M • 13.5k
Note: v2
suolyer/pile_opensubtitles
Viewer • Updated • 1.26k • 49 • 1
Note: v2
suolyer/pile_youtubesubtitles
Viewer • Updated • 668 • 76 • 1
RyokoAI/Honeyfeed3600
Preview • Updated • 19 • 4
Note: v2
botp/RyokoAI_Syosetu711K
Preview • Updated • 199 • 28
Note: v2. Correct URL?
marianna13/fanfics
Viewer • Updated • 5.25M • 2.01k • 5
Note: v2
marianna13/gamedev
Viewer • Updated • 40.7k • 31 • 4
Note: v2
marianna13/IA-books
Viewer • Updated • 1.92M • 213
Note: v2
marianna13/libgen
Viewer • Updated • 737k • 370 • 3
Note: v2
marianna13/research_gate
Viewer • Updated • 66.2k • 35 • 1
Note: v2
marianna13/superuser
Viewer • Updated • 28.5k • 26 • 1
Note: v2
marianna13/the-eye
Viewer • Updated • 345k • 240
Note: v2
marianna13/random_dataset
Viewer • Updated • 4.27M • 471 • 1
Note: v2
marianna13/vault_text
Viewer • Updated • 49.2k • 608
Note: v2
marianna13/zlib
Viewer • Updated • 943k • 477
Note: v2
SaylorTwift/the_pile_books3_minus_gutenberg
Viewer • Updated • 193k • 3.84k • 13
Note: v2
JeanKaddour/minipile
Viewer • Updated • 1.01M • 3.5k • 132
Note: v2
Helsinki-NLP/tatoeba_mt
Updated • 27.3k • 61
Note: v2
shahules786/PoetryFoundationData
Viewer • Updated • 13.9k • 218 • 2
Note: v2
hoskinson-center/proof-pile
Viewer • Updated • 363k • 4.97k • 63
Note: v2
P1ayer-1/reddit-math
Viewer • Updated • 78.2k • 89 • 2
Note: v2
allenai/soda
Viewer • Updated • 1.49M • 711 • 148
Note: v2
amishshah/song_lyrics
Viewer • Updated • 2.94M • 198 • 18
Note: v2
roneneldan/TinyStories
Viewer • Updated • 2.14M • 57k • 791
Note: v2
0x22almostEvil/multilingual-wikihow-qa-16k
Viewer • Updated • 16.8k • 958 • 9
Note: v2
tatsu-lab/alpaca
Viewer • Updated • 52k • 50.1k • 835
Note: v2
camel-ai/math
Viewer • Updated • 50k • 368 • 113
Note: v2
camel-ai/code
Preview • Updated • 227 • 38
Note: v2
camel-ai/physics
Viewer • Updated • 20k • 906 • 96
Note: v2
camel-ai/chemistry
Viewer • Updated • 20k • 731 • 62
Note: v2
camel-ai/ai_society
Preview • Updated • 111 • 44
Note: v2
camel-ai/biology
Viewer • Updated • 20k • 445 • 55
Note: v2
databricks/databricks-dolly-15k
Viewer • Updated • 15k • 20.8k • 887
Note: v2
WizardLMTeam/WizardLM_evol_instruct_70k
Viewer • Updated • 70k • 688 • 195
Note: v2
nomic-ai/gpt4all_prompt_generations
Viewer • Updated • 438k • 236 • 130
Note: v2
allenai/dolma
Updated • 1.99k • 964
Note: v2.1
glaiveai/glaive-code-assistant-v3
Viewer • Updated • 950k • 350 • 57
Note: v2.1
m-a-p/Code-Feedback
Viewer • Updated • 66.4k • 4.23k • 213
Note: v2.1
HuggingFaceTB/cosmopedia
Viewer • Updated • 31.1M • 46k • 647
Note: v2.1
QuixiAI/SystemChat-2.0
Viewer • Updated • 141k • 391 • 74
Note: v2.1
migtissera/Tess-v1.5
Viewer • Updated • 126k • 75 • 58
Note: v2.1
openbmb/UltraInteract_sft
Viewer • Updated • 289k • 596 • 124
Note: v2.1
Magpie-Align/Llama-3-Magpie-Pro-1M-v0.1
Viewer • Updated • 1M • 343 • 19
Note: v2.1
Magpie-Align/Magpie-Air-MT-300K-v0.1
Viewer • Updated • 300k • 175 • 1
Note: v2.1
Magpie-Align/Magpie-Qwen2-Pro-1M-v0.1
Viewer • Updated • 1M • 258 • 14
Note: v2.1
Magpie-Align/Magpie-Phi3-Pro-300K-Filtered
Viewer • Updated • 300k • 81 • 6
Note: v2.1
Magpie-Align/Magpie-Gemma2-Pro-200K-Filtered
Viewer • Updated • 200k • 85 • 15
Note: v2.1
mlfoundations/dclm-baseline-1.0
Preview • Updated • 297k • 248
Note: v3. For DCLM-baseline, we include only global-shard_10_of_10.
stanford-oval/ccnews
Viewer • Updated • 893M • 3.9k • 31
Note: v3
HuggingFaceFW/fineweb-edu
Viewer • Updated • 3.5B • 238k • 828
Note: v3
math-ai/TemplateGSM
Viewer • Updated • 14.5M • 10.9k • 20
Note: v3
EleutherAI/proof-pile-2
Updated • 9.21k • 209
Note: v3
HuggingFaceTB/smollm-corpus
Viewer • Updated • 237M • 14.9k • 399
Note: v3
TIGER-Lab/WebInstructSub
Viewer • Updated • 2.34M • 1.42k • 159
Note: v3
H-D-T/Buzz-V1.2
Viewer • Updated • 3.14M • 229 • 12
Note: v3
TIGER-Lab/SKGInstruct
Preview • Updated • 181 • 28
Note: v3
Muennighoff/flan
Viewer • Updated • 3.53M • 1.46k • 51
Note: v3
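
Two of the notes above describe subsetting rules: dropping the CC and C4 components of SlimPajama, and keeping only global-shard_10_of_10 of DCLM-baseline. The sketch below shows one way such rules could be expressed. It is not the pipeline used to build the corpus. It assumes that each SlimPajama record carries a `meta.redpajama_set_name` label with the values `RedPajamaCommonCrawl` and `RedPajamaC4` for the two removed components, and that DCLM-baseline files sit under a directory named after their global shard. The function names and the sample file paths are hypothetical.

```python
from typing import Iterable, Iterator

# Assumed subset labels for the Common Crawl and C4 parts of SlimPajama.
EXCLUDED_SLIMPAJAMA_SETS = {"RedPajamaCommonCrawl", "RedPajamaC4"}

# Shard directory named in the DCLM-baseline note.
DCLM_SHARD = "global-shard_10_of_10"


def drop_cc_and_c4(records: Iterable[dict]) -> Iterator[dict]:
    """Yield records whose (assumed) subset label is not CC or C4.

    Records without a label are kept unchanged.
    """
    for record in records:
        subset = (record.get("meta") or {}).get("redpajama_set_name")
        if subset not in EXCLUDED_SLIMPAJAMA_SETS:
            yield record


def select_dclm_shard(paths: Iterable[str], shard: str = DCLM_SHARD) -> list[str]:
    """Keep file paths that contain `shard` as one of their directory components."""
    return [p for p in paths if shard in p.split("/")]


if __name__ == "__main__":
    # Tiny in-memory stand-ins for real records and file listings.
    sample_records = [
        {"text": "web page", "meta": {"redpajama_set_name": "RedPajamaCommonCrawl"}},
        {"text": "paper", "meta": {"redpajama_set_name": "RedPajamaArXiv"}},
    ]
    print([r["text"] for r in drop_cc_and_c4(sample_records)])

    sample_paths = [
        "global-shard_01_of_10/local-shard_0_of_10/part-0.jsonl.zst",  # illustrative path
        "global-shard_10_of_10/local-shard_0_of_10/part-0.jsonl.zst",  # illustrative path
    ]
    print(select_dclm_shard(sample_paths))
```

Matching on whole path components rather than substrings avoids accidentally selecting a directory whose name merely contains the shard name.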