batmanLovesAI/HeliumLM
Tags: Text Generation, PyTorch, English, slm, transformer, attention, optimization, tinystories, educational
Dataset: roneneldan/TinyStories
arXiv: 2305.07759, 2505.19529
License: MIT
HeliumLM/checkpoints (branch: main)
747 MB, 1 contributor, History: 42 commits
Latest commit 17304b3 by batmanLovesAI (2 days ago): "Add: Vanilla model trained on the entire tinystories dataset. Delete: Previous vanilla models that were trained on 25%, 50% and 75% of the dataset."
helium-distill-1-08-model-iter-14000.pt (106 MB, pickle, xet)
Detected pickle imports (4): "torch.ComplexFloatStorage", "torch.FloatStorage", "collections.OrderedDict", "torch._utils._rebuild_tensor_v2"
Last commit (19 days ago): Removed unnecessary models and renamed models for better understanding
helium-distill-1-08-model-iter-8000.pt (106 MB, pickle, xet)
Detected pickle imports (4): "collections.OrderedDict", "torch.FloatStorage", "torch.ComplexFloatStorage", "torch._utils._rebuild_tensor_v2"
Last commit (19 days ago): Removed unnecessary models and renamed models for better understanding
helium-distill-5-05-model-iter-8000.pt (106 MB, pickle, xet)
Detected pickle imports (4): "torch._utils._rebuild_tensor_v2", "torch.FloatStorage", "collections.OrderedDict", "torch.ComplexFloatStorage"
Last commit (19 days ago): Removed unnecessary models and renamed models for better understanding
heliumLM-distilled-final-phase-1.pt (106 MB, pickle, xet)
Detected pickle imports (4): "torch.FloatStorage", "torch.ComplexFloatStorage", "torch._utils._rebuild_tensor_v2", "collections.OrderedDict"
Last commit (15 days ago): Added first model of the final phase
heliumlm-grammar-model.pt (106 MB, pickle, xet)
Detected pickle imports (4): "torch.FloatStorage", "torch.ComplexFloatStorage", "torch._utils._rebuild_tensor_v2", "collections.OrderedDict"
Last commit (16 days ago): Deleted irrelevant models and added grammatically correct model trained in phases on the entire tinystories dataset (using quarterly batch technique)
heliumlm-vanilla-swiglu.pt (215 MB, xet)
Last commit (2 days ago): Add: Vanilla model trained on the entire tinystories dataset. Delete: Previous vanilla models that were trained on 25%, 50% and 75% of the dataset.
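Each "Detected pickle imports" line for the .pt files in this folder names the Python globals (module.name pairs) that the checkpoint's pickle stream references. Unpickling the file would import those objects and may call them, which is why the scan is shown. The code below is a minimal sketch of how such a list can be produced with the standard-library pickletools module, without unpickling anything. It assumes the torch.save zip layout, in which the object graph is pickled into an entry ending in .pkl (for example archive/data.pkl). The names pickle_imports and checkpoint_imports are hypothetical, and this is not the scanner Hugging Face actually runs.

```python
import io
import pickle
import pickletools
import zipfile
from collections import OrderedDict

# Opcodes that push a str; protocol 4+ STACK_GLOBAL takes module and name from these.
_STR_OPS = {"UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8"}
_GET_OPS = {"GET", "BINGET", "LONG_BINGET"}
_PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}


def pickle_imports(data: bytes) -> set[str]:
    """List the "module.name" globals a pickle stream references, without unpickling it."""
    found: set[str] = set()
    memo: dict[int, object] = {}
    pushed: list[object] = []  # values pushed by str/get opcodes (None when not a str)
    prev: object = None  # str pushed by the immediately preceding opcode, else None
    for opcode, arg, _pos in pickletools.genops(data):
        name = opcode.name
        current: object = None
        if name in ("GLOBAL", "INST"):
            # Protocols 0-3: pickletools decodes the argument as "module name".
            module, attr = arg.split(" ", 1)
            found.add(f"{module}.{attr}")
        elif name == "STACK_GLOBAL":
            # Protocol 4+: module and name are the two most recently pushed strings.
            if len(pushed) >= 2 and all(isinstance(v, str) for v in pushed[-2:]):
                found.add(f"{pushed[-2]}.{pushed[-1]}")
        elif name in _STR_OPS:
            current = arg
            pushed.append(arg)
        elif name in _GET_OPS:
            # Memo lookups can re-push a module or attribute name seen earlier.
            current = memo.get(arg)
            pushed.append(current)
        elif name in _PUT_OPS:
            memo[arg] = prev  # PUT stores the top of the stack under an explicit index
        elif name == "MEMOIZE":
            memo[len(memo)] = prev  # MEMOIZE uses the next free memo index
        prev = current
    return found


def checkpoint_imports(source) -> set[str]:
    """Union of pickle imports over every *.pkl entry of a torch.save()-style zip.

    `source` may be a filesystem path or a binary file object.
    """
    found: set[str] = set()
    with zipfile.ZipFile(source) as archive:
        for entry in archive.namelist():
            # Assumption: the pickled object graph sits in entries such as
            # "archive/data.pkl"; raw tensor bytes live in separate non-pickle entries.
            if entry.endswith(".pkl"):
                found |= pickle_imports(archive.read(entry))
    return found


if __name__ == "__main__":
    # In-memory stand-in for a checkpoint: a zip holding a protocol-2 pickled OrderedDict.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("archive/data.pkl", pickle.dumps(OrderedDict(w=[0.5]), protocol=2))
        zf.writestr("archive/data/0", b"\x00" * 8)  # raw bytes, not scanned
    print(sorted(checkpoint_imports(buf)))  # ['collections.OrderedDict']
```

If the checkpoints here use the zip format, running checkpoint_imports on one of them would be expected to report names matching its listed imports, though that has not been verified against these files. The scan only reports what a file would import. To load such files with PyTorch, passing weights_only=True to torch.load restricts unpickling to tensors and basic containers.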