Update README.md
README.md CHANGED

@@ -6,6 +6,6 @@ language:
 - en
 ---
 
-Model of the paper [MoM: Linear Sequence Modeling with Mixture-of-Memories](https://arxiv.org/abs/2502.13685) and [Gated
+Model of the paper [MoM: Linear Sequence Modeling with Mixture-of-Memories](https://arxiv.org/abs/2502.13685) and [Gated Linear Attention Transformers with Hardware-Efficient Training](https://arxiv.org/abs/2312.06635).
 
 The model was trained on a sample of SlimPajama with 15B tokens.