Update README.md
README.md CHANGED

@@ -6,6 +6,6 @@ language:
 - en
 ---
 
-Model of the paper [MoM: Linear Sequence Modeling with Mixture-of-Memories](https://arxiv.org/abs/2502.13685) and [Gated
+Model of the paper [MoM: Linear Sequence Modeling with Mixture-of-Memories](https://arxiv.org/abs/2502.13685) and [Gated Linear Attention Transformers with Hardware-Efficient Training](https://arxiv.org/abs/2312.06635).
 
 The model was trained on a sample of SlimPajama with 15B tokens.