attention-is-not-all-you-need
This repository contains the best model checkpoints from the reproduction of the paper "Attention Is Not All You Need".
├── grassmann_snli/
│   ├── checkpoints/best.pt
│   ├── snli_test_results.json
│   └── snli_validation_results.json
├── grassmann_wikitext_L128_N6/
│   ├── checkpoints/best.pt
│   ├── results.json
│   └── wikitext_validation_results.json
├── grassmann_wikitext_L128_N12/
│   ├── checkpoints/best.pt
│   ├── results.json
│   └── wikitext_validation_results.json
├── grassmann_wikitext_L256_N6/
│   ├── checkpoints/best.pt
│   ├── results.json
│   └── wikitext_validation_results.json
├── grassmann_wikitext_L256_N12/
│   ├── checkpoints/best.pt
│   ├── results.json
│   └── wikitext_validation_results.json
├── transformer_snli/
│   ├── checkpoints/best.pt
│   ├── snli_test_results.json
│   └── snli_validation_results.json
├── transformer_wikitext_L128_N6/
│   ├── checkpoints/best.pt
│   ├── results.json
│   └── wikitext_validation_results.json
├── transformer_wikitext_L128_N12/
│   ├── checkpoints/best.pt
│   ├── results.json
│   └── wikitext_validation_results.json
├── transformer_wikitext_L256_N6/
│   ├── checkpoints/best.pt
│   ├── results.json
│   └── wikitext_validation_results.json
└── transformer_wikitext_L256_N12/
    ├── checkpoints/best.pt
    ├── results.json
    └── wikitext_validation_results.json
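Each run directory pairs its best checkpoint with one or more JSON result files. As a quick way to compare runs, the sketch below walks the directories from the tree above and prints the contents of every results file. The directory names are taken from this collection, but the key names inside the JSON files are not documented here, so the script simply dumps each file's top-level fields.

import json
from pathlib import Path

# Run directories as laid out in the tree above.
RUN_DIRS = [
    "grassmann_snli",
    "grassmann_wikitext_L128_N6",
    "grassmann_wikitext_L128_N12",
    "grassmann_wikitext_L256_N6",
    "grassmann_wikitext_L256_N12",
    "transformer_snli",
    "transformer_wikitext_L128_N6",
    "transformer_wikitext_L128_N12",
    "transformer_wikitext_L256_N6",
    "transformer_wikitext_L256_N12",
]

for run in RUN_DIRS:
    for results_file in sorted(Path(run).glob("*.json")):
        with results_file.open() as f:
            metrics = json.load(f)
        # The exact metric keys are not specified in this collection,
        # so print whatever top-level fields each file contains.
        print(f"{run}/{results_file.name}: {metrics}")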
import torch

# Load a checkpoint (map_location="cpu" lets it load on machines without a GPU)
checkpoint = torch.load(
    "grassmann_wikitext_L256_N12/checkpoints/best.pt", map_location="cpu"
)

# Access the model weights and the training metadata stored alongside them
model_state = checkpoint['model_state_dict']
epoch = checkpoint['epoch']
val_loss = checkpoint['val_loss']
print(f"Epoch: {epoch}, Val Loss: {val_loss}")
If you use these models, please cite this reproduction:
@misc{attn-is-not-all-you-need-reproduction,
  title={Reproduction of "Attention Is Not All You Need"},
  author={alphaXiv},
  year={2026},
  url={https://github.com/alphaXiv/paper-implementations}
}
All models trained on:
MIT License