BERT from Scratch (1 Epoch, Training Loss: 4.13)

This is a BERT model trained from scratch using a custom tokenizer with a 64,000-token vocabulary.

  • Training: 1 epoch
  • Masked Language Modeling (MLM) loss: 4.13
  • Tokenizer: custom-trained on the IIT Madras Hindi monolingual dataset; vocabulary size: 64,000
  • Model size: 24.9M parameters (F32, Safetensors)
  • Architecture (see the configuration sketch below):
      • Maximum position embeddings: 512
      • Hidden size: 312
      • Number of attention heads: 12
      • Number of transformer layers: 4
      • Intermediate (feed-forward) size: 1200
      • Type vocabulary size: 2 (for segment embeddings)
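
For reference, the hyperparameters above can be expressed as a Hugging Face BertConfig. The sketch below simply mirrors the values listed in this card; it is not copied from the repository's config.json, so check the uploaded config before relying on it.

```python
from transformers import BertConfig, BertForMaskedLM

# Illustrative config mirroring the hyperparameters listed above.
config = BertConfig(
    vocab_size=64_000,          # custom tokenizer vocabulary
    hidden_size=312,
    num_hidden_layers=4,
    num_attention_heads=12,
    intermediate_size=1200,
    max_position_embeddings=512,
    type_vocab_size=2,          # segment embeddings
)

model = BertForMaskedLM(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")  # roughly 24.9M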

It is uploaded as an early checkpoint for experimentation and community feedback.

Intended Use

  • Research on training dynamics
  • Continued pretraining
  • Fine-tuning for downstream tasks (with caution; see the loading sketch after this list)
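
Below is a minimal loading and inference sketch using the standard Auto classes. It assumes the tokenizer in this repository defines a mask token; the Hindi example sentence is only illustrative, and given the single epoch of training, predictions will likely be noisy.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

repo_id = "ishathombre/monolingual-hindi-from-scratch"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForMaskedLM.from_pretrained(repo_id)
model.eval()

# Fill a masked token in a Hindi sentence ("The capital of India is [MASK].").
text = f"भारत की राजधानी {tokenizer.mask_token} है।"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Top-5 candidate tokens for the first masked position.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_positions[0]].topk(5).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```

For continued pretraining or fine-tuning, the same model object can be passed into a standard Trainer setup instead of being queried directly.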

Limitations

  • Low training coverage (1 epoch)
  • Not yet evaluated on downstream tasks