# Shivik-M5
A compact, efficient decoder-only language model designed for research, experimentation, and custom LLM development.
## Overview
Shivik-M5 is a 1.7B-parameter transformer model implemented with:
- Decoder-only architecture
- Multi-Head Attention
- Rotary Positional Embeddings (RoPE)
- KV-caching for fast generation
- SwiGLU feedforward layers
- RMSNorm
- Tied embeddings
This model is particularly suited for:
- Building and testing custom LLM architectures
- Lightweight reasoning tasks
- Educational use, research, and experimentation
- Serving as a base for instruction-tuning, RLHF, or domain-specific finetuning
## Key Features

### Custom architecture
Fully implemented in `modeling_shivik_m4.py`; no external dependencies beyond PyTorch and Transformers.
### Efficient attention

- 32 attention heads
- 64-dim head size
- Optimized KV caching
- RoPE for long-context stability
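As a refresher on the RoPE mechanism listed above, here is a minimal pure-Python sketch of rotary embeddings applied to a single head vector. The base `theta = 10000.0` is the common default from the RoPE literature, not a confirmed hyperparameter of this model:

```python
import math

def rope(x, pos, theta=10000.0):
    """Rotate a head vector by position-dependent angles (RoPE).

    x: list of floats with even length (the per-head dimension).
    Each pair (x[2i], x[2i+1]) is rotated by pos * theta^(-2i/d),
    so dot products between rotated queries and keys depend only
    on the relative distance between their positions.
    """
    d = len(x)
    out = [0.0] * d
    for i in range(d // 2):
        angle = pos * theta ** (-2.0 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out
```

Because each pair is a pure rotation, vector norms are preserved and attention scores become a function of relative position, which is what gives RoPE its long-context stability.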
### Simple to extend
You can easily modify:
- number of layers
- head count
- embedding size
- MLP dimensions
Ideal for prototyping your own model families.
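A prototyping workflow like the one described above usually starts from a small config object. The field names below are hypothetical, chosen for illustration; the actual config class in `modeling_shivik_m4.py` may differ:

```python
from dataclasses import dataclass

@dataclass
class ShivikConfig:
    # Hypothetical field names; defaults mirror the Model Details table.
    n_layers: int = 24
    n_heads: int = 32
    hidden_size: int = 2048
    intermediate_size: int = 8192
    max_seq_len: int = 4096
    vocab_size: int = 49_152

# A half-depth, half-width variant for quick experiments
small = ShivikConfig(
    n_layers=12, n_heads=16, hidden_size=1024, intermediate_size=4096
)
```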
## Model Details
| Component | Value |
|---|---|
| Parameters | ~1.7B |
| Layers | 24 |
| Hidden Size | 2048 |
| Intermediate Size (MLP) | 8192 |
| Attention Heads | 32 |
| Key/Value Heads | 32 (MHA) |
| Max Sequence Length | 4096 |
| Positional Encoding | RoPE |
| Normalization | RMSNorm |
| Vocabulary Size | 49,152 |
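The ~1.7B figure can be sanity-checked from the table. The sketch below assumes tied embeddings counted once, a three-projection SwiGLU MLP (gate, up, down), and two RMSNorm scales per layer; this is a standard layout assumption, not the model's exact parameter accounting:

```python
V, L, d, d_ff = 49_152, 24, 2048, 8192

emb = V * d                      # tied input/output embedding matrix
attn = 4 * d * d                 # Q, K, V, O projections per layer
mlp = 3 * d * d_ff               # SwiGLU: gate, up, down projections
norms = 2 * d                    # two RMSNorm scale vectors per layer

total = emb + L * (attn + mlp + norms) + d   # + final RMSNorm
print(f"{total / 1e9:.2f}B")                 # ≈ 1.71B
```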
## Usage
Load with `AutoModelForCausalLM` and `trust_remote_code=True`:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "ziadrone/shivik-m5"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,   # temperature only takes effect when sampling
    temperature=0.7,
)
print(tokenizer.decode(output[0]))
```
---
language:
- en
license: mit
tags:
- decoder-only
- research
- transformer
- custom-architecture
- shivik-llm
datasets:
- none
model-index:
- name: Shivik-M5
results: []
---