🧠 Shivik-M5

A compact, efficient decoder-only language model designed for research, experimentation, and custom LLM development.


✨ Overview

Shivik-M5 is a 1.7B-parameter transformer model implemented with the following components (see the code sketch after the list):

  • 🧱 Decoder-only architecture
  • ⚙️ Multi-Head Attention
  • 🔁 Rotary Positional Embeddings (RoPE)
  • 🔄 KV caching for fast generation
  • 🧮 SwiGLU feed-forward layers
  • 📏 RMSNorm
  • 🎯 Tied embeddings
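
Here is a minimal PyTorch sketch of two of these components, RMSNorm and the SwiGLU feed-forward layer. It is illustrative only, not the actual modeling code; the default dimensions mirror the Model Details table below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Scale-only normalization: divide by the RMS, no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """Gated MLP: silu(gate(x)) * up(x), projected back to the model dim."""
    def __init__(self, dim: int = 2048, hidden: int = 8192):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))
```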

This model is particularly suited for:

  • Building and testing custom LLM architectures
  • Lightweight reasoning tasks
  • Educational use, research, and experimentation
  • Serving as a base for instruction-tuning, RLHF, or domain-specific fine-tuning (see the sketch after this list)
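
As a concrete starting point for the last item, the sketch below wires the model into LoRA fine-tuning with the peft library. It assumes `model` has been loaded as shown in the Usage section, and the `target_modules` names are assumptions; inspect the checkpoint to find the actual projection-layer names in this custom architecture.

```python
from peft import LoraConfig, get_peft_model

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed names; verify against the model
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, lora_cfg)  # model loaded as in Usage below
peft_model.print_trainable_parameters()
```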

🚀 Key Features

✔️ Custom architecture

Fully implemented in modeling_shivik_m5.py; no external dependencies beyond PyTorch and Transformers.

✔️ Efficient attention

  • 32 attention heads
  • 64-dimensional heads (32 × 64 = 2048, the hidden size)
  • Optimized KV caching
  • RoPE for long-context stability (sketched below)
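
The sketch below shows the standard rotary-embedding math (the GPT-NeoX-style rotate-half formulation) that this kind of attention layer applies to queries and keys; it is assumed, not confirmed, that the model uses exactly this variant, with head_dim = 64.

```python
import torch

def rope_cos_sin(head_dim: int, seq_len: int, base: float = 10000.0):
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    freqs = torch.outer(torch.arange(seq_len).float(), inv_freq)
    emb = torch.cat((freqs, freqs), dim=-1)   # (seq_len, head_dim)
    return emb.cos(), emb.sin()

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q, k, cos, sin):
    # q, k: (batch, n_heads, seq_len, head_dim); the rotation encodes position
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin
```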

✔️ Simple to extend

You can easily modify:

  • number of layers
  • head count
  • embedding size
  • MLP dimensions

Ideal for prototyping your own model families; a config-override sketch follows.
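
One hypothetical way to spin up a resized variant is to edit the config before instantiating fresh weights. The field names below (num_hidden_layers, intermediate_size) are assumptions borrowed from common Transformers configs; check this repo's config.json for the real names.

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("ziadrone/shivik-m5", trust_remote_code=True)
config.num_hidden_layers = 12     # half-depth variant (assumed field name)
config.intermediate_size = 4096   # slimmer MLP (assumed field name)

# from_config builds randomly initialized weights at the new shapes
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
```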


📦 Model Details

| Component | Value |
|---|---|
| Parameters | ~1.7B |
| Layers | 24 |
| Hidden size | 2048 |
| Intermediate size (MLP) | 8192 |
| Attention heads | 32 |
| Key/value heads | 32 (MHA) |
| Max sequence length | 4096 |
| Positional encoding | RoPE |
| Normalization | RMSNorm |
| Vocabulary size | 49,152 |
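
A quick sanity check that these numbers add up to the quoted parameter count, assuming bias-free linear layers and tied embeddings:

```python
vocab, d, ffn, layers = 49_152, 2_048, 8_192, 24

embed = vocab * d                      # tied input/output embedding
attn = 4 * d * d                       # Q, K, V, O projections per layer
mlp = 3 * d * ffn                      # SwiGLU: gate, up, down per layer
total = embed + layers * (attn + mlp)  # norms add only a few thousand more
print(f"{total / 1e9:.2f}B")           # 1.71B -> matches the "~1.7B" figure
```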

🔧 Usage

Load the model with AutoModelForCausalLM and trust_remote_code=True:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "ziadrone/shivik-m5"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,   # sampling must be enabled for temperature to take effect
    temperature=0.7,
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
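
For interactive use, Transformers' TextStreamer prints tokens as they are generated. A minimal follow-on using the model, tokenizer, and inputs from the snippet above:

```python
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=50, streamer=streamer)
```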

---
language: 
- en
license: mit
tags:
- decoder-only
- research
- transformer
- custom-architecture
- shivik-llm
datasets:
- none
model-index:
- name: Shivik-M5
  results: []
---