🧠 Shivik-M5

A compact, efficient decoder-only language model designed for research, experimentation, and custom LLM development.


✨ Overview

Shivik-M5 is a 1.7B-parameter transformer model implemented with the following components (see the code sketch after the list):

  • 🧱 Decoder-only architecture
  • ⚙️ Multi-Head Attention
  • 🔁 Rotary Positional Embeddings (RoPE)
  • 🔄 KV caching for fast generation
  • 🧮 SwiGLU feed-forward layers
  • 📏 RMSNorm
  • 🎯 Tied embeddings
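
Here is a minimal PyTorch sketch of two of these components, RMSNorm and the SwiGLU feed-forward layer. It is illustrative only, not the actual modeling code; the default dimensions mirror the Model Details table below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Scale-only normalization: divide by the RMS, no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """Gated MLP: silu(gate(x)) * up(x), projected back to the model dim."""
    def __init__(self, dim: int = 2048, hidden: int = 8192):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))
```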

This model is particularly suited for:

  • Building and testing custom LLM architectures
  • Lightweight reasoning tasks
  • Educational use, research, and experimentation
  • Serving as a base for instruction-tuning, RLHF, or domain-specific fine-tuning (see the sketch after this list)
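
As a concrete starting point for the last item, the sketch below wires the model into LoRA fine-tuning with the peft library. It assumes `model` has been loaded as shown in the Usage section, and the `target_modules` names are assumptions; inspect the checkpoint to find the actual projection-layer names in this custom architecture.

```python
from peft import LoraConfig, get_peft_model

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed names; verify against the model
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, lora_cfg)  # model loaded as in Usage below
peft_model.print_trainable_parameters()
```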

🚀 Key Features

✔️ Custom architecture

Fully implemented in modeling_shivik_m5.py; no external dependencies beyond PyTorch and Transformers.

✔️ Efficient attention

  • 32 attention heads
  • 64-dimensional heads (32 × 64 = 2048, the hidden size)
  • Optimized KV caching
  • RoPE for long-context stability (sketched below)
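
The sketch below shows the standard rotary-embedding math (the GPT-NeoX-style rotate-half formulation) that this kind of attention layer applies to queries and keys; it is assumed, not confirmed, that the model uses exactly this variant, with head_dim = 64.

```python
import torch

def rope_cos_sin(head_dim: int, seq_len: int, base: float = 10000.0):
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    freqs = torch.outer(torch.arange(seq_len).float(), inv_freq)
    emb = torch.cat((freqs, freqs), dim=-1)   # (seq_len, head_dim)
    return emb.cos(), emb.sin()

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q, k, cos, sin):
    # q, k: (batch, n_heads, seq_len, head_dim); the rotation encodes position
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin
```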

✔️ Simple to extend

You can easily modify:

  • number of layers
  • head count
  • embedding size
  • MLP dimensions

Ideal for prototyping your own model families; a config-override sketch follows.
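
One hypothetical way to spin up a resized variant is to edit the config before instantiating fresh weights. The field names below (num_hidden_layers, intermediate_size) are assumptions borrowed from common Transformers configs; check this repo's config.json for the real names.

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("ziadrone/shivik-m5", trust_remote_code=True)
config.num_hidden_layers = 12     # half-depth variant (assumed field name)
config.intermediate_size = 4096   # slimmer MLP (assumed field name)

# from_config builds randomly initialized weights at the new shapes
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
```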


📦 Model Details

| Component | Value |
|---|---|
| Parameters | ~1.7B |
| Layers | 24 |
| Hidden size | 2048 |
| Intermediate size (MLP) | 8192 |
| Attention heads | 32 |
| Key/value heads | 32 (MHA) |
| Max sequence length | 4096 |
| Positional encoding | RoPE |
| Normalization | RMSNorm |
| Vocabulary size | 49,152 |
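
A quick sanity check that these numbers add up to the quoted parameter count, assuming bias-free linear layers and tied embeddings:

```python
vocab, d, ffn, layers = 49_152, 2_048, 8_192, 24

embed = vocab * d                      # tied input/output embedding
attn = 4 * d * d                       # Q, K, V, O projections per layer
mlp = 3 * d * ffn                      # SwiGLU: gate, up, down per layer
total = embed + layers * (attn + mlp)  # norms add only a few thousand more
print(f"{total / 1e9:.2f}B")           # 1.71B -> matches the "~1.7B" figure
```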

🔧 Usage

Load the model with AutoModelForCausalLM and trust_remote_code=True:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "ziadrone/shivik-m5"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,   # sampling must be enabled for temperature to take effect
    temperature=0.7,
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
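
For interactive use, Transformers' TextStreamer prints tokens as they are generated. A minimal follow-on using the model, tokenizer, and inputs from the snippet above:

```python
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=50, streamer=streamer)
```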

---
language: 
- en
license: mit
tags:
- decoder-only
- research
- transformer
- custom-architecture
- shivik-llm
datasets:
- none
model-index:
- name: Shivik-M5
  results: []
---