Catastrophic Forgetting in Language Models

Fourwheels2512 · February 27, 2026, 6:08pm

To all the awesome AI/ML experts out there: I need a favor.
Language models (SLMs/LLMs) tend to lose what they learned earlier when they are fine-tuned on new data in sequence, a problem known as 'catastrophic forgetting'.

To address this, I built an adapter called the Constrained Residual Mixing Adapter (CRMA) that enables continual learning. I tested it on TinyLlama 1.1B and Mistral 7B. The result: -0.1% drift across 4 sequential domains, which is essentially zero forgetting.

CRMA: -0.1% drift. Naive: +351% forgetting. Same model, same data, same hardware.

The result holds at both 1.1B and 7B. No replay, no EWC (elastic weight consolidation), and no KD (knowledge distillation) needed.

CRMA Modular vs Naive, Mistral 7B (4 sequential domains)

┌─────────┬────────────┬──────────────────┐
│ Task    │ CRMA Drift │ Naive Forgetting │
├─────────┼────────────┼──────────────────┤
│ Medical │ -0.2%      │ +228%            │
├─────────┼────────────┼──────────────────┤
│ Legal   │ -0.1%      │ +593%            │
├─────────┼────────────┼──────────────────┤
│ Code    │ -0.1%      │ +233%            │
├─────────┼────────────┼──────────────────┤
│ Finance │ +0.0%      │ n/a              │
├─────────┼────────────┼──────────────────┤
│ Average │ -0.1%      │ +351%            │
└─────────┴────────────┴──────────────────┘
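
For anyone who wants to sanity-check the numbers before reproducing the runs, here is a minimal Python sketch that recomputes the Average row from the per-task values in the table. The helper `relative_change_percent` is hypothetical: the post does not say how drift or forgetting is measured, so the sketch assumes it is the percent change in held-out loss on an earlier task after training on later ones. Treat that definition as an assumption, not as the CRMA evaluation code.

```python
from statistics import mean

# Per-task values copied from the table above, in percent.
# None marks the Finance cell that has no reported naive value.
crma_drift = {"Medical": -0.2, "Legal": -0.1, "Code": -0.1, "Finance": 0.0}
naive_forgetting = {"Medical": 228.0, "Legal": 593.0, "Code": 233.0, "Finance": None}


def average_percent(values):
    """Mean over the tasks that have a reported value."""
    reported = [v for v in values.values() if v is not None]
    return mean(reported)


def relative_change_percent(loss_before, loss_after):
    """ASSUMED metric (not confirmed by the post): percent change in
    held-out loss on an earlier task after training on later tasks."""
    return 100.0 * (loss_after - loss_before) / loss_before


crma_avg = average_percent(crma_drift)
naive_avg = average_percent(naive_forgetting)

print(f"CRMA average drift: {crma_avg:+.1f}%")
print(f"Naive average forgetting: {naive_avg:+.0f}%")
```

The CRMA average is taken over all four tasks, and the naive average over the three tasks that have a reported value. Both match the Average row once rounded.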

Now the favor: if you're interested in independently verifying these results, I'd love to hear from you. DM me and I'll share what you need to reproduce them. Thank you, and best wishes.
