Catastrophic Forgetting in Language Models

To all the awesome AI/ML experts out there: I need a favor.
Language models (SLMs/LLMs) tend to lose what they learned earlier when they are trained continuously on new data, a problem known as ‘catastrophic forgetting’.

To address this, I came up with an adapter called the Constrained Residual Mixing Adapter (CRMA) that enables continual learning. I tested it on TinyLlama 1.1B and Mistral 7B. The result: -0.1% drift across 4 sequential domains. Essentially zero forgetting.

CRMA: -0.1% drift. Naive: +351% forgetting. Same model, same data, same hardware.

Holds at both 1.1B and 7B. No replay, no EWC, no KD needed.

CRMA Modular vs. Naive on Mistral 7B (4 sequential domains):

┌─────────┬────────────┬──────────────────┐
│ Task │ CRMA Drift │ Naive Forgetting │
├─────────┼────────────┼──────────────────┤
│ Medical │ -0.2% │ +228% │
├─────────┼────────────┼──────────────────┤
│ Legal │ -0.1% │ +593% │
├─────────┼────────────┼──────────────────┤
│ Code │ -0.1% │ +233% │
├─────────┼────────────┼──────────────────┤
│ Finance │ +0.0% │ — │
├─────────┼────────────┼──────────────────┤
│ Average │ -0.1% │ +351% │
└─────────┴────────────┴──────────────────┘
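
For anyone checking the numbers, here is a minimal Python sketch that recomputes the Average row from the per-task figures in the table. The `percent_drift` helper is only an assumption about how a drift figure might be defined (relative change in a task's eval metric after training on later domains); the post does not state the exact metric, so treat that function and its name as hypothetical.

```python
from statistics import mean


def percent_drift(before: float, after: float) -> float:
    """Relative change (in percent) of a task's eval metric.

    ASSUMPTION: 'before' is the metric right after training on the task,
    'after' is the same metric once later domains have been trained.
    The post does not specify the metric, so this is illustrative only.
    """
    return (after - before) / before * 100.0


# Per-task figures copied from the table above (percent).
crma_drift = {"Medical": -0.2, "Legal": -0.1, "Code": -0.1, "Finance": 0.0}
# Finance has no naive figure in the table (presumably because it is the
# last domain trained), so it is left out of the naive average.
naive_forgetting = {"Medical": 228.0, "Legal": 593.0, "Code": 233.0}

crma_avg = round(mean(crma_drift.values()), 1)
naive_avg = round(mean(naive_forgetting.values()))

print(f"CRMA average drift: {crma_avg:+.1f}%")
print(f"Naive average forgetting: {naive_avg:+d}%")
```

The averages match the table's last row if the CRMA figure is taken over all four domains and the naive figure over the three domains that have a value.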

Now the favor: if you’re interested in independently verifying these results, I’d love to hear from you. DM me and I’ll share what you need to reproduce them. Thank you, and best wishes.
