PALADIM: Pre-Adaptive Learning Architecture with Dual-Process Hebbian-MoE Schema

A 1.04B parameter continual learning model for medical drug recommendation


Greek Summary (translated)

What is PALADIM?

PALADIM is an artificial intelligence model with 1.04 billion parameters for recommending drugs in medical cases.

Key Features:

  • 🧠 Continual Learning Architecture - learns new information without forgetting old knowledge
  • 💊 609 different drugs - recommends treatments from a broad range of medications
  • 🔬 Mixture of Experts (MoE) - 16 specialized "experts" per layer
  • 📊 Trained on 1,794 patient cases

Current Support:

  • ✅ English - full support
  • ⏳ Greek - planned for future training

License: Apache 2.0 (open source)

⚠️ Important Note: This model is for research purposes only. It must NOT be used for real medical diagnoses without the supervision of healthcare professionals.


Model Description

PALADIM is a novel architecture combining:

  • RoBERTa-base foundation (125M params)
  • Mixture of Experts (MoE) with 16 experts per layer × 12 layers (768M params)
  • LoRA adapters for efficient fine-tuning (148M params)
  • Plastic memory consolidation for continual learning
  • Meta-learning controller for adaptive optimization

Total parameters: 1,042,710,532 (1.04B)

Architecture Highlights

Mixture of Experts (MoE)

  • 16 specialized experts per transformer layer
  • Top-2 expert routing with load balancing (see the routing sketch after this list)
  • Enables specialization for different medical domains
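
A minimal sketch of what top-2 routing with a load-balancing term can look like, assuming a softmax gate and an importance-style auxiliary loss; the repository's moe_layer.py may differ in its details:

import torch
import torch.nn as nn

class Top2MoE(nn.Module):
    """Illustrative top-2 MoE layer: 16 FFN experts, softmax gating."""
    def __init__(self, hidden=768, num_experts=16):
        super().__init__()
        self.gate = nn.Linear(hidden, num_experts)    # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(),
                          nn.Linear(4 * hidden, hidden))
            for _ in range(num_experts))

    def forward(self, x):                             # x: (tokens, hidden)
        probs = self.gate(x).softmax(dim=-1)          # (tokens, experts)
        weights, idx = torch.topk(probs, k=2)         # top-2 experts per token
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(2):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e              # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        # Load-balancing auxiliary loss: penalize uneven expert usage.
        load = probs.mean(dim=0)
        aux_loss = (load * load).sum() * len(self.experts)
        return out, aux_loss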

Plastic Memory System

  • Experience replay buffer to help prevent catastrophic forgetting (see the sketch after this list)
  • Hebbian-inspired consolidation
  • Maintains knowledge across sequential task learning
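
A minimal replay-buffer sketch using reservoir sampling; the retention policy is an assumption for illustration, and the repository's plastic_memory.py and consolidation.py may implement consolidation differently:

import random

class ReplayBuffer:
    """Keeps a bounded, uniform sample of past (input, label) examples."""
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0

    def add(self, example):
        # Reservoir sampling: every example seen so far has an equal
        # chance of being retained once the buffer is full.
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, k):
        # Draw up to k stored examples to mix into a new training batch.
        return random.sample(self.buffer, min(k, len(self.buffer)))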

LoRA Integration

  • Low-Rank Adaptation (rank=16) on all attention layers (see the PEFT sketch after this list)
  • Efficient parameter updates
  • Preserves base model knowledge
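
For orientation, a hedged sketch of an equivalent setup with the peft library. The target-module names and the lora_alpha/lora_dropout values below are assumptions; paladim.py may wire LoRA in differently:

from transformers import AutoModel
from peft import LoraConfig, get_peft_model

base = AutoModel.from_pretrained("roberta-base")
lora_cfg = LoraConfig(
    r=16,                        # rank, as stated in this card
    lora_alpha=32,               # assumed scaling factor
    lora_dropout=0.1,            # assumed
    # Attention projections in stock RoBERTa-base; note "output.dense"
    # also matches the FFN output projection, so adjust as needed.
    target_modules=["query", "key", "value", "output.dense"],
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()   # only LoRA params train; base stays frozen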

Training Details

  • Training Data: Medical patient cases with drug recommendations across 609 medication classes
  • Training Samples: 1,794 patient cases
  • Drug Classes: 609 different medications (from common to specialized treatments)
  • Epochs: Multiple continual learning cycles
  • Optimization: AdamW with meta-learning rate adaptation (illustrated after this list)
  • Hardware: Trained on CPU (can be accelerated with GPU/TPU)
  • Trained: November 29, 2025
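
As noted in the Optimization item above, here is a simple illustration of adapting the AdamW learning rate from the loss trend. This heuristic is an assumption for illustration, not the repository's meta_controller.py logic:

import torch

def adapt_lr(optimizer, losses, factor=0.5, patience=3):
    """Halve the learning rate when the loss has not improved over the
    last `patience` recorded steps (illustrative heuristic)."""
    if len(losses) > patience and \
       min(losses[-patience:]) >= losses[-patience - 1]:
        for group in optimizer.param_groups:
            group["lr"] *= factor

# Inside a training loop (model and loss assumed defined):
# optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
# losses.append(loss.item()); adapt_lr(optimizer, losses)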

Performance

PALADIM is a multi-class classification model that predicts across 609 different medications. Based on the five test predictions reported below:

  • Confidence Range: 61-67% on cardiovascular/metabolic cases
  • Consistency: High agreement across similar medical conditions
  • Top-K Predictions: Model provides ranked drug recommendations
  • Dataset: 1,794 patient training samples across diverse medical conditions

Sample Drug Classes (subset of 609 total)

The model predicts across medications including: Metformin, Atorvastatin, Pembrolizumab, Rituximab, Adalimumab, Insulin Glargine, Levothyroxine, Warfarin, Nivolumab, and 600+ others covering cardiovascular, oncology, diabetes, immunology, neurology, and specialized treatments.

Test Results (5 cases)

Case 1: Hypertension & diabetes           → Prediction made (61.45%)
Case 2: High blood pressure & cholesterol → Prediction made (65.79%)
Case 3: Chest pain & shortness of breath  → Prediction made (63.04%)
Case 4: Heart failure                     → Prediction made (62.36%)
Case 5: Type 2 diabetes                   → Prediction made (66.59%)

Note: Specific drug names are available via the drug_mapping.json file included in this repository.

Usage

Installation

pip install torch transformers peft

Quick Start

import torch
from transformers import AutoTokenizer
from paladim import PALADIM
from config import PALADIMConfig

# Load model
config = PALADIMConfig()
config.device = 'cpu'  # or 'cuda'
model = PALADIM(config)
tokenizer = AutoTokenizer.from_pretrained('roberta-base')

# Load trained weights
checkpoint = torch.load('paladim_20251129_203522.pt', map_location='cpu', weights_only=False)
model.load_state_dict(checkpoint['model_state_dict'], strict=False)
model.eval()

# Make prediction
patient_case = "Patient with hypertension and diabetes, currently on metformin"
inputs = tokenizer(patient_case, return_tensors='pt', padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    
    # Get top-3 recommendations
    top_k = torch.topk(probs[0], k=3)
    
# Load drug mapping to get names
import json
with open('drug_mapping.json', 'r') as f:
    drug_mapping = json.load(f)

print("Top 3 Drug Recommendations:")
for idx, score in zip(top_k.indices.tolist(), top_k.values.tolist()):
    drug_name = drug_mapping['idx_to_drug'][str(idx)]
    print(f"{drug_name}: {score:.2%}")

Running the Test Script

python test_paladim.py

Model Architecture Details

RoBERTa-base (125M params)
├── Embedding Layer (38M params)
├── 12 Transformer Layers
│   ├── Self-Attention + LoRA (query, key, value, output)
│   ├── MoE Layer (16 experts)
│   │   ├── Gating Network
│   │   └── Expert Networks (two-layer FFN per expert)
│   └── Layer Normalization
└── Classification Head (609 classes)

Plastic Memory System
├── Experience Replay Buffer
├── Consolidation Module
└── Meta-Learning Controller

Continual Learning Capabilities

PALADIM is designed for:

  1. Sequential task learning without catastrophic forgetting
  2. Adaptive learning rates via meta-controller
  3. Knowledge consolidation through experience replay (see the loop sketch after this list)
  4. Domain specialization via MoE routing
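
A sketch of how points 1 and 3 combine in practice: a sequential-task loop that mixes replayed examples into each new batch. Here model, tasks, optimizer, and the ReplayBuffer class from the Plastic Memory section above are assumed:

import torch

def train_sequential(model, tasks, buffer, optimizer, replay_k=8):
    """Train on tasks one after another, replaying stored examples
    alongside new ones to reduce catastrophic forgetting."""
    loss_fn = torch.nn.CrossEntropyLoss()
    for task in tasks:                        # tasks arrive sequentially
        for inputs, labels in task:
            batch = [(inputs, labels)] + buffer.sample(replay_k)
            optimizer.zero_grad()
            # Sum the loss over the new example plus replayed ones.
            total = sum(loss_fn(model(**x).logits, y) for x, y in batch)
            total.backward()
            optimizer.step()
            buffer.add((inputs, labels))      # consolidate the new example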

Files in This Repository

  • paladim.py - Core model architecture
  • config.py - Configuration class
  • moe_layer.py - Mixture of Experts implementation
  • plastic_memory.py - Memory consolidation system
  • consolidation.py - Experience replay logic
  • meta_controller.py - Meta-learning controller
  • test_paladim.py - Quick test script
  • requirements.txt - Dependencies
  • paladim_20251129_203522.pt - Trained model checkpoint

Limitations

  • Trained on 1,794 patient cases (relatively small dataset for 609 drug classes)
  • May require domain-specific fine-tuning for specific medical specialties
  • CPU inference is slow (~5-10s per prediction)
  • Does not include drug interaction checking or contraindication detection
  • Predictions should be validated by medical professionals

Future Improvements

  • Train on larger medical datasets (10K+ patient cases)
  • Add drug interaction checking and contraindication detection
  • Include dosage recommendations
  • Add explainability features (attention visualization, SHAP values)
  • Optimize inference speed with model quantization
  • Implement real-time learning with streaming data
  • Add safety guardrails and clinical validation
  • Multi-modal inputs (lab results, imaging data)

Citation

If you use PALADIM in your research, please cite:

@misc{paladim2025,
  title={PALADIM: Pre-Adaptive Learning Architecture with Dual-Process Hebbian-MoE Schema},
  author={Agge, Nick},
  year={2025},
  url={https://huggingface.co/nickagge/paladim-1b-medical},
  note={A 1.04B parameter continual learning model for medical drug recommendation across 609 medication classes}
}

License

Apache License 2.0

Copyright 2025 Nick Agge

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Disclaimer

⚠️ Medical AI Warning: This model is for research purposes only. It should NOT be used for actual medical diagnosis or treatment without proper validation and clinical oversight. Always consult qualified healthcare professionals for medical decisions.
