PALADIM: Pre-Adaptive Learning Architecture with Dual-Process Hebbian-MoE Schema
A 1.04B parameter continual learning model for medical drug recommendation
Greek Summary (Ελληνική Περίληψη, translated)
What is PALADIM?
PALADIM is an artificial intelligence model with 1.04 billion parameters for recommending drugs in medical cases.
Key Features:
- 🧠 Continual Learning Architecture - learns new information without forgetting the old
- 💊 609 different drugs - recommends treatments from a wide range of medications
- 🔬 Mixture of Experts (MoE) - 16 specialized "experts" per layer
- 📊 Trained on 1,794 patient cases
Current Support:
- ✅ English - full support
- ⏳ Greek - planned for future training
License: Apache 2.0 (open source)
⚠️ Important Note: This model is for research purposes only. It must NOT be used for real medical diagnoses without the supervision of healthcare professionals.
Model Description
PALADIM is a novel architecture combining:
- RoBERTa-base foundation (125M params)
- Mixture of Experts (MoE) with 16 experts per layer × 12 layers (768M params)
- LoRA adapters for efficient fine-tuning (148M params)
- Plastic memory consolidation for continual learning
- Meta-learning controller for adaptive optimization
Total parameters: 1,042,710,532 (1.04B)
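The rounded component figures above can be sanity-checked with simple arithmetic; the small gap to the exact checkpoint count comes from parts not itemized here (e.g. the classification head):

```python
# Sanity check of the rounded parameter breakdown quoted in this card.
components = {
    "roberta_base": 125_000_000,    # foundation
    "moe_layers": 768_000_000,      # 16 experts x 12 layers
    "lora_adapters": 148_000_000,   # rank-16 adapters
}
total = sum(components.values())
print(f"{total:,}")  # 1,041,000,000 -- close to the exact count of 1,042,710,532
```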
Architecture Highlights
Mixture of Experts (MoE)
- 16 specialized experts per transformer layer
- Top-2 expert routing with load balancing
- Enables specialization for different medical domains
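Top-2 routing can be sketched in a few lines. This is illustrative only: the exact gating and load-balancing auxiliary loss in `moe_layer.py` are not shown in this card.

```python
import torch
import torch.nn.functional as F

def top2_route(gate_logits):
    """Minimal top-2 expert routing sketch: pick the two highest-scoring
    experts per token and renormalize their gate weights."""
    probs = F.softmax(gate_logits, dim=-1)                 # gating distribution
    weights, indices = probs.topk(2, dim=-1)               # two experts per token
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize the pair
    return indices, weights

# Example: route 4 tokens across 16 experts
idx, w = top2_route(torch.randn(4, 16))
```

Each token's output is then the weighted sum of its two experts' FFN outputs; a load-balancing loss (not shown) typically keeps expert usage even across the batch.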
Plastic Memory System
- Experience replay buffer for catastrophic forgetting prevention
- Hebbian-inspired consolidation
- Maintains knowledge across sequential task learning
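A minimal sketch of the experience-replay pattern described above, assuming a simple FIFO-evicting buffer (the actual `plastic_memory.py` implementation may differ):

```python
import random
from collections import deque

class ReplayBuffer:
    """Toy experience-replay buffer: stores past samples so they can be
    mixed into new-task batches, countering catastrophic forgetting."""

    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)  # oldest samples evicted first

    def add(self, sample):
        self.buffer.append(sample)

    def sample(self, batch_size):
        # Draw replayed old-task samples to rehearse alongside new data.
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

buf = ReplayBuffer(capacity=3)
for case in ["case_a", "case_b", "case_c", "case_d"]:
    buf.add(case)        # "case_a" is evicted once capacity is exceeded
replayed = buf.sample(2)
```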
LoRA Integration
- Low-Rank Adaptation (rank=16) on all attention layers
- Efficient parameter updates
- Preserves base model knowledge
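The low-rank update itself is simple arithmetic: a frozen weight W is augmented as W + (alpha / r) · B A, with only the small A and B trained. Dimensions below are illustrative (RoBERTa hidden size 768, rank 16 as stated above); alpha = 32 is an assumed scaling value.

```python
import torch

d, r, alpha = 768, 16, 32
W = torch.randn(d, d)                # frozen base projection (e.g. query)
A = torch.randn(r, d) * 0.01         # trainable down-projection
B = torch.zeros(d, r)                # trainable up-projection, zero-initialized
W_eff = W + (alpha / r) * (B @ A)    # effective weight seen at inference

# Zero-initializing B makes the adapter a no-op before training, which is
# how LoRA preserves the base model's knowledge at the start of fine-tuning.
trainable = A.numel() + B.numel()    # 2 * r * d = 24,576 per adapted matrix
```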
Training Details
- Training Data: Medical patient cases with drug recommendations across 609 medication classes
- Training Samples: 1,794 patient cases
- Drug Classes: 609 different medications (from common to specialized treatments)
- Epochs: Multiple continual learning cycles
- Optimization: AdamW with meta-learning rate adaptation
- Hardware: Trained on CPU (can be accelerated with GPU/TPU)
- Trained: November 29, 2025
Performance
This is a multi-class classification model predicting across 609 different medications. The figures below are top-1 softmax confidence scores from informal test predictions, not held-out accuracy metrics:
- Confidence Range: 61-67% on cardiovascular/metabolic cases
- Consistency: High agreement across similar medical conditions
- Top-K Predictions: Model provides ranked drug recommendations
- Dataset: 1,794 patient training samples across diverse medical conditions
Sample Drug Classes (subset of 609 total)
The model predicts across medications including: Metformin, Atorvastatin, Pembrolizumab, Rituximab, Adalimumab, Insulin Glargine, Levothyroxine, Warfarin, Nivolumab, and 600+ others covering cardiovascular, oncology, diabetes, immunology, neurology, and specialized treatments.
Test Results (5 cases)
Case 1: Hypertension & diabetes → Prediction made (61.45%)
Case 2: High BP & cholesterol → Prediction made (65.79%)
Case 3: Chest pain & SOB → Prediction made (63.04%)
Case 4: Heart failure → Prediction made (62.36%)
Case 5: Type 2 diabetes → Prediction made (66.59%)
Note: Specific drug names available via drug_mapping.json file included in repository.
Usage
Installation
pip install torch transformers peft
Quick Start
import json
import torch
from transformers import AutoTokenizer
from paladim import PALADIM
from config import PALADIMConfig

# Load model
config = PALADIMConfig()
config.device = 'cpu'  # or 'cuda'
model = PALADIM(config)
tokenizer = AutoTokenizer.from_pretrained('roberta-base')

# Load trained weights
checkpoint = torch.load('paladim_20251129_203522.pt', map_location='cpu', weights_only=False)
model.load_state_dict(checkpoint['model_state_dict'], strict=False)
model.eval()

# Make prediction
patient_case = "Patient with hypertension and diabetes, currently on metformin"
inputs = tokenizer(patient_case, return_tensors='pt', padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)

# Get top-3 recommendations
top_k = torch.topk(probs[0], k=3)

# Load drug mapping to get names
with open('drug_mapping.json', 'r') as f:
    drug_mapping = json.load(f)

print("Top 3 Drug Recommendations:")
for idx, score in zip(top_k.indices.tolist(), top_k.values.tolist()):
    drug_name = drug_mapping['idx_to_drug'][str(idx)]
    print(f"{drug_name}: {score:.2%}")
Running the Test Script
python test_paladim.py
Model Architecture Details
RoBERTa-base (125M params)
├── Embedding Layer (38M params)
├── 12 Transformer Layers
│ ├── Self-Attention + LoRA (query, key, value, output)
│ ├── MoE Layer (16 experts)
│ │ ├── Gating Network
│ │ └── Expert Networks (2x FFN per expert)
│ └── Layer Normalization
└── Classification Head (609 drug classes)
Plastic Memory System
├── Experience Replay Buffer
├── Consolidation Module
└── Meta-Learning Controller
Continual Learning Capabilities
PALADIM is designed for:
- Sequential task learning without catastrophic forgetting
- Adaptive learning rates via meta-controller
- Knowledge consolidation through experience replay
- Domain specialization via MoE routing
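The "adaptive learning rates via meta-controller" idea can be sketched as a simple loss-trend rule. This is illustrative only; the actual rule in `meta_controller.py` is not documented in this card.

```python
def adapt_lr(lr, recent_losses, up=1.05, down=0.7):
    """Toy meta-controller rule: scale the learning rate based on the
    recent loss trend."""
    if len(recent_losses) < 2:
        return lr                  # not enough history: keep lr unchanged
    if recent_losses[-1] > recent_losses[-2]:
        return lr * down           # loss rose: back off sharply
    return lr * up                 # loss still falling: speed up gently

new_lr = adapt_lr(1e-4, [0.9, 0.8])  # loss is falling, so lr is nudged up
```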
Files in This Repository
- paladim.py - Core model architecture
- config.py - Configuration class
- moe_layer.py - Mixture of Experts implementation
- plastic_memory.py - Memory consolidation system
- consolidation.py - Experience replay logic
- meta_controller.py - Meta-learning controller
- test_paladim.py - Quick test script
- requirements.txt - Dependencies
- paladim_20251129_203522.pt - Trained model checkpoint
Limitations
- Trained on 1,794 patient cases (relatively small dataset for 609 drug classes)
- May require domain-specific fine-tuning for specific medical specialties
- CPU inference is slow (~5-10s per prediction)
- Does not include drug interaction checking or contraindication detection
- Predictions should be validated by medical professionals
Future Improvements
- Train on larger medical datasets (10K+ patient cases)
- Add drug interaction checking and contraindication detection
- Include dosage recommendations
- Add explainability features (attention visualization, SHAP values)
- Optimize inference speed with model quantization
- Implement real-time learning with streaming data
- Add safety guardrails and clinical validation
- Multi-modal inputs (lab results, imaging data)
Citation
If you use PALADIM in your research, please cite:
@misc{paladim2025,
title={PALADIM: Pre-Adaptive Learning Architecture with Dual-Process Hebbian-MoE Schema},
author={Agge, Nick},
year={2025},
url={https://huggingface.co/nickagge/paladim-1b-medical},
note={A 1.04B parameter continual learning model for medical drug recommendation across 609 medication classes}
}
License
Apache License 2.0
Copyright 2025 Nick Agge
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Contact
- Hugging Face: @nickagge
- Issues: GitHub Issues
Disclaimer
⚠️ Medical AI Warning: This model is for research purposes only. It should NOT be used for actual medical diagnosis or treatment without proper validation and clinical oversight. Always consult qualified healthcare professionals for medical decisions.