# NEXUS Shared Expert Weights (10K Steps)

Trained shared expert weights from a NEXUS (Neural Expert Unified Specialization) calibration run.

## Model Details

- **Base Model**: GPT-OSS 120B
- **Training Steps**: 10,000
- **Method**: Top-24 PCA-selected experts, frozen router
- **Parameters**: 896,106,240 (shared expert only)
- **Size**: 1.67 GB (BF16)
- **Training Config**: Frozen router, advanced scheduler, KL distillation

## What This Contains

This file contains ONLY the shared expert weights (216 parameter tensors) from a NEXUS-trained model. To use them:

1. Start with the base GPT-OSS 120B model.
2. Add the NEXUS shared expert architecture.
3. Load these weights.

## Usage

```python
import torch

# Load the shared expert weights onto CPU first
shared_weights = torch.load("nexus_shared_expert_weights_10k.pt", map_location="cpu")

# Apply to a model that already has the NEXUS shared expert architecture.
# strict=False is required because this file contains only the shared
# expert tensors, not the full model state dict.
missing, unexpected = model.load_state_dict(shared_weights, strict=False)
```

## About NEXUS

NEXUS enables efficient domain specialization of massive MoE models by training a small shared expert while keeping the routed experts frozen. See: https://github.com/yourusername/nexus

## License

MIT
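
## Verifying the Checkpoint

Before loading, it can help to confirm the file really contains the 216 tensors and 896,106,240 parameters listed above. The sketch below shows the counting logic on a tiny stand-in state dict; the helper name `summarize_state_dict` is illustrative, not part of NEXUS. With the real file, pass the result of `torch.load(...)` instead of the demo dict.

```python
import torch

def summarize_state_dict(state_dict):
    """Return (tensor_count, total_parameters) for a state dict."""
    tensors = list(state_dict.values())
    return len(tensors), sum(t.numel() for t in tensors)

# Tiny stand-in state dict; the real checkpoint should report
# 216 tensors and 896,106,240 parameters.
demo = {
    "shared_expert.w1": torch.zeros(4, 8),
    "shared_expert.w2": torch.zeros(8, 4),
}
n_tensors, n_params = summarize_state_dict(demo)
print(n_tensors, n_params)  # 2 64
```

If the counts do not match the figures in Model Details, the file is likely truncated or from a different run.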