SelfIE Adapters for Qwen2.5-72B-Instruct
Trained adapter modules for SelfIE (Self-Interpretation of Embeddings), enabling language models to interpret their own internal representations in natural language.
This adapter is a trained projection that maps hidden-state vectors from Qwen/Qwen2.5-72B-Instruct into soft token embeddings for self-interpretation via patching. Part of the Qwen 2.5 scaling series (7B, 14B, 32B, 72B).
Code: github.com/agencyenterprise/selfie-adapters
Warning: This adapter is trained specifically for Qwen/Qwen2.5-72B-Instruct (residual stream dim 8192). It will produce garbage results on other models, even if tensor shapes happen to match.
Adapter
| File | Architecture | Training Data | Params | Val Loss |
|---|---|---|---|---|
| wikipedia-full-rank.safetensors | Full-rank affine | Wikipedia contrastive vectors | 67,117,056 | 1.260 |
Usage
```python
from selfie_adapters import load_adapter

adapter = load_adapter("wikipedia-full-rank.safetensors", device="cuda")
soft_tokens = adapter.transform(hidden_state_vectors)
```
Prompt Template
This adapter uses the following SelfIE prompt template (with <|fim_pad|> as the injection site for the soft token):
```
<|im_start|>user
What is the meaning of "<|fim_pad|>"?<|im_end|>
<|im_start|>assistant
The meaning of "<|fim_pad|>" is "
```
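As a minimal sketch of how this template is used, the snippet below assembles the prompt string and locates the character positions of the `<|fim_pad|>` injection sites. The `INJECTION_TOKEN` constant and the string-search step are illustrative; the actual tokenization and soft-token patching happen inside the SelfIE code and are not shown here.

```python
# Sketch: assembling the SelfIE prompt around the soft-token injection site.
# INJECTION_TOKEN and the template mirror the README; tokenization and the
# actual embedding patch are assumed to happen downstream.
INJECTION_TOKEN = "<|fim_pad|>"

template = (
    "<|im_start|>user\n"
    f'What is the meaning of "{INJECTION_TOKEN}"?<|im_end|>\n'
    "<|im_start|>assistant\n"
    f'The meaning of "{INJECTION_TOKEN}" is "'
)

# Character offsets where the injection token appears; after tokenization,
# the soft token produced by adapter.transform() is patched in at these sites.
injection_sites = [
    i for i in range(len(template)) if template.startswith(INJECTION_TOKEN, i)
]
print(len(injection_sites))  # 2 occurrences in this template
```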
File Format
The .safetensors file contains the projection weights with full training config embedded in the header metadata. You can inspect the metadata without loading the tensors:
```python
from safetensors import safe_open
import json

with safe_open("wikipedia-full-rank.safetensors", framework="pt") as f:
    meta = f.metadata()

print(meta["projection_type"])  # "full_rank"
print(meta["model_name"])       # "Qwen/Qwen2.5-72B-Instruct"
config = json.loads(meta["config_json"])  # full training config
Mean Vectors for Contrastive Adapters
The wikipedia-full-rank adapter was trained on contrastive hidden-state vectors — raw activations with the per-layer dataset mean subtracted. To use this adapter on new inputs, you need the same mean vectors that were subtracted during training.
The file mean-vectors.safetensors contains one mean vector per layer (40 layers: 20–59).
Loading and using mean vectors
```python
import json
from safetensors import safe_open
from safetensors.torch import load_file

# Load all mean vectors
mean_vectors = load_file("mean-vectors.safetensors")

# Access a specific layer's mean vector
mean_vec = mean_vectors["layer_40"]  # shape: [8192], dtype: float32

# Given a raw hidden state from that layer:
contrastive_vec = raw_hidden_state.float() - mean_vec
soft_tokens = adapter.transform(contrastive_vec)

# To see which layers are available:
with safe_open("mean-vectors.safetensors", framework="pt") as f:
    meta = f.metadata()
    layers = json.loads(meta["layer_indices"])

print(layers)  # [20, 21, ..., 59]
```
What are the mean vectors?
They are the average hidden-state vectors at each layer across all 49,637 prompts in the keenanpepper/fifty-thousand-things dataset, extracted using the prompt template "Tell me about {title}." with the Qwen chat format. Subtracting them ensures the adapter sees zero-centered inputs matching its training distribution.
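The zero-centering step can be seen in a tiny numeric sketch. The toy 3-dimensional "hidden states" below stand in for real 8192-dimensional activations: subtracting the per-dimension dataset mean leaves a set of contrastive vectors whose every dimension averages to zero.

```python
# Sketch of the contrastive preprocessing: subtract the per-layer dataset
# mean so inputs are zero-centered. Toy 3-dim vectors stand in for the
# real [8192]-dim activations.
hidden_states = [
    [1.0, 4.0, 2.0],
    [3.0, 0.0, 2.0],
    [2.0, 2.0, 5.0],
]
n = len(hidden_states)
dim = len(hidden_states[0])

# Per-dimension mean across the dataset (the role played per layer by
# mean-vectors.safetensors).
mean_vec = [sum(h[d] for h in hidden_states) / n for d in range(dim)]

# Contrastive vectors: raw activation minus the dataset mean.
contrastive = [[h[d] - mean_vec[d] for d in range(dim)] for h in hidden_states]

# Each dimension of the contrastive set now averages to zero.
col_means = [sum(c[d] for c in contrastive) / n for d in range(dim)]
print(col_means)  # [0.0, 0.0, 0.0]
```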