SelfIE Adapters for Qwen2.5-72B-Instruct

Trained adapter modules for SelfIE (Self-Interpretation of Embeddings), enabling language models to interpret their own internal representations in natural language.

This adapter is a trained projection that maps hidden-state vectors from Qwen/Qwen2.5-72B-Instruct into soft token embeddings for self-interpretation via patching. Part of the Qwen 2.5 scaling series (7B, 14B, 32B, 72B).

Code: github.com/agencyenterprise/selfie-adapters

Warning: This adapter is trained specifically for Qwen/Qwen2.5-72B-Instruct (residual stream dim 8192). It will produce garbage results on other models, even if tensor shapes happen to match.

Adapter

| File | Architecture | Training Data | Params | Val Loss |
|---|---|---|---|---|
| wikipedia-full-rank.safetensors | Full-rank affine | Wikipedia contrastive vectors | 67,117,056 | 1.260 |
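The parameter count is consistent with a full-rank affine map on the 8192-dimensional residual stream: an 8192×8192 weight matrix plus an 8192-dimensional bias. This is a reading of the numbers, not a statement of the released implementation, but the arithmetic checks out:

```python
d_model = 8192  # Qwen2.5-72B-Instruct residual stream width

# Full-rank affine projection: square weight matrix plus bias vector
n_params = d_model * d_model + d_model
print(n_params)  # 67117056, matching the Params column above
```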

Usage

from selfie_adapters import load_adapter

# Load the trained projection onto the GPU
adapter = load_adapter("wikipedia-full-rank.safetensors", device="cuda")

# Map hidden-state vectors to soft token embeddings for patching
soft_tokens = adapter.transform(hidden_state_vectors)

Prompt Template

This adapter uses the following SelfIE prompt template (with <|fim_pad|> as the injection site for the soft token):

<|im_start|>user
What is the meaning of "<|fim_pad|>"?<|im_end|>
<|im_start|>assistant
The meaning of "<|fim_pad|>" is "
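As a sketch, the template can be written as a plain string. The actual injection (replacing the embedding at each `<|fim_pad|>` position with the adapter's soft token) is handled by the SelfIE code; the snippet below only illustrates locating the injection sites:

```python
# The SelfIE prompt as a raw string; <|fim_pad|> marks where the
# adapter's soft token embedding is patched in.
PROMPT = (
    "<|im_start|>user\n"
    'What is the meaning of "<|fim_pad|>"?<|im_end|>\n'
    "<|im_start|>assistant\n"
    'The meaning of "<|fim_pad|>" is "'
)

# Both occurrences are injection sites for the same soft token
n_sites = PROMPT.count("<|fim_pad|>")
print(n_sites)  # 2
```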

File Format

The .safetensors file contains the projection weights with full training config embedded in the header metadata. You can inspect the metadata without loading the tensors:

from safetensors import safe_open
import json

with safe_open("wikipedia-full-rank.safetensors", framework="pt") as f:
    meta = f.metadata()
    print(meta["projection_type"])  # "full_rank"
    print(meta["model_name"])       # "Qwen/Qwen2.5-72B-Instruct"
    config = json.loads(meta["config_json"])  # full training config

Mean Vectors for Contrastive Adapters

The wikipedia-full-rank adapter was trained on contrastive hidden-state vectors — raw activations with the per-layer dataset mean subtracted. To use this adapter on new inputs, you need the same mean vectors that were subtracted during training.

The file mean-vectors.safetensors contains one mean vector per layer (40 layers: 20–59).

Loading and using mean vectors

import json
from safetensors import safe_open
from safetensors.torch import load_file

# Load all mean vectors
mean_vectors = load_file("mean-vectors.safetensors")

# Access a specific layer's mean vector
mean_vec = mean_vectors["layer_40"]  # shape: [8192], dtype: float32

# Given a raw hidden state from that layer:
contrastive_vec = raw_hidden_state.float() - mean_vec
soft_tokens = adapter.transform(contrastive_vec)

# To see which layers are available:
with safe_open("mean-vectors.safetensors", framework="pt") as f:
    meta = f.metadata()
    layers = json.loads(meta["layer_indices"])
    print(layers)  # [20, 21, ..., 59]

What are the mean vectors?

They are the average hidden-state vectors at each layer across all 49,637 prompts in the keenanpepper/fifty-thousand-things dataset, extracted using the prompt template "Tell me about {title}." with the Qwen chat format. Subtracting them ensures the adapter sees zero-centered inputs matching its training distribution.
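A minimal sketch of the averaging and centering steps, with random tensors standing in for real hidden states (the actual pipeline extracts them from the model per layer):

```python
import torch

torch.manual_seed(0)
n_prompts, d_model = 1_000, 8192  # the release averaged over 49,637 prompts

# Stand-in activations for one layer; in practice these are hidden states
# from "Tell me about {title}." prompts in Qwen chat format.
acts = torch.randn(n_prompts, d_model)

mean_vec = acts.mean(dim=0)    # per-layer mean vector, shape [8192]
contrastive = acts - mean_vec  # zero-centered inputs for the adapter

# Centering makes the per-dimension mean numerically zero
assert contrastive.mean(dim=0).abs().max().item() < 1e-3
```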
