# Lie-Holonomy Transformer (LHT)
A PyTorch implementation of the gauge-theoretic reasoning architecture from "Beyond Holonomy: Lie-Algebraic Symbol Emergence and the Homotopy Type Structure of Neural Reasoning."
## Core Ideas
This architecture treats reasoning as geometry:
| Concept | Mathematical Structure | Implementation |
|---|---|---|
| Propositions | Manifold M | Embedding space |
| Inference | Parallel transport | Gauge-covariant attention |
| Consistency | Holonomy = Identity | Holonomy loss |
| Symbols | Lie algebra generators | Generator network |
| Proof equivalence | Homotopy | Layer depth |
## Architecture Overview
```
Input tokens
      │
      ▼
┌─────────────────────────────────────────┐
│ Token Embedding (Proposition M)         │
│ + Position Embedding                    │
│ + Fiber Initialization (gauge)          │
└─────────────────────────────────────────┘
      │
      ▼
┌─────────────────────────────────────────┐
│ LHT Layer (× n_layers)                  │
│ ┌─────────────────────────────────┐     │
│ │ Connection Network A(x)         │     │  ← Learns gauge connection
│ │ Parallel Transport Γ_{j→i}      │     │  ← Transports fiber elements
│ │ Gauge-Covariant Attention       │     │  ← Modified self-attention
│ │ Lie Algebra Generator           │     │  ← Generates inference ops
│ │ Generator Application           │     │  ← Applies exp(X) to fiber
│ └─────────────────────────────────┘     │
└─────────────────────────────────────────┘
      │
      ▼
┌─────────────────────────────────────────┐
│ Output: logits + geometric losses       │
└─────────────────────────────────────────┘
```
## Key Components

### 1. Connection Network

Learns the gauge connection ω that defines how to parallel transport inferential states:

```
A_μ(x) ∈ gl(k, ℝ)   # Lie-algebra-valued 1-form
```
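As a rough, hypothetical sketch of such a module (not the exact code in `lht.py`): the connection can be parameterized by `lie_algebra_rank` learned basis generators with input-dependent coefficients, so that A(x) = Σ_r c_r(x) G_r lands in gl(k, ℝ):

```python
import torch
import torch.nn as nn

class ConnectionNetwork(nn.Module):
    """Hypothetical sketch: map a point x on the proposition manifold to
    A(x) = Σ_r c_r(x) G_r ∈ gl(k, ℝ), with learned basis generators G_r.
    (Mirrors lie_algebra_rank from the config; lht.py may differ.)"""

    def __init__(self, d_model: int, k: int, rank: int):
        super().__init__()
        self.generators = nn.Parameter(torch.randn(rank, k, k) * 0.02)  # G_r
        self.coeffs = nn.Sequential(                                    # c(x)
            nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, rank)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, S, d_model) -> A(x): (N, S, k, k)
        c = self.coeffs(x)
        return torch.einsum('nsr,rij->nsij', c, self.generators)
```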
### 2. Parallel Transport

Computes transport operators between positions:

```
Γ_{j→i} = exp(-A_μ(x_j) (x_i - x_j)^μ)
```
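A minimal sketch of this formula, assuming the connection supplies one generator per manifold direction μ (the shapes and helper name are mine, not the paper's):

```python
import torch

def parallel_transport(A_j: torch.Tensor, x_i: torch.Tensor,
                       x_j: torch.Tensor) -> torch.Tensor:
    """Γ_{j→i} = exp(-A_μ(x_j) (x_i - x_j)^μ) — a hypothetical sketch.

    A_j      : (d, k, k)  connection at x_j, one generator per direction μ
    x_i, x_j : (d,)       points on the proposition manifold
    Returns a (k, k) transport operator.
    """
    dx = x_i - x_j                              # displacement (x_i - x_j)^μ
    X = torch.einsum('dij,d->ij', A_j, dx)      # contract over μ
    return torch.linalg.matrix_exp(-X)          # group element via matrix exp
```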
### 3. Gauge-Covariant Attention

Standard self-attention, except each value is parallel-transported into the query position's fiber before aggregation:

```
# Standard: Attn(Q, K, V)_i     = Σ_j α_ij V_j
# Gauge:    GaugeAttn(Q, K, V)_i = Σ_j α_ij Γ_{j→i}(V_j)
```
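A self-contained sketch of this modification, with precomputed pairwise transports (the tensor shapes are assumptions, not the `lht.py` interface):

```python
import torch

def gauge_covariant_attention(q, keys, v, transport):
    """Attention that parallel-transports each value before mixing (sketch).

    q, keys   : (N, S, d)        queries and keys
    v         : (N, S, f)        fiber-valued values
    transport : (N, S, S, f, f)  Γ_{j→i} for every position pair (i, j)
    """
    d = q.shape[-1]
    attn = torch.softmax(q @ keys.transpose(-2, -1) / d ** 0.5, dim=-1)  # α_ij
    v_t = torch.einsum('nijab,njb->nija', transport, v)  # Γ_{j→i}(V_j)
    return torch.einsum('nij,nija->nia', attn, v_t)      # Σ_j α_ij Γ_{j→i}(V_j)
```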
### 4. Holonomy Loss

Enforces reasoning consistency by requiring transport around closed loops to return to the identity:

```
L_hol = E[||Hol_γ - I||²_F]
```
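In code, this amounts to composing the transports around a loop and penalizing the Frobenius distance from the identity; a sketch under the same shape assumptions as above:

```python
import torch

def holonomy_loss(transports: list[torch.Tensor]) -> torch.Tensor:
    """L_hol = E[||Hol_γ - I||²_F] over a batch of closed loops (sketch).

    transports: transport operators along γ, each of shape (N, k, k),
    ordered so that their product is the holonomy Hol_γ.
    """
    hol = transports[0]
    for gamma in transports[1:]:
        hol = gamma @ hol                         # path-ordered composition
    eye = torch.eye(hol.shape[-1], device=hol.device)
    return ((hol - eye) ** 2).sum(dim=(-2, -1)).mean()
```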
### 5. Curvature Regularization

Encourages flat reasoning spaces, where the order of inference steps doesn't matter:

```
L_curv = E[||F(x)||²_F]   where F = dω + ω∧ω
```
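The dω term requires derivatives of the connection with respect to position, but the algebraic ω∧ω piece reduces to commutators of connection components. A sketch of just that piece, with shapes of my own choosing:

```python
import torch

def curvature_commutator_penalty(A: torch.Tensor) -> torch.Tensor:
    """Penalize the [A_μ, A_ν] part of F_{μν} = ∂_μ A_ν - ∂_ν A_μ + [A_μ, A_ν].

    Only the ω∧ω term is sketched here; the derivative terms would need
    autograd through the connection network. A : (d, k, k) at one point.
    """
    AA = torch.einsum('uab,vbc->uvac', A, A)      # A_μ A_ν for all pairs
    comm = AA - AA.transpose(0, 1)                # [A_μ, A_ν]
    return (comm ** 2).sum()                      # Σ_{μν} ||[A_μ, A_ν]||²_F
```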
## Installation

```bash
pip install torch
```
## Usage

### Basic
```python
from lht import LieHolonomyTransformer, LHTConfig

# Create model
config = LHTConfig(
    vocab_size=32000,
    d_model=512,
    d_fiber=64,
    n_heads=8,
    n_layers=6,
    lie_algebra_rank=8,
)
model = LieHolonomyTransformer(config)

# Forward pass
output = model(
    input_ids=tokens,
    labels=labels,
    return_geometric_losses=True,
)

# Get losses
lm_loss = output['lm_loss']
holonomy_loss = output['holonomy_loss']
curvature_loss = output['curvature_loss']
total_loss = model.get_total_loss(output)
```
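Presumably `get_total_loss` combines these terms with the `lambda_*` weights from the config (see Configuration below); roughly:

```python
# Hypothetical equivalent of model.get_total_loss(output):
total_loss = (output['lm_loss']
              + config.lambda_holonomy * output['holonomy_loss']
              + config.lambda_curvature * output['curvature_loss'])
```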
### Training with Geometric Loss Annealing
```python
from lht import LHTTrainer

trainer = LHTTrainer(model, optimizer, config)

for batch in dataloader:
    metrics = trainer.train_step(batch)

# Early training: high curvature loss weight → flat representations
# Mid training:   high holonomy loss weight  → consistency
# Late training:  high waypoint loss weight  → discrete structure
```
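A minimal sketch of such a three-phase schedule, assuming the trainer exposes mutable loss weights (all attribute names here are hypothetical):

```python
def anneal_geometric_weights(trainer, step: int, total_steps: int) -> None:
    """Hypothetical schedule matching the three phases described above."""
    t = step / total_steps
    if t < 1 / 3:    # early: emphasize curvature -> flat representations
        trainer.lambda_curvature, trainer.lambda_holonomy, trainer.lambda_waypoint = 0.10, 0.01, 0.00
    elif t < 2 / 3:  # mid: emphasize holonomy -> consistency
        trainer.lambda_curvature, trainer.lambda_holonomy, trainer.lambda_waypoint = 0.01, 0.10, 0.01
    else:            # late: emphasize waypoints -> discrete structure
        trainer.lambda_curvature, trainer.lambda_holonomy, trainer.lambda_waypoint = 0.01, 0.05, 0.05
```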
### Waypoint Detection
```python
from lht import WaypointDetector

detector = WaypointDetector(config, n_waypoints=32)
waypoint_ids, stability = detector(representations)
```
## Configuration
| Parameter | Description | Default |
|---|---|---|
| `d_model` | Proposition manifold dimension | 512 |
| `d_fiber` | Fiber (gauge) dimension | 64 |
| `lie_algebra_rank` | k for the GL(k, ℝ) structure group | 8 |
| `lambda_holonomy` | Weight for holonomy loss | 0.1 |
| `lambda_curvature` | Weight for curvature loss | 0.01 |
| `lambda_waypoint` | Weight for waypoint stability loss | 0.05 |
## Theoretical Predictions
The framework makes testable predictions:
1. **Chain-of-thought benefit correlates with curvature.** High-curvature domains (e.g., causal reasoning) should benefit more from CoT than low-curvature domains (e.g., arithmetic).
2. **Waypoints emerge spontaneously.** Training with the holonomy loss should cause discrete, symbol-like structures to form at flat loci.
3. **Holonomy predicts errors.** Incorrect reasoning paths should have higher holonomy magnitude (see the probe sketch after this list).
4. **Compositional generalization improves.** Holonomy constraints force consistent composition.
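Prediction 3 is directly measurable with the pieces above; a hypothetical probe that scores each example by its loop holonomy magnitude:

```python
import torch

def holonomy_score(transports: list[torch.Tensor]) -> torch.Tensor:
    """Per-example ||Hol_γ - I||_F; the prediction is that high scores
    co-occur with incorrect reasoning paths. Shapes as in the holonomy loss."""
    hol = transports[0]
    for gamma in transports[1:]:
        hol = gamma @ hol
    eye = torch.eye(hol.shape[-1], device=hol.device)
    return torch.linalg.matrix_norm(hol - eye)   # Frobenius norm, shape (N,)
```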
## File Structure

```
lie_holonomy_transformer/
├── lht.py         # Core implementation
├── train.py       # Training script
├── README.md      # This file
└── experiments/   # Benchmark code (TODO)
```
## References

- "Beyond Holonomy: Lie-Algebraic Symbol Emergence and the Homotopy Type Structure of Neural Reasoning" (the paper this repo implements)
- Cohen et al. (2019). Gauge Equivariant Convolutional Networks.
- Weiler & Cesa (2019). General E(2)-Equivariant Steerable CNNs.
- The Univalent Foundations Program (2013). Homotopy Type Theory.
## License
MIT