SynID

Zero-shot identity-consistent image generation from text alone.

Paper · GitHub · License: MIT


SynID generates multiple consistent images of the same character from a text description — no real reference photos, no dataset, no pretraining. It runs in about five minutes per character on a single T4 GPU.

The pipeline is a closed-loop self-distillation system: the diffusion model generates its own training data, refines its own identity embedding, and trains its own adapter β€” entirely from text.


How it works

[Figure: SynID pipeline architecture]

Text prompt
    │
    ▼
Multi-anchor ensemble
  4 synthetic anchors, softmax-weighted by CLIP similarity
    │
    ▼
Multi-token identity projector
  CLIP embedding → 4 × 768 identity tokens
  Trained with text alignment + diversity + ArcFace losses
    │
    ▼
Bootstrap refinement
  20 expression-diverse candidates generated and scored
  Top-K selected with diversity enforcement
  Projector retrained on refined embedding
    │
    ▼
Drift correction
  Probe image generated, CLIP drift measured
  Projector fine-tuned to close the gap (2 rounds)
    │
    ▼
UNet adapter training
  Lightweight cross-attention adapters on all transformer blocks
  Trained on 8 synthetic images: MSE + CLIP + ArcFace losses (~260 steps)
    │
    ▼
Generation
  Dual-level identity injection:
    · Text embedding (coarse, adaptive scale)
    · UNet cross-attention (fine-grained, every denoising step)
  Identity-aware negative conditioning
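The multi-anchor ensemble at the top of the pipeline can be sketched in a few lines. This is an illustrative reconstruction, not the released implementation: the softmax temperature value and the function names (`softmax`, `ensemble_identity`) are assumptions.

```python
import numpy as np

def softmax(scores, temperature=0.05):
    # temperature is a hypothetical knob; lower values sharpen the weighting
    z = np.asarray(scores, dtype=np.float64) / temperature
    z -= z.max()                                   # numerical stability
    e = np.exp(z)
    return e / e.sum()

def ensemble_identity(anchor_embeddings, clip_similarities, temperature=0.05):
    """Blend anchor CLIP embeddings into a single identity embedding,
    weighting each anchor by its softmaxed CLIP similarity to the prompt."""
    weights = softmax(clip_similarities, temperature)          # (N,)
    anchors = np.asarray(anchor_embeddings, dtype=np.float64)  # (N, D)
    blended = weights @ anchors                                # (D,)
    return blended / np.linalg.norm(blended)                   # unit-normalise

# Toy example: 4 synthetic anchors with 768-dim CLIP embeddings
rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 768))
sims = [0.31, 0.29, 0.35, 0.30]   # CLIP text–image similarities per anchor
identity = ensemble_identity(anchors, sims)
```

The softmax weighting means a single weak anchor drags the blended embedding far less than a plain average would, which is the point of scoring anchors by CLIP similarity before blending.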

Results

Multi-character benchmark (5 characters)

[Figure: multi-domain character profiles]

| Character | CLIP identity | Pairwise consistency |
|---|---|---|
| Woman (brunette) | 0.9515 | 0.9435 ± 0.022 |
| Elderly man | 0.9508 | 0.9430 ± 0.031 |
| Anime girl | 0.9655 | 0.9541 ± 0.013 |
| Young man | 0.9407 | 0.9528 ± 0.019 |
| Woman (redhead) | 0.9625 | 0.9411 ± 0.013 |
| Mean | 0.9542 | 0.9469 ± 0.020 |

ArcFace (full system): 0.791 — comparable to Arc2Face (~0.79), which was trained on 21M real faces.
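The pairwise-consistency numbers above are the mean ± std of cosine similarity over all pairs of generated images for a character. A minimal sketch of that metric follows; the function name is an assumption, and the released evaluation_harness.py may compute it differently.

```python
import numpy as np
from itertools import combinations

def pairwise_consistency(embeddings):
    """Mean and std of cosine similarity over all unordered image pairs."""
    e = np.asarray(embeddings, dtype=np.float64)
    e /= np.linalg.norm(e, axis=1, keepdims=True)   # unit-normalise each row
    sims = [float(e[i] @ e[j]) for i, j in combinations(range(len(e)), 2)]
    return float(np.mean(sims)), float(np.std(sims))

# Toy example: three identical embeddings are perfectly consistent
mean, std = pairwise_consistency([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
# mean == 1.0, std == 0.0
```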

Evaluation suites — expression, scene, pose, seed robustness

[Figure: evaluation suites]

Ablation — each component's contribution

[Figure: ablation progression]

| Configuration | CLIP | ArcFace | Time |
|---|---|---|---|
| Baseline — single anchor, no bootstrap, no adapter | 0.9603 | 0.7562 | 55 s |
| + Multi-anchor ensemble | 0.9676 | 0.7581 | 78 s |
| + Bootstrap + drift correction | 0.9665 | 0.7560 | 154 s |
| Full system — + UNet adapter | 0.9690 | 0.7912 | 158 s |
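The UNet adapter that produces the final ArcFace jump is described as a set of lightweight cross-attention modules injected on the transformer blocks. A minimal sketch of one such adapter is below; the class name, hidden sizes, and the residual gating scale are assumptions for illustration, not the repository's actual module.

```python
import torch
import torch.nn as nn

class IdentityCrossAttnAdapter(nn.Module):
    """Sketch of a lightweight cross-attention adapter that injects identity
    tokens into a UNet block's hidden states via a scaled residual."""
    def __init__(self, hidden_dim, identity_dim=768, scale=0.5):
        super().__init__()
        self.scale = scale
        self.to_q = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.to_k = nn.Linear(identity_dim, hidden_dim, bias=False)
        self.to_v = nn.Linear(identity_dim, hidden_dim, bias=False)
        self.out  = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, hidden, identity_tokens):
        # hidden: (B, L, C) spatial tokens; identity_tokens: (B, N, 768)
        q = self.to_q(hidden)
        k = self.to_k(identity_tokens)
        v = self.to_v(identity_tokens)
        attn = torch.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
        return hidden + self.scale * self.out(attn @ v)   # gated residual add

# Toy example: batch of 2, 16 spatial tokens, 320 channels, 4 identity tokens
adapter  = IdentityCrossAttnAdapter(hidden_dim=320)
hidden   = torch.randn(2, 16, 320)
identity = torch.randn(2, 4, 768)
out = adapter(hidden, identity)        # same shape as hidden: (2, 16, 320)
```

Because the injection is a gated residual, setting `scale` to 0 recovers the base UNet behaviour, which is why such adapters can be trained in a few hundred steps without destabilising the pretrained model.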

Comparison with prior methods

| Method | CLIP | ArcFace | Real image required | Training data |
|---|---|---|---|---|
| IP-Adapter FaceID | 0.854 | 0.132 | Yes | ~1M pairs |
| Arc2Face | — | ~0.79 | Yes | ~21M faces |
| PhotoMaker | — | ~0.618 | Yes (multiple) | Real |
| SynID (ours) | 0.969 | 0.791 | No | 8 synthetic |

Limitations and failure cases

[Figure: failure cases]


Quick start

Colab (recommended — T4 GPU)

Upload synid_ui.py, identity_projection_complete.py, and evaluation_harness.py to the same Colab working directory, then run:

!pip install -q gradio diffusers transformers accelerate controlnet_aux \
    safetensors huggingface_hub insightface onnxruntime torchvision

exec(open("synid_ui.py").read())

Then in the UI:

  1. Click Load Pipelines — loads DreamShaper, ControlNet, CLIP, OpenPose
  2. Enter a character description and click Create Character
  3. Use the Generate, Pose-Free, or Evaluate tabs
  4. Scan the QR code to open the app on mobile

Local (GPU required)

git clone https://github.com/rxbinsingh/SynID
cd SynID
pip install -r requirements.txt
python synid_ui.py

GPU users: replace onnxruntime with onnxruntime-gpu in requirements.txt for faster face detection.

Scripting / research backend

from identity_projection_complete import (
    init_synid_backend,
    create_character,
    attach_identity_adapters,
    register_adapter_hooks,
    generate_with_adapter,
    save_checkpoint,
    load_checkpoint,
    pipe,
)

init_synid_backend()

profile = create_character(
    identity_prompt="young woman, brown eyes, dark hair, photorealistic",
    anchor_seed=1234,
    num_identity_tokens=4,
    train_steps=250,
)

adapters = attach_identity_adapters(pipe.unet, identity_dim=768, scale=0.5)
hooks    = register_adapter_hooks(pipe.unet)

image = generate_with_adapter(
    profile.identity_tokens,
    profile.character_core_prompt + ", bright smile, studio portrait",
    profile.pose_image,
    pipe.unet,
    pipe,
    seed=5555,
)
image.save("output.png")

Full benchmark (5 characters)

from identity_projection_complete import init_synid_backend, run_full_benchmark

init_synid_backend()
run_full_benchmark()

Files

| File | Description |
|---|---|
| synid_ui.py | Gradio UI — staged pipeline loading, character creation, generation, evaluation, mobile QR |
| identity_projection_complete.py | Full backend — initialization, identity learning, adapter training, checkpointing |
| evaluation_harness.py | Evaluation suites (quick / full), ablation study, standardized benchmark |
| requirements.txt | Python dependencies |

Checkpointing

Save and reload a trained character profile:

from identity_projection_complete import save_checkpoint, load_checkpoint, attach_identity_adapters

# save
save_checkpoint(profile, adapters, "/path/to/checkpoints/my_character")

# load
adapters = attach_identity_adapters(pipe.unet, identity_dim=768, scale=0.5)
profile  = load_checkpoint(adapters, "/path/to/checkpoints/my_character")

Export as a portable .character archive:

from identity_projection_complete import export_character
export_character("my_character", checkpoint_dir="/path/to/checkpoints")

Requirements

  • Python 3.9+
  • CUDA GPU (T4 or better recommended; 8 GB+ VRAM)
  • See requirements.txt for full dependency list

Key dependencies: torch, diffusers, transformers, controlnet_aux, insightface, gradio


Paper

SynID: Zero-Shot Identity-Consistent Image Generation via Synthetic Bootstrapping and On-the-Fly UNet Adaptation
Robin Singh, 2025
https://doi.org/10.13140/RG.2.2.30671.85925

@article{singh2025synid,
  title   = {SynID: Zero-Shot Identity-Consistent Image Generation via
             Synthetic Bootstrapping and On-the-Fly UNet Adaptation},
  author  = {Singh, Robin},
  year    = {2025},
  doi     = {10.13140/RG.2.2.30671.85925},
  url     = {https://doi.org/10.13140/RG.2.2.30671.85925}
}

License

MIT © 2025 Robin Singh
