[Figure: Architecture overview. Input Image → Qwen2VL-7B Transformer (stacked Self-Attention Layers) → Qwen2VL Image Embeddings → Connector → FLUX Transformer (Double Block Layers and Single Block Layers, with repeat counts ×N and ×M) → Generated Image. Optional components: GridDot Panel; T5 Text Embeddings.]