[Figure: architecture diagram. Labeled components, in order: Input Image; Qwen2VL-7B Transformer (stacked Self-Attention Layers); optional GridDot Panel; Qwen2VL Image Embeddings; Connector; optional T5 Text Embeddings; FLUX Transformer (Double Block Layer and Single Block Layer, repeated ×N and ×M); Generated Image.]