---
license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/LICENSE.md
language:
- en
base_model:
- black-forest-labs/FLUX.1-dev
pipeline_tag: text-to-image
tags:
- flux.1-dev
- flux
- text-to-image
- multi-subject
- FOCUS
- flow-matching
- optimal-control
- fine-tuned
---

![FLUX.1 [dev] + FOCUS](./teasers.jpg)

# FLUX.1 [dev] fine-tuned for multi-subject prompts

**TL;DR**: A **fine-tuned derivative of `black-forest-labs/FLUX.1-dev`** focused on **multi-subject fidelity**: keeping multiple entities and their attributes unentangled while **preserving the base style**. Works across animals, people, and objects.

Read the paper: **[Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity](https://arxiv.org/abs/2510.02315)**.

> ⚠️ Licensing: This model inherits the **FLUX.1 [dev] Non-Commercial License** from the base model and is distributed under compatible terms. Use is subject to the base model’s license.

---

## What’s improved

- **Entity disentanglement**: better separation across 2–4 subjects, with fewer merged or omitted subjects.
- **Attribute binding**: colors, clothing, and small accessories stick to the correct subject.
- **Single-subject generation**: also improves single-subject prompts while staying stylistically close to the base model.
---

## Quick start (Diffusers)

Install the [🧨 diffusers library](https://github.com/huggingface/diffusers):

```bash
pip install -U transformers==4.53.0 diffusers==0.33.1
```

Then:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "ericbill21/focus_flux", torch_dtype=torch.bfloat16
).to("cuda")
# For smaller GPUs, use pipe.enable_sequential_cpu_offload() instead of .to("cuda")

image = pipe(
    prompt="A lion and a tiger resting side by side in a jungle clearing",
    num_inference_steps=28,
    guidance_scale=3.5,
    max_sequence_length=256,
    height=512,
    width=512,
    generator=torch.Generator("cpu").manual_seed(5),
).images[0]
image.save("sample.png")
```

Since this is a standard Diffusers pipeline, you can apply features such as xFormers attention, VAE tiling/slicing, and quantization as usual.

## How was this achieved?

We cast multi-subject fidelity as a stochastic optimal control problem over flow-matching samplers and fine-tune via FOCUS (an adjoint-matching heuristic). A lightweight controller is trained to respect subject identity, attributes, and spatial relations while staying close to the base distribution, yielding improved multi-subject fidelity without sacrificing style. Full details and ablations are in the paper and code.

- Paper: [https://arxiv.org/abs/2510.02315](https://arxiv.org/abs/2510.02315)
- Code: [https://github.com/ericbill21/FOCUS](https://github.com/ericbill21/FOCUS)

## Model details

- Base: `black-forest-labs/FLUX.1-dev`
- Type: full pipeline (no LoRA required at inference)
- Intended use: research and creative work where multi-subject consistency matters
- Limitations: under extreme clutter or with highly similar subjects, attributes may still leak; biases of the base model may persist.
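For intuition on the optimal-control framing in "How was this achieved?", a generic stochastic optimal control objective for fine-tuning a flow/diffusion sampler can be sketched as follows. The notation here is assumed for illustration (base drift $b_t$, control $u_t$, diffusion scale $\sigma_t$, and a reward $r$ scoring subject fidelity of the final sample); see the paper for FOCUS's exact formulation.

```latex
% Schematic SOC fine-tuning objective (assumed notation, not the exact
% FOCUS objective): choose the control u to maximize the terminal reward
% r(X_1) while paying a quadratic cost for deviating from the base drift.
\min_{u}\;
\mathbb{E}\!\left[
  \int_0^1 \tfrac{1}{2}\,\lVert u_t(X_t)\rVert^2 \,\mathrm{d}t
  \;-\; r(X_1)
\right]
\quad\text{s.t.}\quad
\mathrm{d}X_t
  = \bigl(b_t(X_t) + \sigma_t\, u_t(X_t)\bigr)\,\mathrm{d}t
  + \sigma_t\,\mathrm{d}W_t .
```

The quadratic control cost is what keeps the fine-tuned sampler close to the base distribution, which matches the card's claim of improved multi-subject fidelity without sacrificing the base style.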
## Citation

If you find this useful, please cite:

```bibtex
@article{Bill2025FOCUS,
  title   = {Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity},
  author  = {Eric Tillmann Bill and Enis Simsar and Thomas Hofmann},
  journal = {arXiv preprint arXiv:2510.02315},
  year    = {2025},
  url     = {https://arxiv.org/abs/2510.02315}
}
```

## Contact

Feedback and issues welcome via the Hugging Face model page or GitHub.