nielsr HF Staff committed on
Commit 6b53cea · verified · 1 Parent(s): bb34470

Improve model card for Lego-Edit: Add metadata, links, abstract, and structure


This pull request significantly enhances the model card for the `Lego-Edit` framework by:

- **Updating metadata**:
  - Changing the `license` to `cc-by-nc-4.0` to reflect the model's licensing terms as specified in the GitHub repository's disclaimer.
  - Adding `pipeline_tag: image-to-image` to improve discoverability for image editing and manipulation tasks.
  - Adding `library_name: transformers`, based on evidence of the `Qwen2_5_VLForConditionalGeneration` architecture, `Qwen2Tokenizer`, `Qwen2_5_VLProcessor`, and the explicit mention of a `transformers` library modification in the quick start guide.
- **Adding key links**: Including direct links to the paper, project page, and GitHub repository for easy access to related resources.
- **Incorporating abstract**: Providing the paper's abstract to give a quick overview of the model's purpose and methodology.
- **Restructuring content**: Reorganizing the model card for better readability, integrating images directly from the GitHub repository, and adding sections for Features, Disclaimer, Citation, and Acknowledgments from the official GitHub README.
- **Removing verbose file information**: As per instructions, the detailed file content has been removed from the model card.

For detailed setup and usage instructions, please refer to the provided GitHub repository.

Files changed (1)
  1. README.md +79 -3
README.md CHANGED
@@ -1,3 +1,79 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: cc-by-nc-4.0
+ pipeline_tag: image-to-image
+ library_name: transformers
+ ---
+
+ # Lego-Edit: A General Image Editing Framework with Model-Level Bricks and MLLM Builder
+
+ <p align="center">
+ <img src="https://github.com/xiaomi-research/lego-edit/raw/main/resources/lego_pic.png" alt="Lego-Edit" width="240"/>
+ </p>
+
+ Lego-Edit is an instruction-based image editing framework introduced in the paper [Lego-Edit: A General Image Editing Framework with Model-Level Bricks and MLLM Builder](https://huggingface.co/papers/2509.12883).
+
+ - 📚 **Paper**: [Lego-Edit: A General Image Editing Framework with Model-Level Bricks and MLLM Builder](https://huggingface.co/papers/2509.12883)
+ - 🌐 **Project Page**: https://xiaomi-research.github.io/lego-edit/
+ - 💻 **Code / GitHub Repository**: https://github.com/xiaomi-research/lego-edit
+ - 🚀 **Live Demo**: https://editdemo.ai.xiaomi.net/
+
+ ## Abstract
+ Instruction-based image editing has garnered significant attention due to its direct interaction with users. However, real-world user instructions are immensely diverse, and existing methods often fail to generalize effectively to instructions outside their training domain, limiting their practical application. To address this, we propose Lego-Edit, which leverages the generalization capability of Multi-modal Large Language Model (MLLM) to organize a suite of model-level editing tools to tackle this challenge. Lego-Edit incorporates two key designs: (1) a model-level toolkit comprising diverse models efficiently trained on limited data and several image manipulation functions, enabling fine-grained composition of editing actions by the MLLM; and (2) a three-stage progressive reinforcement learning approach that uses feedback on unannotated, open-domain instructions to train the MLLM, equipping it with generalized reasoning capabilities for handling real-world instructions. Experiments demonstrate that Lego-Edit achieves state-of-the-art performance on GEdit-Bench and ImgBench. It exhibits robust reasoning capabilities for open-domain instructions and can utilize newly introduced editing tools without additional fine-tuning. The figure below showcases Lego-Edit's qualitative performance.
+
+ <p align="center"><img src="https://github.com/xiaomi-research/lego-edit/raw/main/resources/case_pic.png" width="95%"></p>
+
+ ## ✨ Features
+
+ Lego-Edit supports local editing, global editing, and multi-step editing, as demonstrated in our tests; the corresponding results are shown above. Its feedback responsiveness and tool-extension capabilities are discussed in our paper.
+
+ Additionally, Lego-Edit accepts mask inputs for precise control of the editing region. Example applications are shown below:
+
+ <p align="center"><img src="https://github.com/xiaomi-research/lego-edit/raw/main/resources/maskcase1.png" width="95%"></p>
+
+ <p align="center"><img src="https://github.com/xiaomi-research/lego-edit/raw/main/resources/maskcase2.png" width="95%"></p>
+
+ Try it out to discover more uses of the framework.
+
+ ## 🔥 Quick Start
+
+ For detailed instructions on setting up the environment, downloading checkpoints, and running the Gradio WebUI, please refer to the [Quick Start section in the GitHub repository](https://github.com/xiaomi-research/lego-edit#--quick-start).
+
+ ## 💼 New Tools Integration
+
+ Lego-Edit supports the integration of new tools. For guidance on adding custom tools and making them usable by the Builder, please refer to the [New Tools Integration section in the GitHub repository](https://github.com/xiaomi-research/lego-edit#--new-tools-integration).
+
+ ## 📝 More Usages
+
+ Some editing models are trained at a resolution of 768 via the ICEdit method. The corresponding trained [Single-Task-LoRA](https://huggingface.co/xiaomi-research/lego-edit/tree/main/loras) weights are provided. To use these LoRAs independently, refer to the usage instructions at [ICEdit](https://github.com/River-Zhang/ICEdit).
+
+ <p align="center"><img src="https://github.com/xiaomi-research/lego-edit/raw/main/resources/lora_effect.png" width="95%"></p>
+
+ ## 📄 Disclaimer
+
+ We open-source this project for academic research. The vast majority of images
+ used in this project are either generated or licensed. If you have any concerns,
+ please contact us, and we will promptly remove any inappropriate content.
+ Our code is released under the Apache 2.0 License, while our models are under
+ the CC BY-NC 4.0 License. Any models related to the <a href="https://huggingface.co/black-forest-labs/FLUX.1-dev" target="_blank">FLUX.1-dev</a>
+ base model must adhere to its original licensing terms.
+ <br><br>This research aims to advance the field of generative AI. Users are free to
+ create images using this tool, provided they comply with local laws and exercise
+ responsible usage. The developers are not liable for any misuse of the tool by users.
+
+ ## ✍️ Citation
+
+ If you find this project useful for your research, please consider citing our paper:
+
+ ```bibtex
+ @article{jia2025legoedit,
+   title   = {Lego-Edit: A General Image Editing Framework with Model-Level Bricks and MLLM Builder},
+   author  = {Qifei Jia and Yu Liu and Yajie Chai and Xintong Yao and Qiming Lu and Yasen Zhang and Runyu Shi and Ying Huang and Guoquan Zhang},
+   journal = {arXiv preprint arXiv:2509.12883},
+   year    = {2025},
+   url     = {https://arxiv.org/abs/2509.12883}
+ }
+ ```
+
+ ## 🙏 Acknowledgments
+
+ - Built on [MiMo-VL](https://github.com/XiaomiMiMo/MiMo-VL), [ComfyUI](https://github.com/comfyanonymous/ComfyUI), [FLUX](https://github.com/black-forest-labs/flux), [ICEdit](https://github.com/River-Zhang/ICEdit), [EVF-SAM](https://github.com/hustvl/EVF-SAM), and [LaMa](https://github.com/advimman/lama).