
Why use an LLM as the text encoder instead of a VLM?

#65
by bengen - opened

Why use an LLM as the text encoder instead of a VLM? A VLM is trained to align text with images, so shouldn't it perform better on image editing and text rendering? What was your reasoning behind this choice?
