Why use an LLM as the text encoder instead of a VLM?
#65 by bengen - opened
Why use an LLM as the text encoder instead of a VLM? Since a VLM aligns text and images, shouldn't it achieve better performance for image editing and text rendering? What was your reasoning behind this choice?