Why use an LLM as the text encoder instead of a VLM?

#65

by bengen - opened 3 days ago

Why use an LLM as the text encoder instead of a VLM? Since a VLM aligns text and images, shouldn't it achieve better performance for image editing and text rendering? What was your reasoning behind this choice?

qpqpqpqpqpqp

3 days ago

https://huggingface.co/Tongyi-MAI/Z-Image-Turbo/discussions/4#6927ff862ad73944d0cbb300
