TTS x Hallo Talking Portrait Generator
This demo generates a talking portrait using several open-source projects: SDXL Lightning, Parler TTS, WhisperSpeech, and Hallo.
To let the community try this demo, the driving audio is limited to a maximum of 4 seconds.
Duplicate this Space to skip the queue and remove the duration limit. Processing 4 to 5 seconds of audio takes about 5 minutes per inference, so please be patient.
1. Load Portrait
2. Load Voice
3. Result
Hallo Pro Tips:
Hallo has a few simple requirements for input data:
For the source image:
- It should be cropped to a square.
- The face should be the main focus, making up 50%-70% of the image.
- The face should be facing forward, with a rotation angle of less than 30° (no side profiles).
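One way to meet the square-crop and face-size requirements is to crop around a face bounding box. The sketch below assumes you already have that box from some face detector (not part of this demo), and it reads "50%-70% of the image" as the face's longer side taking up that fraction of the crop's side length. Both are assumptions, and `square_crop_box` is a hypothetical helper. It does not check the 30° rotation requirement.

```python
from typing import Tuple

# (left, upper, right, lower), the box order Pillow's Image.crop uses
Box = Tuple[int, int, int, int]


def square_crop_box(image_size: Tuple[int, int], face_box: Box,
                    face_ratio: float = 0.6) -> Box:
    """Return a square crop box roughly centred on the face (hypothetical helper).

    Assumption: the face's longer side should be `face_ratio` of the crop side.
    """
    if not 0.5 <= face_ratio <= 0.7:
        raise ValueError("face_ratio should stay within 0.5-0.7")
    img_w, img_h = image_size
    left, top, right, bottom = face_box
    face_side = max(right - left, bottom - top)
    # Desired square side, capped so the square never exceeds the image.
    side = min(round(face_side / face_ratio), img_w, img_h)
    cx, cy = (left + right) / 2, (top + bottom) / 2
    # Centre the square on the face, then shift it back inside the image bounds.
    x0 = int(round(cx - side / 2))
    y0 = int(round(cy - side / 2))
    x0 = max(0, min(x0, img_w - side))
    y0 = max(0, min(y0, img_h - side))
    return (x0, y0, x0 + side, y0 + side)


print(square_crop_box((1024, 768), (400, 200, 640, 440)))  # (320, 120, 720, 520)
```

Because the tuple follows Pillow's box order, `Image.open(path).crop(box)` would produce the square image. When the image is too small to leave enough margin around the face, the square is capped at the image size, and the face ends up occupying more than the requested ratio.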
For the driving audio:
- It must be in WAV format.
- It must be in English, since Hallo's training datasets contain only English speech.
- Ensure the vocals are clear; background music is acceptable.
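You can check the WAV requirement and this demo's 4-second limit before uploading. The sketch below uses only Python's standard `wave` module. `check_driving_audio` is a hypothetical helper, and its 4.0-second default mirrors the limit stated at the top of this page. Note that `wave` reads only uncompressed PCM WAV files, so it may reject some WAV variants (for example, 32-bit float) even though they are valid WAV files.

```python
import wave

MAX_SECONDS = 4.0  # limit stated for this public demo; duplicated Spaces can lift it


def wav_duration(path: str) -> float:
    """Return the duration in seconds; raises wave.Error if the file is not PCM WAV."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_driving_audio(path: str, max_seconds: float = MAX_SECONDS) -> float:
    """Return the duration if the WAV fits the limit, otherwise raise ValueError."""
    duration = wav_duration(path)
    if duration > max_seconds:
        raise ValueError(
            f"audio is {duration:.2f} s; this demo accepts at most {max_seconds} s"
        )
    return duration
```

This check covers only the file format and length. The English-only and clear-vocals requirements still have to be verified by ear.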
TTS Pro Tips:
For Parler TTS:
- Include the term "very clear audio" to generate the highest-quality audio, or "very noisy audio" to generate high levels of background noise.
- Punctuation can be used to control the prosody of the generations, e.g. use commas to add small breaks in speech.
- The remaining speech features (gender, speaking rate, pitch and reverberation) can be controlled directly through the prompt.
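The tips above amount to writing a short voice description. The hypothetical `build_description` helper below simply assembles such a string from the features the tips list. Its phrasing is an assumption modeled on those tips, not an official template. In Parler TTS, the description is supplied separately from the text to be spoken, and punctuation in the spoken text (such as commas) is what shapes the pauses.

```python
def build_description(gender: str = "female",
                      pace: str = "moderate",
                      pitch: str = "low-pitched",
                      distance: str = "very close-sounding",
                      clear: bool = True) -> str:
    """Compose a voice description string (hypothetical helper, assumed phrasing).

    Each argument maps to one feature the tips say the prompt controls:
    gender, speaking rate, pitch and reverberation (close vs. distant recording).
    """
    # Quality keywords taken directly from the tips above.
    quality = "very clear audio" if clear else "very noisy audio"
    return (f"A {gender} speaker talks at a {pace} pace with a {pitch} voice. "
            f"The recording is {distance}, with {quality}.")


print(build_description())
# A female speaker talks at a moderate pace with a low-pitched voice. The recording is very close-sounding, with very clear audio.
```

Keeping each feature as a separate argument makes it easy to change one aspect of the voice, such as the pace, while holding the others fixed.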
For WhisperSpeech:
WhisperSpeech can quickly clone a voice from an audio sample:
- Upload a voice sample in the WhisperSpeech tab.
- Enter the text to synthesize, then click the Generate voice clone button.