XTTS-v2

This repo contains the XTTS-v2 model checkpoint fine-tuned on the PhoAudiobook dataset for Vietnamese. Details of the fine-tuning process and experimental results can be found in our ACL 2025 paper, "Zero-Shot Text-to-Speech for Vietnamese". If you use this model in your work, please cite the paper:

@inproceedings{vu2025zeroshottexttospeechvietnamese,
      title={Zero-Shot Text-to-Speech for Vietnamese}, 
      author={Thi Vu and Linh The Nguyen and Dat Quoc Nguyen},
      year={2025},
      booktitle={Proceedings of ACL},
}

How to run

# install Coqui TTS
pip install TTS

# run inference
python infer.py \
--xtts_checkpoint best_model.pth \
--xtts_config config.json \
--xtts_vocab vocab.json \
--speaker_audio /path/to/your_ref.wav \
--lang vi \
--text "Nếu chỉ còn một ngày để sống tôi xin làm một bông hoa đẹp." \
--output output.wav
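
To synthesize several sentences with the same reference voice, the command above can be driven from Python. The following is a minimal sketch, not part of this repo: the helper names `build_infer_command` and `synthesize_all`, the `outputs` directory, and the `utt_000.wav` naming scheme are all hypothetical. It assumes `infer.py` accepts exactly the flags shown above and that the checkpoint, config, and vocab files sit in the working directory.

```python
import subprocess
import sys
from pathlib import Path


def build_infer_command(text, output, speaker_audio,
                        checkpoint="best_model.pth",
                        config="config.json",
                        vocab="vocab.json",
                        lang="vi",
                        script="infer.py"):
    """Return the argv list for one infer.py call, using the flags shown above."""
    return [
        sys.executable, script,
        "--xtts_checkpoint", str(checkpoint),
        "--xtts_config", str(config),
        "--xtts_vocab", str(vocab),
        "--speaker_audio", str(speaker_audio),
        "--lang", lang,
        "--text", text,
        "--output", str(output),
    ]


def synthesize_all(sentences, speaker_audio, out_dir="outputs"):
    """Run infer.py once per sentence (hypothetical file names utt_000.wav, ...)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, text in enumerate(sentences):
        cmd = build_infer_command(text, out_dir / f"utt_{i:03d}.wav", speaker_audio)
        # check=True raises CalledProcessError if infer.py exits with an error.
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with Vietnamese text. Each call starts a new Python process, so this approach is likely slow for large batches, since the checkpoint is presumably reloaded every time.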

Base model: coqui/XTTS-v2

Training dataset: PhoAudiobook

Paper: Zero-Shot Text-to-Speech for Vietnamese (2506.01322, published Jun 2, 2025)