XTTS-v2

This repo contains the XTTS-v2 model checkpoint fine-tuned on the PhoAudiobook dataset for Vietnamese. Details of the fine-tuning process and the experimental results can be found in our ACL 2025 paper, "Zero-Shot Text-to-Speech for Vietnamese". If you use this model in your work, please cite the paper:

@inproceedings{vu2025zeroshottexttospeechvietnamese,
      title={Zero-Shot Text-to-Speech for Vietnamese}, 
      author={Thi Vu and Linh The Nguyen and Dat Quoc Nguyen},
      year={2025},
      booktitle={Proceedings of ACL},
}

How to run

# install coqui TTS
pip install TTS

# run inference
python infer.py \
--xtts_checkpoint best_model.pth \
--xtts_config config.json \
--xtts_vocab vocab.json \
--speaker_audio /path/to/your_ref.wav \
--lang vi \
--text "Nếu chỉ còn một ngày để sống tôi xin làm một bông hoa đẹp." \
--output output.wav
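
To synthesize several sentences with the same reference voice, the command above can be wrapped in a small Python helper. This is a minimal sketch, not part of the repo: it assumes `infer.py` is in the current directory and accepts exactly the flags shown above. The function names `build_infer_command` and `synthesize_all`, the `outputs` directory, and the `utt_NNN.wav` naming are hypothetical choices made for illustration.

```python
import shlex
import subprocess
from pathlib import Path


def build_infer_command(text, speaker_audio, output,
                        checkpoint="best_model.pth",
                        config="config.json",
                        vocab="vocab.json",
                        lang="vi"):
    """Build the argument list for one infer.py call (flags as in the command above)."""
    return [
        "python", "infer.py",
        "--xtts_checkpoint", str(checkpoint),
        "--xtts_config", str(config),
        "--xtts_vocab", str(vocab),
        "--speaker_audio", str(speaker_audio),
        "--lang", lang,
        "--text", text,
        "--output", str(output),
    ]


def synthesize_all(sentences, speaker_audio, out_dir="outputs", dry_run=True):
    """Run (or, with dry_run=True, only print) one infer.py call per sentence.

    The output naming scheme utt_001.wav, utt_002.wav, ... is an assumption
    made for this sketch.
    """
    out_dir = Path(out_dir)
    commands = []
    for i, text in enumerate(sentences, start=1):
        cmd = build_infer_command(text, speaker_audio, out_dir / f"utt_{i:03d}.wav")
        commands.append(cmd)
        if dry_run:
            # Show the shell-quoted command without executing it.
            print(shlex.join(cmd))
        else:
            out_dir.mkdir(parents=True, exist_ok=True)
            # check=True raises CalledProcessError if infer.py exits non-zero.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `synthesize_all(sentences, "/path/to/your_ref.wav")` prints the commands for inspection; passing `dry_run=False` executes them one by one. Passing the arguments as a list avoids shell-quoting problems with Vietnamese text that contains spaces or punctuation.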

Model tree for thivux/XTTS-v2-vietnamse

Base model: coqui/XTTS-v2 (this model is a fine-tuned version of it)