Paralinguistic tags
Hello, first of all, thanks for this outstanding German preview!
I am currently learning to fine-tune models. I wonder whether tags like (laugh), (giggle), etc. are still available after fine-tuning (from the base model)? Or do I have to "train" these tags too?
What is your experience with fine-tuning? Any tips would be great. BR, John.
I'm wondering where to find a tutorial on fine-tuning this model.
Thank you for your interest. Expressing emotions using tags is not supported by either the trained model or the base model. The system derives pronunciations from a phonemizer, so any tags added to the input would be treated by the phonemizer as ordinary text to be pronounced.
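To make the phonemizer point concrete: to a text front end, a tag such as (laugh) is just more characters, so it gets read aloud unless something removes or handles it first. Below is a minimal sketch of separating tags from the text before phonemization. It is not how this model works. The tag format (one lowercase word in parentheses) and the names `TAG_PATTERN` and `split_tags` are my assumptions for illustration only.

```python
import re

# Assumed tag format: a single lowercase word in parentheses, e.g. (laugh).
# This is a hypothetical convention, not the format of any specific model.
TAG_PATTERN = re.compile(r"\(([a-z]+)\)")

def split_tags(text):
    """Return (clean_text, tags): the text with tags removed, and the tag names found."""
    tags = TAG_PATTERN.findall(text)
    clean = TAG_PATTERN.sub(" ", text)
    # Collapse the whitespace left behind by removed tags.
    clean = re.sub(r"\s+", " ", clean).strip()
    return clean, tags

clean, tags = split_tags("Das ist lustig (laugh) oder?")
print(clean)  # Das ist lustig oder?
print(tags)   # ['laugh']
```

Stripping tags like this only prevents them from being spoken as words. It does not produce laughter; that still requires a model trained to render the tags as audio.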
If you would like to include emotions using tags, we recommend training a new model. The approach I suggest is to start from a base model that natively supports tags. For this reason, I believe FunAudioLLM/Fun-CosyVoice3-0.5B-2512 is a better choice than neuphonic/neutts-air.
This might be TMI, but if you decide to proceed with training, I expect that obtaining suitable data will be one of the main challenges. I haven’t tried this myself yet, so I can’t guarantee success. However, while considering tag-based training, I have been thinking about two possible directions:
1. Training neuphonic/neutts-air with three types of data simultaneously: English with tags, English without tags, and another language without tags.
2. Training FunAudioLLM/Fun-CosyVoice3-0.5B-2512 on another language without tags, while limiting the training to a point where tag recognition performance is not significantly degraded.
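For the first direction, the "three types of data simultaneously" part could be sketched as weighted sampling across the three sources. This is untested as a training recipe; the source names, the example sentences, the weights, and the helper `mix_datasets` are all placeholders I made up for illustration, and real entries would be text/audio pairs rather than plain strings.

```python
import random

def mix_datasets(sources, weights, n_samples, seed=0):
    """Draw n_samples examples, choosing a source for each draw according to weights."""
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    names = list(sources)
    source_weights = [weights[name] for name in names]
    mixed = []
    for _ in range(n_samples):
        name = rng.choices(names, weights=source_weights, k=1)[0]
        mixed.append((name, rng.choice(sources[name])))
    return mixed

# Placeholder data: stand-ins for the three data types described above.
sources = {
    "en_tagged": ["(laugh) That was great.", "Oh no (giggle)."],
    "en_plain": ["That was great.", "See you tomorrow."],
    "de_plain": ["Das war toll.", "Bis morgen."],
}
# Hypothetical mixing weights; suitable values would have to be found by experiment.
weights = {"en_tagged": 1.0, "en_plain": 1.0, "de_plain": 2.0}

batch = mix_datasets(sources, weights, n_samples=8)
for name, text in batch:
    print(name, text)
```

The idea behind mixing within the same run, rather than training on one source after another, is to keep tagged English examples present throughout training so the tag behaviour is less likely to be forgotten while the new language is learned.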
Hello again, thank you for your detailed explanation. That helps me a lot.