I'm naming my speech-related repos after Mojave desert flora and fauna. Tortoise is a bit tongue in cheek: this model is slow. It leverages both an autoregressive decoder and a diffusion decoder, both known for their low sampling rates. On a K80, expect to generate a medium sized sentence every 2 minutes.

See this page for a large list of example outputs.

Cool application of Tortoise+GPT-3 (not by me):

Usage guide

Local Installation

If you want to use this on your own computer, you must have an NVIDIA GPU.

On Windows, I highly recommend using the Conda installation path. I have been told that if you do not do this, you will spend a lot of time chasing dependency problems.

Then run the following commands, using anaconda prompt as the terminal (or any other terminal configured to work with conda):

- create conda environment with minimal dependencies specified
- install pytorch with the command provided here:
- change the current directory to tortoise-tts
- run tortoise python setup install script

    TextToSpeech(use_deepspeed=True, kv_cache=True, half=True)
    tts_with_preset("your text here", voice_samples=reference_clips, preset='fast')

Voice customization guide

Tortoise was specifically trained to be a multi-speaker model. It accomplishes this by consulting reference clips.

These reference clips are recordings of a speaker that you provide to guide speech generation. These clips are used to determine many properties of the output, such as the pitch and tone of the voice, speaking speed, and even speaking defects like a lisp or stuttering. The reference clip is also used to determine non-voice related aspects of the audio output like volume, background noise, recording quality and reverb.

This repo comes with several pre-packaged voices. Voices prepended with "train_" came from the training set and perform far better than the others. If your goal is high quality speech, I recommend you pick one of them. If you want to see what Tortoise can do for zero-shot mimicking, take a look at the others.

To add new voices to Tortoise, you will need to do the following:

1. Gather audio clips of your speaker(s). Good sources are YouTube interviews (you can use youtube-dl to fetch the audio), audiobooks or podcasts. Guidelines for good clips are in the next section.
2. Cut your clips into ~10 second segments. More clips are better, but I only experimented with up to 5 in my testing.
3. Save the clips as a WAV file with floating point format and a 22,050 sample rate.

As mentioned above, your reference clips have a profound impact on the output of Tortoise. Some guidelines for picking good clips:

- Avoid clips with background music, noise or reverb. These clips were removed from the training dataset. Tortoise is unlikely to do well with them.
- Avoid speeches. These generally have distortion caused by the amplification system.
- Avoid clips that have excessive stuttering, stammering or words like "uh" or "like" in them.
- Try to find clips that are spoken in such a way as you wish your output to sound like. For example, if you want to hear your target voice read an audiobook, try to find clips of them reading a book.
- The text being spoken in the clips does not matter, but diverse text does seem to perform better.
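The clip preparation steps in the voice customization guide (cut into ~10 second segments, save as floating point WAV at a 22,050 sample rate) can be sketched with the Python standard library alone. This is a minimal illustration and not part of Tortoise: the helper names `split_into_segments` and `write_float_wav` are hypothetical, and the sketch assumes your audio is already decoded to mono float samples in [-1.0, 1.0] at 22,050 Hz. It does no decoding or resampling.

```python
import struct

SAMPLE_RATE = 22050    # sample rate named in the guide
SEGMENT_SECONDS = 10   # the guide asks for ~10 second segments


def split_into_segments(samples, sample_rate=SAMPLE_RATE, seconds=SEGMENT_SECONDS):
    """Split a mono sample sequence into consecutive chunks of at most `seconds`."""
    step = sample_rate * seconds
    return [samples[i:i + step] for i in range(0, len(samples), step)]


def write_float_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples as a 32-bit IEEE float WAV file."""
    data = struct.pack("<%df" % len(samples), *samples)
    # fmt chunk: tag 3 = IEEE float, 1 channel, sample rate, byte rate,
    # block align (4 bytes per frame), 32 bits per sample, cbSize = 0.
    fmt = struct.pack("<HHIIHHH", 3, 1, sample_rate, sample_rate * 4, 4, 32, 0)
    # fact chunk: number of sample frames, expected for non-PCM formats.
    fact = struct.pack("<I", len(samples))
    body = (b"WAVE"
            + b"fmt " + struct.pack("<I", len(fmt)) + fmt
            + b"fact" + struct.pack("<I", len(fact)) + fact
            + b"data" + struct.pack("<I", len(data)) + data)
    with open(path, "wb") as f:
        f.write(b"RIFF" + struct.pack("<I", len(body)) + body)
```

To use it, pass a decoded recording through `split_into_segments` and call `write_float_wav` once per segment. The last segment may be shorter than 10 seconds, so you may want to drop it if it is very short.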