Generating Realistic Voices
Date:
- Constructed a deep neural network model for language processing and speech regeneration that captures each speaker's distinctive voice features to perform text-to-speech (TTS) on constrained inputs, making AI-generated voices indistinguishable from natural speech.
- Optimized the language processing model with MelNet and WaveNet to retain the linguistic information of multiple speakers.
- Achieved 86.2% accuracy using a sequence-to-sequence recurrent network (Tacotron 2) with modified vocoders.