Generating Realistic Voices

Date: July 01, 2019

Constructed a deep neural network model for language processing and regeneration that captured distinct voice features to perform text-to-speech (TTS) on constrained inputs, making AI-generated voices indistinguishable from natural speech.
Optimized the language processing model with MelNet and WaveNet to retain the linguistic information of multiple speakers.
Achieved 86.2% accuracy using a sequence-to-sequence recurrent network (Tacotron 2) with modified vocoders.
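
For context on the vocoder side: WaveNet, as originally published, predicts audio one sample at a time as one of 256 mu-law-quantized classes. The following is a minimal sketch of that companding step in plain Python. The function names are illustrative and are not taken from this project's code, and the modified vocoders mentioned above may have used a different quantization scheme.

```python
import math

MU = 255  # 8-bit mu-law, i.e. 256 classes (assumed here, as in the original WaveNet setup)


def mu_law_encode(x, mu=MU):
    """Map an audio sample in [-1, 1] to an integer class in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range samples
    # Logarithmic compression: more resolution near zero, less near +/-1.
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift [-1, 1] to [0, mu] and round to the nearest class index.
    return int((compressed + 1) / 2 * mu + 0.5)


def mu_law_decode(index, mu=MU):
    """Invert mu_law_encode, up to quantization error."""
    compressed = 2 * index / mu - 1
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)


if __name__ == "__main__":
    for sample in (-1.0, -0.25, 0.0, 0.5, 1.0):
        idx = mu_law_encode(sample)
        print(sample, idx, round(mu_law_decode(idx), 4))
```

The logarithmic mapping lets a 256-way softmax cover the waveform's dynamic range while keeping quantization error small for quiet samples, which is why it suits sample-level vocoders.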