Microsoft Announces Limited Access to Its Neural Text-to-Speech AI

Microsoft recently announced limited access to its neural text-to-speech AI, Custom Neural Voice. The service allows developers to create custom synthetic voices.

Custom Neural Voice is a Text-to-Speech (TTS) feature of Speech in Azure Cognitive Services that allows users to create a one-of-a-kind synthetic voice for their brand. Since its preview last September, the feature has helped customers such as AT&T, Duolingo, Progressive, and Swisscom develop branded speech solutions for their own users. The feature is now generally available (GA), but access is gated by technical controls intended to prevent misuse of the service: customers must apply for it.

Microsoft’s core Neural TTS technology for Custom Neural Voice has three main components: the Text Analyzer, the Neural Acoustic Model, and the Neural Vocoder. To generate natural synthetic speech, the text is first fed into the Text Analyzer, which outputs a phoneme sequence (a phoneme is a basic unit of sound that distinguishes one word from another in a particular language). The phoneme sequence, which defines the pronunciation of the words in the text, then goes into the Neural Acoustic Model, which predicts the acoustic features that characterize the speech signal, such as timbre, speaking style, speed, intonation, and stress patterns. Finally, the Neural Vocoder converts the acoustic features into audible waveforms to produce the synthetic speech.
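The three stages described above can be illustrated with a toy pipeline. This is only a sketch of the data flow, not Microsoft's implementation: the lexicon, the function names, and the rule-based "acoustic model" and sine-wave "vocoder" are all hypothetical stand-ins for what are, in the real system, trained neural networks.

```python
import math
from typing import List, Tuple

# Hypothetical toy lexicon mapping words to phoneme symbols (illustrative only).
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def text_analyzer(text: str) -> List[str]:
    """Stage 1: convert raw text into a phoneme sequence."""
    phonemes: List[str] = []
    for word in text.lower().split():
        word = word.strip(".,!?")
        # Unknown words map to a placeholder symbol in this toy version.
        phonemes.extend(LEXICON.get(word, ["<unk>"]))
    return phonemes


def acoustic_model(phonemes: List[str]) -> List[Tuple[float, float]]:
    """Stage 2: predict acoustic features per phoneme.

    Here each feature is (pitch in Hz, duration in seconds), produced by a
    fixed rule. A real acoustic model is a neural network that predicts far
    richer features (timbre, style, intonation, stress).
    """
    return [(120.0 + 10.0 * (i % 4), 0.08) for i, _ in enumerate(phonemes)]


def vocoder(features: List[Tuple[float, float]], sample_rate: int = 16000) -> List[float]:
    """Stage 3: render acoustic features as waveform samples.

    A sine tone per phoneme stands in for a neural vocoder.
    """
    samples: List[float] = []
    for pitch, duration in features:
        n = round(duration * sample_rate)
        samples.extend(
            math.sin(2 * math.pi * pitch * t / sample_rate) for t in range(n)
        )
    return samples


def synthesize(text: str) -> List[float]:
    """Chain the three stages: text -> phonemes -> features -> waveform."""
    return vocoder(acoustic_model(text_analyzer(text)))
```

Calling `synthesize("Hello, world!")` returns a list of floating-point samples in the range -1 to 1. The point of the sketch is the interface between stages: each one consumes only the previous stage's output, which is what lets the acoustic model be retrained for a custom voice without changing the text analysis.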

Neural TTS voice models are trained using deep neural networks based on real voice recording samples. With the customization capability of Custom Neural Voice, customers can adapt the Neural TTS engine to better fit their user scenarios. To use Custom Neural Voice, customers need an Azure account and subscription. Then, once approved to use the feature, they can create a custom voice project, upload data, and train, test, and deploy the voice model.
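Once a voice model is in use, a synthesis request typically names the voice in SSML (Speech Synthesis Markup Language, a W3C standard that Azure's Speech service accepts). The helper below is a minimal sketch of building such a request body. The function name and the voice name `MyBrandVoice` are hypothetical placeholders, and sending the request (endpoint, keys, SDK calls) is deliberately left out.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice_name: str, lang: str = "en-US") -> str:
    """Build an SSML document that asks for `text` to be spoken by `voice_name`.

    `voice_name` is whatever name the custom voice was given; the value used
    in the example below is a made-up placeholder.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        # escape() protects characters such as & and < in the spoken text.
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Welcome back. Rates & fees may apply.", "MyBrandVoice"))
```

Escaping the text matters in practice, because customer-facing strings often contain ampersands or angle brackets that would otherwise make the SSML invalid.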


Customers can benefit from Custom Neural Voice in a number of use cases, such as customer service chatbots, voice assistants, online learning, audiobooks, public service announcements, and real-time translations. One early adopter, Swisscom, wanted to create more engaging customer experiences by building a voice assistant that specifically represents its brand. In a Microsoft Switzerland news story, the author wrote:

Using the Speech service, Swisscom has provided its customers with an intelligent, multilingual voice assistant, helping to improve the customer experience and accelerate its own digital transformation.

Qinying Liao, a principal program manager at Microsoft, described the benefits of Custom Neural Voice in an Azure AI blog post:

Powered by this technology, Custom Neural Voice enables users to build highly realistic voices with just a small number of training audio samples. This new technology allows companies to spend one-tenth of the effort traditionally required to prepare training data while at the same time significantly increasing the naturalness of the synthetic speech output compared to traditional training methods.

In addition, Holger Mueller, principal analyst and vice president at Constellation Research Inc., told InfoQ:

To make computers more human, speech is a critical part, and in 2020 enterprises need to move away from the artificial, generic voices and synthetic speech accents of the past. The cloud enables this level of individual creation of a personal speech experience, with the access, cheap compute, and operational capability. So it is a popular use case across the IaaS/PaaS players, and good for enterprises and their customers, and even employees, as they get a more human experience.

Finally, in addition to the ability to customize TTS voice models, Microsoft offers more than 200 neural and standard voices covering 54 languages and locales.