OpenAI via Azure (Preview)
OpenAI's voices, accessible via Azure, offer an enhanced audio experience. We use OpenAI TTS through Azure in your geography to keep usage compliant. Note the existing constraints, most notably the lack of Speech Synthesis Markup Language (SSML) support.
Do note that the service is still in public preview by Microsoft Azure.
Here are methods to effectively employ OpenAI's TTS features within these boundaries:
To integrate OpenAI's TTS capabilities:
From the available options, choose an OpenAI voice that suits your needs.
To achieve the most natural and accurate voice output from OpenAI TTS, it's crucial to format dates and times in a way that the system can easily recognize and correctly articulate. Below are recommended formats and examples to avoid:
| Case | Input text | Spoken output | Comment |
| --- | --- | --- | --- |
| English - Effective | I can confirm the booking for May 25th 2024. | I can confirm the booking for May 25th 2024. | Written form of months and numbers with respective suffix. |
| English - Ineffective | I can confirm the booking for the 02.02.2024. | I can confirm the booking for the zo 2, oh 2, 2024. | Date not recognized. |
| German - Effective | Ich bestätige die Buchung für den 15ten Februar 2024. | Ich bestätige die Buchung für den 15ten Februar 2024. | Correct recognition and pronunciation of ordinal numbers and months in date format. |
| German - Ineffective | Ich bestätige die Buchung für den 15.2.2024. | Ich bestätige die Buchung für den 15, zwei, 24. | Date is not recognized and not pronounced correctly. |
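The effective date formats above can be produced programmatically before the text is sent to the voice. The following is a minimal sketch, not part of the product: the function names are hypothetical, and the German suffix rule ("ten" below 20, "sten" from 20 upward) is an assumption that extends the single "15ten" example, so test it with your chosen voice.

```python
from datetime import date

MONTHS_EN = ["January", "February", "March", "April", "May", "June", "July",
             "August", "September", "October", "November", "December"]
MONTHS_DE = ["Januar", "Februar", "März", "April", "Mai", "Juni", "Juli",
             "August", "September", "Oktober", "November", "Dezember"]


def ordinal_suffix_en(day: int) -> str:
    """Return the English ordinal suffix for a day of the month."""
    if 11 <= day % 100 <= 13:  # 11th, 12th, 13th are irregular
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")


def spoken_date_en(d: date) -> str:
    """Format a date as written month plus ordinal day, e.g. 'May 25th 2024'."""
    return f"{MONTHS_EN[d.month - 1]} {d.day}{ordinal_suffix_en(d.day)} {d.year}"


def spoken_date_de(d: date) -> str:
    """Format a date like '15ten Februar 2024' (suffix rule is an assumption)."""
    suffix = "ten" if d.day < 20 else "sten"
    return f"{d.day}{suffix} {MONTHS_DE[d.month - 1]} {d.year}"


print(spoken_date_en(date(2024, 5, 25)))   # May 25th 2024
print(spoken_date_de(date(2024, 2, 15)))   # 15ten Februar 2024
```

The month names are hard-coded rather than taken from the system locale so that the output does not depend on the machine the code runs on.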
Properly formatting times is equally essential to ensure that OpenAI TTS can interpret and vocalize them accurately. Here are the effective formats alongside examples of what to avoid:
| Case | Input text | Spoken output | Comment |
| --- | --- | --- | --- |
| English - Effective | The bus arrives at 11:15 AM. | The bus arrives at 11:15 AM. | Time is correctly recognized and vocalized in the 12-hour format, which is standard in English. |
| English - Ineffective | The bus arrives at 17:00. | The bus arrives at 17. | Misinterpretation of format. |
| German - Effective | Der Flug geht um 17 Uhr. | Der Flug geht um 17 Uhr. | Time is correctly recognized and vocalized in the 24-hour format, which is standard in German. |
| German - Ineffective | Der Flug geht um 17:00 Uhr. | Der Flug geht um 17 | "Uhr" will be ignored. |
Ensuring that prices and currencies are expressed in a format that OpenAI TTS can accurately interpret is crucial for clear communication. The following table outlines the recommended practices for formatting prices and currencies, as well as common pitfalls to avoid:
| Case | Input text | Spoken output | Comment |
| --- | --- | --- | --- |
| English - Effective | It costs thirteen Euro and forty-five cents. | It costs 13 Euro and 45 cents. | Price is correctly recognized and vocalized. |
| English - Ineffective | It costs 13.45€. | It costs 13.45. | |
| German - Effective | Es kostet dreizehn Euro und fünfundvierzig Cent. | Es kostet 13 Euro und 45 Cent. | Price is correctly recognized and vocalized in German. |
| German - Ineffective | Es kostet 13,45€. | Es kostet deiteenand for firth eurs. | OpenAI currently cannot deal with € sign following price. |
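To avoid the € sign and decimal notation, a price can be spelled out in words before synthesis. The sketch below handles English amounts below 1000 Euro only; the helper names are hypothetical, and taking Euro and cents as separate integers is a design assumption made here to sidestep floating-point rounding.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def words_below_100(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if not 0 <= n < 1000:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 100:
        return words_below_100(n)
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} {words_below_100(rest)}"


def spoken_price_en(euros: int, cents: int) -> str:
    """Build a phrase such as 'thirteen Euro and forty-five cents'."""
    if not 0 <= cents < 100:
        raise ValueError("cents must be in 0..99")
    if cents == 0:
        return f"{number_to_words(euros)} Euro"
    unit = "cent" if cents == 1 else "cents"
    return f"{number_to_words(euros)} Euro and {number_to_words(cents)} {unit}"


print(f"It costs {spoken_price_en(13, 45)}.")
# It costs thirteen Euro and forty-five cents.
```

A German variant would need its own number words (for example "fünfundvierzig"), which this sketch does not attempt.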
For numbers and alphanumeric sequences, transforming them into a format that OpenAI TTS processes without errors ensures accurate and complete voice output. Below are effective inputs alongside formats that may result in less accurate articulation:
| Case | Input text | Spoken output | Comment |
| --- | --- | --- | --- |
| Effective | The confirmation number is one two three four five six seven eight. | The confirmation number is 12345678. | Sequence of numbers is articulated clearly and accurately as individual digits. |
| Ineffective | The confirmation number is 12345678. | The confirmation number is 12567. | Single numbers are swallowed when using pure number format. |
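Spelling out each digit, as in the effective example above, is easy to automate. The following sketch uses a hypothetical helper name; passing letters through as single uppercase characters is an assumption for alphanumeric codes and is not covered by the examples above.

```python
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spell_out_sequence(sequence: str) -> str:
    """Turn '12345678' into 'one two three four five six seven eight'."""
    parts = []
    for ch in sequence:
        if ch in DIGIT_WORDS:
            parts.append(DIGIT_WORDS[ch])
        elif ch.isalnum():
            # Assumption: letters are read one at a time in uppercase.
            parts.append(ch.upper())
        # Separators such as '-' or spaces are dropped.
    return " ".join(parts)


print(f"The confirmation number is {spell_out_sequence('12345678')}.")
# The confirmation number is one two three four five six seven eight.
```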
OpenAI's voices may sound monotonous in certain contexts. Adding emotive language and enthusiastic expressions, such as 'Great!', 'Fantastic!', 'Klasse!', and 'Super!', can make the bot sound more energetic and keep listeners engaged.
Although SSML tags for structured pronunciations aren't supported, experimenting with punctuation or separators such as "--" may offer a workaround for inserting pauses. The effectiveness of these techniques varies, so test them in your specific use case.
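One way to experiment with the "--" separator is to assemble the prompt from segments and join them with the separator under test. This is only a sketch with a hypothetical helper name; whether a given separator actually produces an audible pause has to be checked by listening to the output.

```python
def join_with_pauses(segments: list[str], separator: str = " -- ") -> str:
    """Join non-empty text segments with a separator intended as a pause."""
    cleaned = [s.strip() for s in segments if s.strip()]
    return separator.join(cleaned)


print(join_with_pauses(["Your booking is confirmed", "see you soon."]))
# Your booking is confirmed -- see you soon.
```

Keeping the separator as a parameter makes it easy to compare alternatives, such as a comma or an ellipsis, in the same test.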
When or a project release, navigate to the Speaker Voice dropdown menu.