OpenAI

OpenAI's voices, accessible via Azure, offer an enhanced audio experience. We use OpenAI TTS through Azure in your geography to keep usage compliant. Some constraints apply, most notably the lack of Speech Synthesis Markup Language (SSML) support. The sections below describe how to use OpenAI's TTS features effectively within these limits.

To activate OpenAI as your Text-to-Speech (TTS) provider, please get in touch with your Customer Success Manager.

List of Voices

Click here to see the list of available TTS voices that OpenAI provides.

Multilingual Capabilities

OpenAI's TTS supports multiple languages. It recognizes the language of a prompt from context and speaks it without explicit commands. However, performance varies across languages, so test thoroughly in your target language. To ensure a prompt is spoken in the desired language, include at least one language-specific word. For example, instead of "2 3 4 5 6", use "Die Nummer lautet also 2 3 4 5 6, richtig?" so that the digits are read in German.
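
The carrier-phrase idea above can be sketched in a few lines of Python. This is only an illustration: the helper name `wrap_digits` and the `CARRIER_PHRASES` table are hypothetical, and only the German phrase is taken from the example above (the English phrase is an assumption).

```python
# Hypothetical carrier phrases keyed by language code. Only the German
# phrase comes from the documentation example; the English one is assumed.
CARRIER_PHRASES = {
    "de": "Die Nummer lautet also {digits}, richtig?",
    "en": "So the number is {digits}, right?",
}


def wrap_digits(digits: str, language: str) -> str:
    """Embed a digit sequence in a sentence that signals the target language."""
    template = CARRIER_PHRASES[language]
    return template.format(digits=digits)


print(wrap_digits("2 3 4 5 6", "de"))
# prints: Die Nummer lautet also 2 3 4 5 6, richtig?
```

A prompt that consists only of digits gives the voice no language cue, so wrapping it this way before it reaches the TTS is one possible approach. Test the result in your target language.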

Constraints with SSML Tags

OpenAI's TTS does not support SSML tags. When you choose an OpenAI voice, any SSML tags in your prompts are ignored. You can therefore still build your AI Agent with SSML tags: the same prompts work with both Azure and OpenAI voices, but the tags only take effect with Azure voices.

Addressing Pronunciation Nuances

For precise pronunciation, such as with specialized terminology or brand names, test how OpenAI pronounces your prompts and adjust the wording accordingly.

Integrating OpenAI TTS

To integrate OpenAI's TTS capabilities:

  1. When creating or editing a project release, navigate to the Speaker Voice dropdown menu.

  2. From the available options, choose an OpenAI voice that suits your needs.

Date and Time Formatting

To achieve the most natural and accurate voice output from OpenAI TTS, it's crucial to format dates and times in a way that the system can easily recognize and correctly articulate. Below are recommended formats and examples to avoid:

Dates

| Context | Input to OpenAI | Voice Output | Result |
| --- | --- | --- | --- |
| English - Effective | I can confirm the booking for May 25th 2024. | I can confirm the booking for May 25th 2024. | Written form of months and numbers with respective suffix. |
| English - Ineffective | I can confirm the booking for the 02.02.2024. | I can confirm the booking for the zo 2, oh 2, 2024. | Date not recognized. |
| German - Effective | Ich bestätige die Buchung für den 15ten Februar 2024. | Ich bestätige die Buchung für den 15ten Februar 2024. | Correct recognition and pronunciation of ordinal numbers and months in date format. |
| German - Ineffective | Ich bestätige die Buchung für den 15.2.2024. | Ich bestätige die Buchung für den 15, zwei, 24. | Date is not recognized and not pronounced correctly. |
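
The effective date formats above can be generated from a date value before the text is sent to the voice. The following is a minimal sketch, not a definitive implementation. The function names are hypothetical, and the German suffix rule ("ten" below 20, "sten" from 20 upward, as used after the article "den") is an assumption, because the examples above only show "15ten".

```python
from datetime import date

EN_MONTHS = ["January", "February", "March", "April", "May", "June", "July",
             "August", "September", "October", "November", "December"]
DE_MONTHS = ["Januar", "Februar", "März", "April", "Mai", "Juni", "Juli",
             "August", "September", "Oktober", "November", "Dezember"]


def english_ordinal(day: int) -> str:
    """Return the day with its English suffix, e.g. 25 -> '25th'."""
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def spoken_date(d: date, language: str) -> str:
    """Format a date with a written month name and an ordinal day."""
    if language == "en":
        return f"{EN_MONTHS[d.month - 1]} {english_ordinal(d.day)} {d.year}"
    if language == "de":
        # Assumption: "ten" for days 1-19, "sten" for days 20-31.
        suffix = "ten" if d.day < 20 else "sten"
        return f"{d.day}{suffix} {DE_MONTHS[d.month - 1]} {d.year}"
    raise ValueError(f"unsupported language: {language}")


print(spoken_date(date(2024, 5, 25), "en"))  # May 25th 2024
print(spoken_date(date(2024, 2, 15), "de"))  # 15ten Februar 2024
```

Verify the output by listening to it, especially for days the examples above do not cover.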

Times

Properly formatting times is equally essential to ensure that OpenAI TTS can interpret and vocalize them accurately. Here are the effective formats alongside examples of what to avoid:

| Language | Input to OpenAI | Voice Output | Result |
| --- | --- | --- | --- |
| English - Effective | The bus arrives at 11:15 AM. | The bus arrives at 11:15 AM. | Time is correctly recognized and vocalized in the 12-hour format, which is standard in English. |
| English - Ineffective | The bus arrives at 17:00. | The bus arrives at 17. | Misinterpretation of format. |
| German - Effective | Der Flug geht um 17 Uhr. | Der Flug geht um 17 Uhr. | Time is correctly recognized and vocalized in the 24-hour format, which is standard in German. |
| German - Ineffective | Der Flug geht um 17:00 Uhr. | Der Flug geht um 17 | "Uhr" will be ignored. |

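The time formats recommended above can be produced in a similar way. This is a hedged sketch with a hypothetical function name. The examples above only cover "11:15 AM" and "17 Uhr", so the handling of full hours in English and of non-zero minutes in German ("17 Uhr 30") is an assumption that should be tested.

```python
from datetime import time


def spoken_time(t: time, language: str) -> str:
    """Format a time as 12-hour AM/PM (English) or 24-hour 'Uhr' (German)."""
    if language == "en":
        hour12 = t.hour % 12 or 12  # 0 and 12 are both spoken as 12
        period = "AM" if t.hour < 12 else "PM"
        return f"{hour12}:{t.minute:02d} {period}"
    if language == "de":
        if t.minute == 0:
            return f"{t.hour} Uhr"
        # Assumption: minutes follow "Uhr"; this form is not in the examples.
        return f"{t.hour} Uhr {t.minute}"
    raise ValueError(f"unsupported language: {language}")


print(spoken_time(time(11, 15), "en"))  # 11:15 AM
print(spoken_time(time(17, 0), "de"))   # 17 Uhr
```
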
Prices and Currencies

Ensuring that prices and currencies are expressed in a format that OpenAI TTS can accurately interpret is crucial for clear communication. The following table outlines the recommended practices for formatting prices and currencies, as well as common pitfalls to avoid:

Price Formatting

| Language | Input to OpenAI | Voice Output | Result |
| --- | --- | --- | --- |
| English - Effective | It costs thirteen Euro and forty-five cents. | It costs 13 Euro and 45 cents. | Price is correctly recognized and vocalized. |
| English - Ineffective | It costs 13.45€. | It costs 13.45. | |
| German - Effective | Es kostet dreizehn Euro und fünfundvierzig Cent. | Es kostet 13 Euro und 45 Cent. | Price is correctly recognized and vocalized in German. |
| German - Ineffective | Es kostet 13,45€. | Es kostet deiteenand for firth eurs. | OpenAI currently cannot deal with € sign following price. |
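
Writing a price out in words, as in the effective English example above, can be automated. The sketch below is illustrative only: the function names are hypothetical, it covers English and Euro amounts below 1000, and it assumes the amount is passed as a string with a decimal point. A German version would need its own number words.

```python
from decimal import Decimal

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def words_below_1000(n: int) -> str:
    """Spell out an integer from 0 to 999 in English."""
    if not 0 <= n < 1000:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} {words_below_1000(rest)}"


def spoken_price_en(amount: str) -> str:
    """Turn an amount such as '13.45' into 'thirteen Euro and forty-five cents'."""
    value = Decimal(amount).quantize(Decimal("0.01"))  # round to whole cents
    euros, cents = divmod(int(value * 100), 100)
    text = f"{words_below_1000(euros)} Euro"
    if cents:
        unit = "cent" if cents == 1 else "cents"
        text += f" and {words_below_1000(cents)} {unit}"
    return text


print(f"It costs {spoken_price_en('13.45')}.")
# prints: It costs thirteen Euro and forty-five cents.
```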

Numbers and Alphanumerics

For numbers and alphanumeric sequences, transforming them into a format that OpenAI TTS processes without errors ensures accurate and complete voice output. Below are effective inputs alongside formats that may result in less accurate articulation:

| Context | Input to OpenAI | Voice Output | Explanation |
| --- | --- | --- | --- |
| Effective | The confirmation number is one two three four five six seven eight. | The confirmation number is 12345678. | Sequence of numbers is articulated clearly and accurately as individual digits. |
| Ineffective | The confirmation number is 12345678. | The confirmation number is 12567. | Single numbers are swallowed when using pure number format. |
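
Spelling out each digit, as in the effective example above, is straightforward to script. This is a minimal sketch with a hypothetical helper name. Passing letters through unchanged, separated by spaces, is an assumption for alphanumeric codes and is not covered by the examples above.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_digits(sequence: str) -> str:
    """Replace each ASCII digit with its English word, one token per character."""
    tokens = []
    for ch in sequence:
        if ch in "0123456789":
            tokens.append(DIGIT_WORDS[int(ch)])
        elif not ch.isspace():
            tokens.append(ch)  # assumption: letters are kept as they are
    return " ".join(tokens)


print(f"The confirmation number is {spell_digits('12345678')}.")
# prints: The confirmation number is one two three four five six seven eight.
```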

Enhancing Intonation

Emotional Tone

OpenAI's voices may sound monotonous in certain contexts. Incorporating emotive language and enthusiastic expressions, such as 'Great!', 'Fantastic!', 'Klasse!', and 'Super!', can make the speech output more engaging for the listener and give the bot a more energetic tone.

Pronunciation and Pauses

Although SSML tags for structured pronunciations aren't supported, experimenting with punctuation or separators such as "--" may offer a workaround for inserting pauses. The effectiveness of these techniques varies, emphasizing the importance of testing in your specific use case.
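
If you want to try the "--" separator on prompts that were originally written with SSML, one possible approach is to derive a plain-text variant for OpenAI voices. The sketch below is an assumption-laden illustration: the function name is hypothetical, it handles only `<break>` tags specially, and whether "--" actually produces a pause must be verified by listening.

```python
import re

BREAK_TAG = re.compile(r"<break\b[^>]*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")


def ssml_to_plain(text: str, pause: str = "--") -> str:
    """Swap <break> tags for a separator and drop all other tags."""
    text = BREAK_TAG.sub(lambda _match: f" {pause} ", text)
    text = ANY_TAG.sub("", text)
    return " ".join(text.split())  # collapse repeated whitespace


print(ssml_to_plain('<speak>Hello<break time="500ms"/>world</speak>'))
# prints: Hello -- world
```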
