OpenAI

OpenAI's voices, available through Azure, offer an enhanced audio experience. We use OpenAI TTS through Azure in your geography to keep usage compliant. Be aware of the current constraints, most notably that Speech Synthesis Markup Language (SSML) is not supported. The sections below describe how to use OpenAI's TTS features effectively within these limits.

To activate OpenAI as your Text-to-Speech (TTS) provider, please get in touch with your Customer Success Manager.

List of Voices

See OpenAI's documentation for the list of available TTS voices.

Multilingual Capabilities

OpenAI's TTS is multilingual: it recognizes the language from context and speaks it without any explicit language setting. However, quality varies across languages, so test thoroughly in your target language. Short or ambiguous prompts, such as a bare sequence of digits, give the voice little context to go on. To make sure such a prompt is spoken in the intended language, include words in that language. For example, instead of "2 3 4 5 6", use "Die Nummer lautet also 2 3 4 5 6, richtig?" ("So the number is 2 3 4 5 6, correct?") so that the digits are read in German.
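If your prompts are assembled in code, the carrier-sentence technique can be applied with a small helper. The sketch below is illustrative only: `with_language_context` and `CARRIER_TEMPLATES` are hypothetical names, and only the German sentence comes from this page; the English template is an assumed example you would replace with your own wording.

```python
# Hypothetical carrier sentences. Only the German one comes from this page;
# add and test your own for each target language.
CARRIER_TEMPLATES = {
    "de": "Die Nummer lautet also {value}, richtig?",
    "en": "So the number is {value}, correct?",  # assumed example
}


def with_language_context(value: str, lang: str) -> str:
    """Wrap a bare value in a sentence written in `lang`, so the TTS voice
    has enough surrounding words to pick the intended language."""
    template = CARRIER_TEMPLATES.get(lang)
    if template is None:
        raise ValueError(f"no carrier sentence defined for language {lang!r}")
    return template.format(value=value)


print(with_language_context("2 3 4 5 6", "de"))
```

Raising an error for an unknown language, rather than falling back to the bare value, makes it obvious when a prompt would otherwise be sent without any language context.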

Constraints with SSML Tags

OpenAI's TTS does not support SSML; any SSML tags are ignored when an OpenAI voice is selected. You can still build your AI Agent with SSML tags: they take effect with Azure voices and are ignored with OpenAI voices, so the same prompts work with both.
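To check that a prompt still reads naturally when its tags have no effect, it can help to look at the plain text that remains. The sketch below assumes the tags are simply dropped and their text content is spoken; `text_without_ssml` is a hypothetical helper built on Python's standard `xml.etree.ElementTree`, not part of the platform.

```python
import xml.etree.ElementTree as ET


def text_without_ssml(prompt: str) -> str:
    """Drop all SSML tags from `prompt` and keep only their text content.

    Assumes the prompt is well-formed XML once wrapped in a dummy root
    element (for example, a literal '&' must be written as '&amp;').
    """
    root = ET.fromstring(f"<root>{prompt}</root>")
    # itertext() yields element text and tails in document order.
    text = "".join(root.itertext())
    # Collapse the extra whitespace left behind by removed tags like <break/>.
    return " ".join(text.split())


prompt = ('Welcome! <break time="500ms"/> Your code is '
          '<say-as interpret-as="characters">AB12</say-as>.')
print(text_without_ssml(prompt))
```

In this example the `say-as` hint is lost, so an OpenAI voice receives only "AB12" with no instruction to spell it out; writing the characters out in the text itself avoids depending on the tag.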

Addressing Pronunciation Nuances

Specialized terminology and brand names may not be pronounced as intended. Listen to how OpenAI voices render these prompts and adjust the wording, for example the spelling, until the pronunciation is correct.

Integrating OpenAI TTS

To integrate OpenAI's TTS capabilities:

  1. When creating or editing a project release, navigate to the Speaker Voice dropdown menu.

  2. From the available options, choose an OpenAI voice that suits your needs.

Date and Time Formatting

For the most natural and accurate output from OpenAI TTS, format dates and times so that the system can recognize them and read them correctly. Below are recommended formats and examples to avoid:

Dates

Times

Times need the same care so that OpenAI TTS can interpret and read them accurately. Here are effective formats alongside examples of what to avoid:

Prices and Currencies

For clear communication, express prices and currencies in a format that OpenAI TTS can interpret accurately. The following table lists recommended formats for prices and currencies, along with common pitfalls to avoid:

Price Formatting

Numbers and Alphanumerics

Convert numbers and alphanumeric sequences into a format that OpenAI TTS processes without errors, so that they are read out accurately and completely. Below are effective inputs alongside formats that may be read less accurately:
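One approach, and the one used in the German example on this page ("2 3 4 5 6"), is to separate the characters of a code with spaces so that each is read individually. The sketch below assumes that convention; `spell_out` is a hypothetical helper, and whether it suits your codes (order numbers, postcodes, IDs) should be confirmed by listening to the output.

```python
def spell_out(sequence: str) -> str:
    """Separate each letter or digit with a single space.

    Hyphens, spaces and other punctuation in the input are dropped
    (an assumption) so the output has uniform spacing.
    """
    chars = [c for c in sequence if c.isalnum()]
    return " ".join(chars)


print(spell_out("AB-12 34"))
```

Dropping punctuation keeps the spacing even; if separators carry meaning for your callers, keep them instead and test how they are read.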

Enhancing Intonation

Emotional Tone

OpenAI's voices can sound monotonous in some contexts. Emotive language and enthusiastic expressions such as "Great!", "Fantastic!", "Klasse!" (German for "Great!"), and "Super!" can make the speech output more engaging for listeners and give the bot a livelier, more enthusiastic tone.

Pronunciation and Pauses

Because SSML tags for structured pronunciation are not supported, you can experiment with punctuation or separators such as "--" to insert pauses. Results vary, so test these techniques in your specific use case.
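When comparing separators, it helps to keep the separator in one place so it can be swapped without rewriting every prompt. The sketch below joins prompt segments with "--" by default; `join_with_pauses` is a hypothetical helper, and whether "--" or another separator actually produces an audible pause has to be verified by listening.

```python
def join_with_pauses(segments: list[str], separator: str = " -- ") -> str:
    """Join prompt segments with a separator intended to produce a pause.

    Empty segments are skipped and surrounding whitespace is stripped,
    so the separator alone controls the spacing between segments.
    """
    return separator.join(s.strip() for s in segments if s.strip())


prompt = join_with_pauses(["Your order number is A B 1 2", "Is that correct?"])
print(prompt)
```

Passing a different `separator` (for example `", "` or `". "`) lets you test alternatives against the same segments.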
