NLU Training Best Practices

Introduction

Understanding human language is a hard problem that requires specialized techniques and models. Two related concepts are central to it: Natural Language Processing (NLP) and Natural Language Understanding (NLU). NLP covers the techniques for breaking natural language down into components that machines can learn from. NLU, in contrast, focuses on interpreting the meaning of those components.

At Parloa, our approach to NLU uses the Rasa DIET (Dual Intent and Entity Transformer) classifier, a model that handles both intent classification and entity extraction. The model is trained on labeled example utterances for each intent, for instance utterances about "cats" versus utterances about "ponies", and learns patterns that let it map new inputs to the correct intent.
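
To make the idea of "mapping new inputs to intents based on learned patterns" concrete, here is a deliberately simplified sketch. It is not DIET: DIET is a transformer trained inside Rasa, whereas this toy scores each intent by how many words of a new input already appear in that intent's examples. The intent names, utterances, and functions are hypothetical and exist only for illustration.

```python
import re

# Hypothetical labeled training data: a few example utterances per intent.
TRAINING_DATA = {
    "cats": ["I want to adopt a cat", "My kitten is not eating", "What food do cats like?"],
    "ponies": ["How much does a pony cost?", "Can I ride a pony?", "What do ponies eat?"],
}


def tokenize(text: str) -> list[str]:
    # Lowercase and keep only word characters, so "Pony?" and "pony" both become "pony".
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabularies(data: dict[str, list[str]]) -> dict[str, set[str]]:
    # Collect every word seen in each intent's examples.
    return {
        intent: {word for example in examples for word in tokenize(example)}
        for intent, examples in data.items()
    }


def classify(text: str, vocabularies: dict[str, set[str]]) -> tuple[str, dict[str, int]]:
    # Score each intent by how many words of the new input it has seen before,
    # then pick the highest-scoring intent (ties go to the first intent listed).
    words = tokenize(text)
    scores = {
        intent: sum(word in vocab for word in words)
        for intent, vocab in vocabularies.items()
    }
    return max(scores, key=scores.get), scores


vocabularies = build_vocabularies(TRAINING_DATA)
print(classify("Can I ride your pony?", vocabularies))
# → ('ponies', {'cats': 1, 'ponies': 4})
```

A real model such as DIET learns far richer patterns than word overlap, but the workflow is the same: labeled examples per intent go in, and a prediction for an unseen utterance comes out.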

Training Best Practices

Training an effective NLU system depends on following several best practices. They help the model learn accurately and remain adaptable.

Training Data: Relevance, Diversity, and Accuracy

Your training examples should closely reflect the real-world requests the AI is expected to handle. Relevant examples are what allow the AI to recognize and act on the intents you need it to understand.

Examples

  • "Book a flight to New York" directly connects to a common travel-related request.

  • "Schedule a meeting for next Wednesday" ties into typical calendar management tasks.

  • "Order a large pepperoni pizza from the nearest pizzeria" reflects a frequent food ordering scenario.

By keeping examples relevant, diverse, and accurately labeled, and by providing clear, distinct examples for each intent, you prepare the AI to understand and act on the intents it will encounter in real-world use.
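
Relevance generally needs human review, but parts of accuracy and diversity can be checked automatically. The sketch below assumes the training data is held as a Python dict mapping intent names to example utterances (a hypothetical layout, not Parloa's or Rasa's storage format). It flags utterances labeled with more than one intent, a common accuracy problem, and computes a rough lexical-diversity ratio per intent.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    # Lowercase and drop punctuation so trivial variants compare as equal.
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def find_conflicting_labels(data: dict[str, list[str]]) -> dict[str, set[str]]:
    """Accuracy check: utterances that appear under more than one intent."""
    intents_by_utterance = defaultdict(set)
    for intent, examples in data.items():
        for example in examples:
            intents_by_utterance[normalize(example)].add(intent)
    return {utt: intents for utt, intents in intents_by_utterance.items() if len(intents) > 1}


def lexical_diversity(examples: list[str]) -> float:
    """Rough diversity check: distinct words divided by total words (0.0 if empty)."""
    words = [word for example in examples for word in normalize(example).split()]
    return len(set(words)) / len(words) if words else 0.0


# Hypothetical data with one mislabeled utterance under "schedule_meeting".
data = {
    "book_flight": ["Book a flight to New York", "I need a plane ticket to Boston"],
    "schedule_meeting": ["Schedule a meeting for next Wednesday", "book a flight to new york!"],
}
print(find_conflicting_labels(data))
print(lexical_diversity(["Book a flight", "book a flight!"]))  # → 0.5
```

A low diversity ratio only hints that an intent's examples repeat the same wording; treat it as a prompt for review rather than a hard rule.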

Balancing the Training Data

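Balance is essential in training data. Each request type or intent should be represented by a roughly equal number of examples, with a minimum of 50 utterances per intent. Balanced counts keep the model from becoming biased toward over-represented intents, and enough varied examples per intent improve stability and help prevent overfitting.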
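
A balance check like this is easy to run before every training cycle. The sketch below assumes the same hypothetical dict-of-lists layout for training data. The 50-example minimum comes from this guide; the maximum ratio of 2.0 between the largest and smallest intent is a hypothetical tolerance you would tune, since exactly equal counts are rarely practical.

```python
MIN_EXAMPLES_PER_INTENT = 50  # minimum recommended in this guide
MAX_IMBALANCE_RATIO = 2.0     # hypothetical tolerance: largest intent vs. smallest intent


def check_balance(
    data: dict[str, list[str]],
    min_examples: int = MIN_EXAMPLES_PER_INTENT,
    max_ratio: float = MAX_IMBALANCE_RATIO,
) -> list[str]:
    """Return human-readable warnings about under-represented or unbalanced intents."""
    counts = {intent: len(examples) for intent, examples in data.items()}
    warnings = [
        f"{intent}: only {n} examples (minimum {min_examples})"
        for intent, n in counts.items()
        if n < min_examples
    ]
    if counts:
        largest, smallest = max(counts.values()), min(counts.values())
        if smallest == 0 or largest / smallest > max_ratio:
            warnings.append(f"imbalance: largest intent has {largest} examples, smallest has {smallest}")
    return warnings


# Placeholder utterances, generated only to produce example counts.
data = {
    "book_flight": [f"book a flight to city {i}" for i in range(60)],
    "order_pizza": [f"order pizza number {i}" for i in range(20)],
}
for warning in check_balance(data):
    print(warning)
# order_pizza: only 20 examples (minimum 50)
# imbalance: largest intent has 60 examples, smallest has 20
```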

Avoiding Over-Engineering and Overfitting

Achieving the right balance between system complexity and adaptability is key. Avoid overly detailed examples that may not generalize well to variations of a query. For example, training on a request like "Book a luxury suite with a panoramic view of the Eiffel Tower in a five-star hotel in Paris for three nights" can teach the model to rely on very specific wording, which may limit its ability to understand related but slightly different queries.
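
One cheap way to find candidates for simplification is to flag unusually long examples for human review. Word count is only a crude proxy: length by itself does not cause overfitting, but long examples packed with specifics often do. The threshold of 12 words below is a hypothetical value, and the function is illustrative rather than part of any Parloa or Rasa tooling.

```python
MAX_WORDS = 12  # hypothetical threshold; tune it against your own data


def flag_overly_detailed(examples: list[str], max_words: int = MAX_WORDS) -> list[tuple[str, int]]:
    """Return (example, word_count) for examples longer than max_words."""
    flagged = []
    for example in examples:
        word_count = len(example.split())
        if word_count > max_words:
            flagged.append((example, word_count))
    return flagged


examples = [
    "Book a luxury suite with a panoramic view of the Eiffel Tower "
    "in a five-star hotel in Paris for three nights",
    "Book a five-star hotel in Miami",
]
for example, word_count in flag_overly_detailed(examples):
    print(f"{word_count} words: {example}")
# 21 words: Book a luxury suite with a panoramic view of the Eiffel Tower in a five-star hotel in Paris for three nights
```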

Out-of-Scope vs. Fallback Intent

Define dedicated responses for queries that fall outside the bot’s capabilities (Out-of-Scope) and for queries that are unclear (Fallback). Both improve user interactions by giving clear guidance when the bot cannot fulfill a request.
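
The difference between the two cases shows up in how a prediction is routed. The sketch below assumes two things that are not specified in this guide: that out-of-scope requests are trained as their own intent (named `out_of_scope` here), and that an unclear query is one whose top prediction falls below a confidence threshold (0.6 here). Both names, the threshold, and the response texts are hypothetical.

```python
from dataclasses import dataclass

OUT_OF_SCOPE_INTENT = "out_of_scope"  # hypothetical name of a trained out-of-scope intent
FALLBACK_THRESHOLD = 0.6              # hypothetical confidence cut-off

# Hypothetical response texts that give the user clear guidance.
RESPONSES = {
    OUT_OF_SCOPE_INTENT: "Sorry, that's not something I can help with. I can book flights or order food.",
    "fallback": "Sorry, I didn't quite understand. Could you rephrase that?",
}


@dataclass
class Prediction:
    intent: str        # top intent predicted by the NLU model
    confidence: float  # model confidence for that intent, between 0 and 1


def route(prediction: Prediction, threshold: float = FALLBACK_THRESHOLD) -> str:
    """Return the name of the handler that should respond to this prediction."""
    if prediction.confidence < threshold:
        # Unclear input: no intent is predicted confidently enough, so use Fallback.
        return "fallback"
    if prediction.intent == OUT_OF_SCOPE_INTENT:
        # Understood, but outside what the bot can do, so use Out-of-Scope.
        return OUT_OF_SCOPE_INTENT
    return prediction.intent


for p in [Prediction("book_flight", 0.92), Prediction("out_of_scope", 0.85), Prediction("book_flight", 0.31)]:
    handler = route(p)
    print(handler, "->", RESPONSES.get(handler, "(handled by the intent's own flow)"))
```

Checking confidence first means a low-confidence out-of-scope prediction is treated as unclear, which asks the user to rephrase instead of refusing outright.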

Maintaining and Updating Training Data

Regular Updates

Regularly update the training data with new phrases and expressions that reflect evolving language use, and adjust it when specific intents change. Keeping the data focused on the most relevant information makes learning more effective and the AI more adaptable.
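
When new phrases are added, it helps to skip anything the training data already covers, so repeated updates do not inflate one intent or introduce conflicting labels. The sketch below assumes the new utterances have already been reviewed and assigned to intents (for example, from real conversations), and uses the same hypothetical dict-of-lists layout; `add_new_examples` is an illustrative helper, not an existing API.

```python
import re


def normalize(text: str) -> str:
    # Lowercase and drop punctuation so trivial variants count as duplicates.
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def add_new_examples(
    data: dict[str, list[str]], new_examples: dict[str, list[str]]
) -> tuple[dict[str, list[str]], list[str]]:
    """Merge reviewed new utterances into a copy of the training data.

    Returns the updated data and the utterances that were skipped because an
    equivalent utterance already exists under any intent.
    """
    updated = {intent: list(examples) for intent, examples in data.items()}
    seen = {normalize(ex) for examples in updated.values() for ex in examples}
    skipped = []
    for intent, examples in new_examples.items():
        for example in examples:
            key = normalize(example)
            if key in seen:
                skipped.append(example)
            else:
                updated.setdefault(intent, []).append(example)
                seen.add(key)
    return updated, skipped


data = {"book_flight": ["Book a flight to New York"]}
new = {
    "book_flight": ["book a flight to new york!", "Get me a flight to Chicago"],
    "order_pizza": ["Order a large pepperoni pizza"],
}
updated, skipped = add_new_examples(data, new)
print(updated)
print(skipped)  # → ['book a flight to new york!']
```

The original data is left untouched, so the previous version can be kept for comparison before retraining.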

Avoiding Overcomplicated Phrases

Keep training examples straightforward and focused on the main information. This helps the system learn more effectively by reducing confusion from incidental details.

Simplify Training Data

Simplifying user queries for training by reducing them to their essential elements helps the AI learn more efficiently. For example, "Book a five-star hotel in Miami" is a more effective training example than a long sentence packed with multiple specifications.
