Getting Started with Evaluations
This guide walks you through creating and running your first evaluation. By the end, you'll have a working eval case with metrics and understand how to interpret the results.
Prerequisites
Before creating evaluations, make sure you have:
- An AI Agent already created and configured.
- For RAG metrics (Hallucination, Faithfulness, etc.), a trained knowledge base attached to your Agent.
Creating Your First Eval Case
- Open your AI Agent and navigate to the Evals tab.
- Click Create Eval Case.
- Give your eval case a name (e.g., "Greeting test") and a description explaining what it tests.
- In the Messages section, add a Human message. This is the input that will be sent to your Agent (e.g., "Hello, what can you help me with?").
- Optionally, add an AI message containing the response you expect your Agent to produce. Some metrics use it as a reference for comparison.
- In the Metrics section, click Add Metric and select a metric type (start with Answer Relevancy for a simple first test).
- Configure the metric parameters (the defaults work well for most cases).
- Click Save.
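To make the parts of an eval case easier to picture, the steps above can be sketched as a small data structure. This is only an illustration of what the form collects, not the product's API or storage format: the class names, field names, and the description text are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    # Hypothetical representation of one entry in the Messages section.
    role: str      # "human" (sent to the Agent) or "ai" (expected response)
    content: str


@dataclass
class MetricConfig:
    # Hypothetical representation of one entry in the Metrics section.
    metric_type: str
    threshold: Optional[float] = None  # None stands for "keep the default"


@dataclass
class EvalCase:
    # Hypothetical container mirroring the fields of the Create Eval Case form.
    name: str
    description: str
    messages: List[Message] = field(default_factory=list)
    metrics: List[MetricConfig] = field(default_factory=list)


# The "Greeting test" case from the steps above, with no AI message
# (it is optional) and the metric left at its defaults.
greeting_test = EvalCase(
    name="Greeting test",
    description="Checks that the Agent explains what it can help with.",
    messages=[Message(role="human", content="Hello, what can you help me with?")],
    metrics=[MetricConfig(metric_type="Answer Relevancy")],
)
```

Adding an expected response would mean appending a second Message with the role "ai" to the messages list.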
Running Your First Eval
- From the Evals tab, find the eval case you just created.
- Click the Run button.
- The status will change to In Progress as the system sends your messages to the Agent and evaluates the response.
- Once complete, the status will update to Passed, Failed, or Warning.
Understanding the Results
After the run completes, click on the eval case to view the results:
- Score — A value between 0.0 and 1.0 for each metric. Higher is better (except for the Hallucination metric, where lower is better).
- Status — Whether each metric passed or failed based on its configured threshold.
- Reason — A text explanation from the testing LLM describing why the score was given.
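The relationship between Score and Status can be sketched in a few lines of Python. This is a minimal illustration, not the platform's actual implementation: the names MetricResult and metric_passed, the inclusive comparisons, and the example scores and thresholds are all assumptions. The only behavior taken from this guide is that scores fall between 0.0 and 1.0 and that Hallucination is the metric where lower is better.

```python
from dataclasses import dataclass

# Metrics where a lower score is the better outcome (per this guide).
LOWER_IS_BETTER = {"Hallucination"}


@dataclass
class MetricResult:
    # Hypothetical shape of one metric's result.
    metric_type: str
    score: float       # expected to be between 0.0 and 1.0
    threshold: float   # the threshold configured on the metric
    reason: str        # explanation text returned with the score


def metric_passed(result: MetricResult) -> bool:
    """Return True if the score satisfies the threshold.

    Assumption: comparisons are inclusive, so a score equal to the
    threshold counts as a pass.
    """
    if not 0.0 <= result.score <= 1.0:
        raise ValueError(f"score out of range: {result.score}")
    if result.metric_type in LOWER_IS_BETTER:
        return result.score <= result.threshold
    return result.score >= result.threshold


# Illustrative values only; these are not real product output.
relevancy = MetricResult("Answer Relevancy", 0.82, 0.7, "Response addresses the question.")
hallucination = MetricResult("Hallucination", 0.35, 0.5, "Minor unsupported detail.")

relevancy_ok = metric_passed(relevancy)          # higher-is-better check
hallucination_ok = metric_passed(hallucination)  # lower-is-better check
```

Under these assumptions, the same score of 0.35 that passes a Hallucination threshold of 0.5 would fail an Answer Relevancy threshold of 0.5, which is why the direction of each metric matters when reading results.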
For a deeper dive into interpreting results, see Understanding Results.
Next Steps
- Eval Cases & Collections — Learn about all the configuration options available for eval cases.
- AI-Generated Eval Cases — Let AI create eval cases for you automatically.
- Metrics Overview — Explore all ten available metrics.