Getting Started with Evaluations

This guide walks you through creating and running your first evaluation. By the end, you'll have a working eval case with metrics and understand how to interpret the results.


Prerequisites

Before creating evaluations, make sure you have:

  • An AI Agent already created and configured.
  • For RAG metrics (Hallucination, Faithfulness, etc.), a trained knowledge base attached to your Agent.

Creating Your First Eval Case

  1. Open your AI Agent and navigate to the Evals tab.
  2. Click Create Eval Case.
  3. Give your eval case a name (e.g., "Greeting test") and a description explaining what it tests.
  4. In the Messages section, add a Human message — this is what will be sent to your Agent (e.g., "Hello, what can you help me with?").
  5. Optionally, add an AI message — this represents the expected response your Agent should produce. Some metrics use this as a reference for comparison.
  6. In the Metrics section, click Add Metric and select a metric type (start with Answer Relevancy for a simple first test).
  7. Configure the metric parameters (the defaults work well for most cases).
  8. Click Save.
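
To make these steps concrete, the sketch below shows the information an eval case like this one bundles together: a name, a description, the Human message (and optional AI reference message), and the configured metrics. It is a minimal illustration only; the field names, the structure, and the 0.7 threshold are assumptions, not the platform's actual data model.

    # Illustrative sketch only: field names, structure, and the threshold
    # value are assumptions, not the platform's actual data model.
    eval_case = {
        "name": "Greeting test",
        "description": "Checks that the Agent greets users and explains what it can do.",
        "messages": [
            # The Human message is what gets sent to the Agent during a run.
            {"role": "human", "content": "Hello, what can you help me with?"},
            # The optional AI message is a reference answer that some
            # metrics compare the Agent's actual response against.
            {"role": "ai", "content": "Hi! I can answer questions about your account."},
        ],
        "metrics": [
            # Answer Relevancy with an assumed default threshold.
            {"type": "answer_relevancy", "threshold": 0.7},
        ],
    }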

Running Your First Eval

  1. From the Evals tab, find the eval case you just created.
  2. Click the Run button.
  3. The status will change to In Progress as the system sends your messages to the Agent and evaluates the response.
  4. Once complete, the status will update to Passed, Failed, or Warning.
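
Conceptually, a run does two things: it sends the eval case's Human message to the Agent, then scores the response with each configured metric. The sketch below is a hypothetical outline of that flow; every function in it is a stub standing in for the platform, not a real API.

    # Hypothetical outline of a single eval run. All names are placeholders.

    def send_to_agent(message: str) -> str:
        """Stub standing in for the Agent answering the Human message."""
        return "Hi! I can answer questions about your account."

    def score_metric(metric_type: str, response: str) -> float:
        """Stub standing in for the testing LLM scoring the response."""
        return 0.82  # each metric yields a score between 0.0 and 1.0

    def run_eval_case(human_message: str, metric_types: list[str]) -> list[dict]:
        # 1. Send the Human message to the Agent and capture its response.
        response = send_to_agent(human_message)
        # 2. Score the response once per configured metric.
        return [{"metric": m, "score": score_metric(m, response)} for m in metric_types]

    results = run_eval_case("Hello, what can you help me with?", ["answer_relevancy"])
    print(results)  # [{'metric': 'answer_relevancy', 'score': 0.82}]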

Understanding the Results

After the run completes, click on the eval case to view the results:

  • Score — A value between 0.0 and 1.0 for each metric. Higher is better (except for the Hallucination metric, where lower is better).
  • Status — Whether each metric passed or failed based on its configured threshold.
  • Reason — A text explanation from the testing LLM describing why the score was given.
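
As one concrete illustration of how a score and a threshold combine into a status, consider the sketch below. The comparison rules (including the inverted direction for Hallucination) follow the description above, but the exact logic, such as when a Warning is produced instead of a Failed, is an assumption for illustration.

    def metric_status(metric_type: str, score: float, threshold: float) -> str:
        """Illustrative status logic; the platform's actual rules may differ."""
        # Hallucination is inverted: lower is better, so it passes when
        # the score stays at or below the threshold.
        if metric_type == "hallucination":
            return "Passed" if score <= threshold else "Failed"
        # For every other metric, higher is better.
        return "Passed" if score >= threshold else "Failed"

    print(metric_status("answer_relevancy", 0.82, 0.7))  # Passed
    print(metric_status("hallucination", 0.40, 0.30))    # Failed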

For a deeper dive into interpreting results, see Understanding Results.


Next Steps