📄️ Evaluations Overview
Think of evaluations as unit tests for your AI Agent. Just as software developers write tests to make sure their code works correctly, evaluations let you verify that your AI Agent responds accurately, uses the right tools, and stays aligned with your instructions.
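To make the analogy concrete, here is a minimal sketch of what an eval case might contain. The structure and every field name below (`prompt`, `expected_behavior`, `metrics`) are illustrative assumptions, not a specific product schema:

```python
# A hypothetical eval case, framed like a unit test for an AI Agent.
# All field names below are illustrative, not a specific product's schema.
eval_case = {
    "name": "refund_policy_question",
    # The input the Agent is tested against.
    "prompt": "What is your refund policy?",
    # What a correct response should do, used by the grading metrics.
    "expected_behavior": "Cites the 30-day refund window from the knowledge base.",
    # Which metrics to score this case with.
    "metrics": ["accuracy", "tool_usage", "instruction_adherence"],
}
```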
📄️ Why Evaluations Matter
Building an AI Agent is just the first step. Keeping it accurate and reliable as you make changes is the real challenge. Evaluations give you the confidence to iterate quickly without breaking what already works.
📄️ Getting Started with Evaluations
This guide walks you through creating and running your first evaluation. By the end, you'll have a working eval case with metrics and understand how to interpret the results.
📄️ Eval Cases & Collections
This page covers the full set of configuration options for eval cases and shows how to use collections to organize and batch-run your evaluations.
📄️ AI-Generated Eval Cases
Instead of manually creating every eval case, you can let AI analyze your Agent's configuration and automatically generate a diverse set of test cases. This is a fast way to build initial coverage for your evaluations.
📄️ Understanding Results
After running an evaluation, each metric produces a score, a pass/fail status, and an explanation. This page helps you interpret those results and take action to improve your Agent's performance.
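As a rough illustration of that result shape, the sketch below assumes hypothetical field names (`score`, `passed`, `explanation`) rather than a specific API:

```python
# Illustrative shape of a single metric result after an eval run.
# Field names and the score scale are assumptions, not a real API.
result = {
    "metric": "accuracy",
    "score": 0.72,          # numeric score produced by the metric
    "passed": False,        # pass/fail against the metric's threshold
    "explanation": "The Agent answered correctly but omitted the required citation.",
}

# A failing metric points at what to fix: here, the explanation suggests
# tightening the Agent's instructions about citing sources.
if not result["passed"]:
    print(f"{result['metric']}: {result['explanation']}")
```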
🗃️ Metrics
3 items