# Metrics Overview
Metrics are the measurements you apply to your AI Agent's responses during an evaluation. Each metric evaluates a specific aspect of quality, producing a score between 0.0 and 1.0.
Cubeo AI provides ten metrics organized into two categories: Generic Metrics and RAG Metrics.
## Generic Metrics
These metrics work with any AI Agent, regardless of whether it uses a knowledge base.
| Metric | What It Measures | Knowledge Base Required |
|---|---|---|
| Answer Relevancy | How relevant the response is to the user's question | No |
| Task Completion | Whether the Agent completed a specific task | No |
| Tool Correctness | Whether the Agent used the expected tools | No |
| Prompt Alignment | How well the response follows the system prompt | No |
| Pattern Match | Whether the response matches a regex pattern | No |
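To make one of these concrete, the sketch below illustrates the idea behind Tool Correctness: compare the tools the Agent actually invoked against the tools you expected and score the overlap. This is a simplified stand-in for how such a metric could work, not Cubeo AI's implementation; the function name and data are hypothetical.

```python
# Hypothetical illustration of the idea behind Tool Correctness:
# score the overlap between the tools the Agent called and the
# tools it was expected to call. Not Cubeo AI's implementation.

def tool_correctness(expected_tools: list[str], called_tools: list[str]) -> float:
    """Fraction of expected tools the Agent actually called (0.0-1.0)."""
    if not expected_tools:
        return 1.0  # nothing was required, so nothing can be missing
    hits = sum(1 for tool in expected_tools if tool in called_tools)
    return hits / len(expected_tools)

# Example: the Agent was expected to use both tools but only called one.
score = tool_correctness(
    expected_tools=["web_search", "calculator"],
    called_tools=["web_search"],
)
print(score)  # 0.5 -> would fail the default 0.8 threshold
```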
## RAG Metrics
These metrics are designed for Agents that use a knowledge base (Retrieval-Augmented Generation). They evaluate how well the Agent retrieves and uses information from your documents.
All RAG metrics require a trained knowledge base attached to your AI Agent. Without one, these metrics cannot retrieve context and will not produce meaningful results.
| Metric | What It Measures | Knowledge Base Required |
|---|---|---|
| Hallucination | Whether the Agent fabricated information not in the context | Yes |
| Faithfulness | How grounded the response is in the retrieved context | Yes |
| Contextual Precision | How much of the retrieved context is relevant | Yes |
| Contextual Recall | Whether all necessary context was retrieved | Yes |
| Contextual Relevancy | How relevant the retrieved context is to the query | Yes |
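As an intuition aid for the contextual metrics, the toy sketch below approximates Contextual Precision and Contextual Recall with set arithmetic over document chunks. Cubeo AI's actual metrics are judged against real retrieved context and produce continuous scores, so treat this purely as an analogy; the chunk names are hypothetical.

```python
# Toy intuition for Contextual Precision and Contextual Recall using
# set arithmetic over document chunks. The real metrics are judged
# against actual retrieved context; this is only an analogy.

retrieved = {"chunk_a", "chunk_b", "chunk_c"}   # what the retriever returned
relevant  = {"chunk_a", "chunk_c", "chunk_d"}   # what the answer actually needs

# Precision: how much of what was retrieved is relevant.
precision = len(retrieved & relevant) / len(retrieved)   # 2/3 ~ 0.67

# Recall: how much of what was needed got retrieved.
recall = len(retrieved & relevant) / len(relevant)       # 2/3 ~ 0.67

print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this framing, low precision points to noisy retrieval (too many irrelevant chunks) and low recall points to missing context, while Contextual Relevancy asks the related question of how well the retrieved chunks match the query itself.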
## Common Configuration Parameters
All metrics share these base configuration parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `threshold` | float | 0.8 | Score threshold for passing (0.0–1.0). The metric passes if the score meets or exceeds this value. |
| `strict_mode` | boolean | false | When enabled, forces a binary result: scores at or above the threshold become 1.0 (pass), and scores below it become 0.0 (fail). |
Some metrics have additional parameters specific to their evaluation logic. See each metric's page for the full configuration.
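The interaction between these two parameters can be summarized in a few lines. The sketch below is a minimal reading of the behavior described above, not platform code; the function name is hypothetical.

```python
# Minimal reading of the shared threshold / strict_mode semantics
# described above. The function name is hypothetical.

def resolve_score(score: float, threshold: float = 0.8,
                  strict_mode: bool = False) -> tuple[float, bool]:
    """Return the (possibly rounded) score and whether the metric passes."""
    passed = score >= threshold            # passes if score meets or exceeds threshold
    if strict_mode:
        score = 1.0 if passed else 0.0     # round to a hard pass/fail score
    return score, passed

print(resolve_score(0.85))                    # (0.85, True)  - passes at default 0.8
print(resolve_score(0.85, strict_mode=True))  # (1.0, True)   - rounded up
print(resolve_score(0.60, strict_mode=True))  # (0.0, False)  - rounded down
```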
## Choosing the Right Metrics
Use this guide to pick the metrics that match your use case:
- Any Agent — Start with Answer Relevancy and Prompt Alignment. These are the most broadly useful metrics.
- Agents with tools — Add Tool Correctness to verify the right tools are being used, and Task Completion to confirm tasks are fully completed.
- Agents with a knowledge base — Add Hallucination and Faithfulness to catch fabricated information and ensure responses are grounded in your documents.
- Fine-tuning retrieval quality — Use Contextual Precision, Contextual Recall, and Contextual Relevancy to diagnose issues with your knowledge base retrieval.
- Format validation — Use Pattern Match when your Agent's response must follow a specific structure, contain certain keywords, or match a defined format. A short example follows this list.
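For instance, a response that must contain an order ID could be checked with a regular expression like the one below. The pattern, response text, and binary scoring are illustrative only; the actual check runs inside Cubeo AI rather than in your own code.

```python
import re

# Illustrative Pattern Match check: the response must contain an
# order ID of the form ORD- followed by six digits. (hypothetical format)
pattern = r"ORD-\d{6}"

response = "Your order ORD-482913 has shipped and will arrive Friday."
score = 1.0 if re.search(pattern, response) else 0.0
print(score)  # 1.0 -> passes
```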
## Next Steps
Explore each metric in detail:
- Generic Metrics — Answer Relevancy, Task Completion, Tool Correctness, Prompt Alignment, Pattern Match
- RAG Metrics — Hallucination, Faithfulness, Contextual Precision, Contextual Recall, Contextual Relevancy