# Metrics Overview
Metrics are the measurements you apply to your AI Agent's responses during an evaluation. Each metric evaluates a specific aspect of quality, producing a score between 0.0 and 1.0.
Cubeo AI provides ten metrics organized into two categories: Generic Metrics and RAG Metrics.
## Generic Metrics
These metrics work with any AI Agent, regardless of whether it uses a knowledge base.
| Metric | What It Measures | Knowledge Base Required |
|---|---|---|
| Answer Relevancy | How relevant the response is to the user's question | No |
| Task Completion | Whether the Agent completed a specific task | No |
| Tool Correctness | Whether the Agent used the expected tools | No |
| Prompt Alignment | How well the response follows the system prompt | No |
| Pattern Match | Whether the response matches a regex pattern | No |
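To make one of these concrete, the sketch below illustrates the idea behind Tool Correctness: compare the tools the Agent actually invoked against the tools you expected and score the overlap. This is a simplified stand-in for how such a metric could work, not Cubeo AI's implementation; the function name and data are hypothetical.

```python
# Hypothetical illustration of the idea behind Tool Correctness:
# score the overlap between the tools the Agent called and the
# tools it was expected to call. Not Cubeo AI's implementation.

def tool_correctness(expected_tools: list[str], called_tools: list[str]) -> float:
    """Fraction of expected tools the Agent actually called (0.0-1.0)."""
    if not expected_tools:
        return 1.0  # nothing was required, so nothing can be missing
    hits = sum(1 for tool in expected_tools if tool in called_tools)
    return hits / len(expected_tools)

# Example: the Agent was expected to use both tools but only called one.
score = tool_correctness(
    expected_tools=["web_search", "calculator"],
    called_tools=["web_search"],
)
print(score)  # 0.5 -> would fail the default 0.8 threshold
```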
## RAG Metrics
These metrics are designed for Agents that use a knowledge base (Retrieval-Augmented Generation). They evaluate how well the Agent retrieves and uses information from your documents.
All RAG metrics require a trained knowledge base attached to your AI Agent. Without one, these metrics cannot retrieve context and will not produce meaningful results.
| Metric | What It Measures | Knowledge Base Required |
|---|---|---|
| Hallucination | Whether the Agent fabricated information not in the context | Yes |
| Faithfulness | How grounded the response is in the retrieved context | Yes |
| Contextual Precision | How much of the retrieved context is relevant | Yes |
| Contextual Recall | Whether all necessary context was retrieved | Yes |
| Contextual Relevancy | How relevant the retrieved context is to the query | Yes |
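As an intuition aid for the contextual metrics, the toy sketch below approximates Contextual Precision and Contextual Recall with set arithmetic over document chunks. Cubeo AI's actual metrics are judged against real retrieved context and produce continuous scores, so treat this purely as an analogy; the chunk names are hypothetical.

```python
# Toy intuition for Contextual Precision and Contextual Recall using
# set arithmetic over document chunks. The real metrics are judged
# against actual retrieved context; this is only an analogy.

retrieved = {"chunk_a", "chunk_b", "chunk_c"}   # what the retriever returned
relevant  = {"chunk_a", "chunk_c", "chunk_d"}   # what the answer actually needs

# Precision: how much of what was retrieved is relevant.
precision = len(retrieved & relevant) / len(retrieved)   # 2/3 ~ 0.67

# Recall: how much of what was needed got retrieved.
recall = len(retrieved & relevant) / len(relevant)       # 2/3 ~ 0.67

print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this framing, low precision points to noisy retrieval (too many irrelevant chunks) and low recall points to missing context, while Contextual Relevancy asks the related question of how well the retrieved chunks match the query itself.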
## Common Configuration Parameters
All metrics share these base configuration parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `threshold` | float | 0.8 | Score threshold for passing (0.0–1.0). The metric passes if the score meets or exceeds this value. |
| `strict_mode` | boolean | false | When enabled, forces a binary result: scores at or above the threshold become 1.0 (pass), and scores below it become 0.0 (fail). |
Some metrics have additional parameters specific to their evaluation logic. See each metric's page for the full configuration.
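The interaction between these two parameters can be summarized in a few lines. The sketch below is a minimal reading of the behavior described above, not platform code; the function name is hypothetical.

```python
# Minimal reading of the shared threshold / strict_mode semantics
# described above. The function name is hypothetical.

def resolve_score(score: float, threshold: float = 0.8,
                  strict_mode: bool = False) -> tuple[float, bool]:
    """Return the (possibly rounded) score and whether the metric passes."""
    passed = score >= threshold            # passes if score meets or exceeds threshold
    if strict_mode:
        score = 1.0 if passed else 0.0     # round to a hard pass/fail score
    return score, passed

print(resolve_score(0.85))                    # (0.85, True)  - passes at default 0.8
print(resolve_score(0.85, strict_mode=True))  # (1.0, True)   - rounded up
print(resolve_score(0.60, strict_mode=True))  # (0.0, False)  - rounded down
```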
## Choosing the Right Metrics
Use this guide to pick the metrics that match your use case:
- Any Agent — Start with Answer Relevancy and Prompt Alignment. These are the most broadly useful metrics.
- Agents with tools — Add Tool Correctness to verify the right tools are being used, and Task Completion to confirm tasks are fully completed.
- Agents with a knowledge base — Add Hallucination and Faithfulness to catch fabricated information and ensure responses are grounded in your documents.
- Fine-tuning retrieval quality — Use Contextual Precision, Contextual Recall, and Contextual Relevancy to diagnose issues with your knowledge base retrieval.
- Format validation — Use Pattern Match when your Agent's response must follow a specific structure, contain certain keywords, or match a defined format. A short example follows this list.
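For instance, a response that must contain an order ID could be checked with a regular expression like the one below. The pattern, response text, and binary scoring are illustrative only; the actual check runs inside Cubeo AI rather than in your own code.

```python
import re

# Illustrative Pattern Match check: the response must contain an
# order ID of the form ORD- followed by six digits. (hypothetical format)
pattern = r"ORD-\d{6}"

response = "Your order ORD-482913 has shipped and will arrive Friday."
score = 1.0 if re.search(pattern, response) else 0.0
print(score)  # 1.0 -> passes
```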
## Next Steps
Explore each metric in detail:
- Generic Metrics — Answer Relevancy, Task Completion, Tool Correctness, Prompt Alignment, Pattern Match
- RAG Metrics — Hallucination, Faithfulness, Contextual Precision, Contextual Recall, Contextual Relevancy