
Metrics Overview

Metrics are the measurements you apply to your AI Agent's responses during an evaluation. Each metric evaluates a specific aspect of quality, producing a score between 0.0 and 1.0.

Cubeo AI provides ten metrics, organized into two categories: five Generic Metrics and five RAG Metrics.


Generic Metrics

These metrics work with any AI Agent, regardless of whether it uses a knowledge base.

| Metric | What It Measures | Knowledge Base Required |
| --- | --- | --- |
| Answer Relevancy | How relevant the response is to the user's question | No |
| Task Completion | Whether the Agent completed a specific task | No |
| Tool Correctness | Whether the Agent used the expected tools | No |
| Prompt Alignment | How well the response follows the system prompt | No |
| Pattern Match | Whether the response matches a regex pattern | No |
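
Of these, Pattern Match is the most mechanical: it checks the response against a regular expression rather than asking a model to judge quality. As a rough sketch of that kind of check (illustrative only, not the Cubeo AI implementation):

```python
import re

# Illustrative pattern-match style check: pass when the response
# contains an order number of the form "ORD-" followed by six digits.
pattern = re.compile(r"ORD-\d{6}")
response = "Your order ORD-123456 has shipped."

# A regex check is binary by nature: the response either matches or it doesn't.
score = 1.0 if pattern.search(response) else 0.0
print(score)  # 1.0
```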

RAG Metrics

These metrics are designed for Agents that use a knowledge base (Retrieval-Augmented Generation). They evaluate how well the Agent retrieves and uses information from your documents.

Info: All RAG metrics require a trained knowledge base attached to your AI Agent. Without one, these metrics cannot retrieve context and will not produce meaningful results.

| Metric | What It Measures | Knowledge Base Required |
| --- | --- | --- |
| Hallucination | Whether the Agent fabricated information not in the context | Yes |
| Faithfulness | How grounded the response is in the retrieved context | Yes |
| Contextual Precision | How much of the retrieved context is relevant | Yes |
| Contextual Recall | Whether all necessary context was retrieved | Yes |
| Contextual Relevancy | How relevant the retrieved context is to the query | Yes |
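
The three contextual metrics are easy to confuse, so a loose set-based analogy may help. Note that the real metrics are judged by an evaluation model, not computed from hand labels like the ones below; this sketch only shows what each score is getting at.

```python
# Loose analogy only; the actual contextual metrics are model-judged.
retrieved = {"chunk_a", "chunk_b", "chunk_c"}  # context the Agent retrieved
needed    = {"chunk_a", "chunk_d"}             # context the ideal answer needs

# Contextual Precision: how much of what was retrieved is actually relevant.
precision = len(retrieved & needed) / len(retrieved)  # 1/3
# Contextual Recall: how much of the needed context was actually retrieved.
recall = len(retrieved & needed) / len(needed)        # 1/2
print(precision, recall)
```

Contextual Relevancy asks the precision-like question against the user's query rather than the answer: of everything retrieved, how much bears on what was asked.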

Common Configuration Parameters

All metrics share these base configuration parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| threshold | float | 0.8 | Score threshold for passing (0.0–1.0). The metric passes if the score meets or exceeds this value. |
| strict_mode | boolean | false | When enabled, rounds the score to 1.0 (pass) or 0.0 (fail) based on the threshold. |
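
To make the two shared parameters concrete, here is a minimal sketch of the pass/fail logic they describe. The function name and signature are illustrative, not part of the Cubeo AI API.

```python
def apply_metric_config(score: float, threshold: float = 0.8,
                        strict_mode: bool = False) -> tuple[float, bool]:
    """Illustrative pass/fail logic for the shared parameters above."""
    # The metric passes when the score meets or exceeds the threshold.
    passed = score >= threshold
    # Strict mode collapses the score itself to a binary 1.0 or 0.0.
    if strict_mode:
        score = 1.0 if passed else 0.0
    return score, passed

print(apply_metric_config(0.85))                    # (0.85, True)
print(apply_metric_config(0.85, strict_mode=True))  # (1.0, True)
print(apply_metric_config(0.72))                    # (0.72, False)
```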

Some metrics have additional parameters specific to their evaluation logic. See each metric's page for the full configuration.


Choosing the Right Metrics

Use this guide to pick the metrics that match your use case:

  • Any Agent — Start with Answer Relevancy and Prompt Alignment. These are the most broadly useful metrics.
  • Agents with tools — Add Tool Correctness to verify the right tools are being used, and Task Completion to confirm tasks are fully completed.
  • Agents with a knowledge base — Add Hallucination and Faithfulness to catch fabricated information and ensure responses are grounded in your documents.
  • Fine-tuning retrieval quality — Use Contextual Precision, Contextual Recall, and Contextual Relevancy to diagnose issues with your knowledge base retrieval.
  • Format validation — Use Pattern Match when your Agent's response must follow a specific structure, contain certain keywords, or match a defined format.
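
As a worked example of this guidance, an Agent that uses both tools and a knowledge base would combine metrics from several of the bullets above. The list below is hypothetical; only the metric names come from this page, so see each metric's documentation for the actual configuration format.

```python
# Hypothetical metric selection for an Agent with tools and a knowledge
# base; the dict layout is illustrative, not the Cubeo AI config format.
evaluation_metrics = [
    {"metric": "Answer Relevancy", "threshold": 0.8},  # any Agent
    {"metric": "Prompt Alignment", "threshold": 0.8},  # any Agent
    {"metric": "Tool Correctness", "threshold": 0.8},  # Agent uses tools
    {"metric": "Task Completion",  "threshold": 0.8},  # Agent uses tools
    {"metric": "Hallucination",    "threshold": 0.8, "strict_mode": True},  # knowledge base
    {"metric": "Faithfulness",     "threshold": 0.8},  # knowledge base
]
```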

Next Steps

Explore each metric in detail:

  • Generic Metrics — Answer Relevancy, Task Completion, Tool Correctness, Prompt Alignment, Pattern Match
  • RAG Metrics — Hallucination, Faithfulness, Contextual Precision, Contextual Recall, Contextual Relevancy