
Faithfulness

RAG Metric

Introduction

The Faithfulness metric measures how well your AI Agent's response is grounded in the retrieved knowledge base context. While Hallucination detects fabricated information, Faithfulness focuses on the positive side — how much of the response is actually supported by the source material.


When to Use This Metric

  • You want to measure how well the Agent stays grounded in your documents.
  • You need to validate that the Agent's answers are traceable back to specific content in your knowledge base.
  • You're comparing how different LLMs handle source material (some models are more faithful than others).
  • You want to complement the Hallucination metric with a positive grounding measure.
  • You're testing whether system prompt instructions about citing sources are working.

Configuration

  • threshold (float, default 0.8, optional): Score threshold for passing (0.0–1.0).
  • strict_mode (boolean, default false, optional): Rounds the score to 1.0 or 0.0 based on the threshold.
Note: This metric requires a trained knowledge base attached to your AI Agent.
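As a rough sketch, the configuration above maps to two fields. The class name and shape here are hypothetical, not the actual object used by the evaluation framework:

```python
from dataclasses import dataclass

# Hypothetical sketch: field names mirror the parameters above; the real
# configuration object in your evaluation framework may differ.
@dataclass
class FaithfulnessConfig:
    threshold: float = 0.8     # pass when score >= threshold (0.0-1.0)
    strict_mode: bool = False  # round the score to 1.0 or 0.0 around the threshold
```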


How It Works

  1. The AI Agent receives the input message, retrieves context from the knowledge base, and generates a response.
  2. The testing LLM extracts individual claims from the Agent's response.
  3. Each claim is checked against the retrieved knowledge base context to determine if it's supported.
  4. The score reflects the proportion of claims that are faithfully grounded in the context.
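Once step 3 has judged each claim, the score in step 4 is just a proportion. A minimal sketch, assuming the testing LLM has already produced one boolean verdict per claim:

```python
def faithfulness_score(claim_supported: list[bool]) -> float:
    """Fraction of extracted claims that are supported by the retrieved context.

    `claim_supported` is assumed to hold one boolean per claim, produced by
    the testing LLM's claim-by-claim check.
    """
    if not claim_supported:
        return 0.0  # no verifiable claims to ground
    return sum(claim_supported) / len(claim_supported)

# e.g. 3 of 4 claims grounded in the context
print(faithfulness_score([True, True, False, True]))  # 0.75
```

Real evaluators differ in how they extract and judge claims, but the aggregation step generally follows this supported-over-total shape.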

Scoring

  • Range: 0.0 to 1.0 (higher is better).
  • High score (close to 1.0): The response is well-grounded in the knowledge base context.
  • Low score (close to 0.0): The response diverges from or ignores the available context.
  • Pass condition: The score must be greater than or equal to the configured threshold.
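The pass condition and the strict_mode rounding described above can be sketched together; the function name here is illustrative:

```python
def apply_threshold(score: float, threshold: float = 0.8,
                    strict_mode: bool = False) -> tuple[float, bool]:
    """Return the (possibly rounded) score and whether the test passes."""
    if strict_mode:
        # strict_mode collapses the score to 1.0 or 0.0 around the threshold
        score = 1.0 if score >= threshold else 0.0
    return score, score >= threshold
```

With strict_mode on, a near-miss like 0.75 against a 0.8 threshold is reported as a hard 0.0 rather than a partial score, which makes pass/fail dashboards unambiguous.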

Faithfulness vs. Hallucination

These two metrics are complementary but measure different things:

  • Hallucination detects fabricated information (things the Agent made up). Lower is better.
  • Faithfulness measures grounding (how much the response is backed by context). Higher is better.

A response can score well on Hallucination (no fabrications) but poorly on Faithfulness (the Agent answered from general knowledge instead of using the retrieved context). Using both metrics together gives you a complete picture of how well your Agent uses its knowledge base.
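That split can be made concrete with a toy model. The three labels below are illustrative only; real evaluators judge claims with an LLM rather than pre-labeled categories:

```python
# Illustrative claim labels, not an actual evaluator API.
SUPPORTED = "supported"    # backed by the retrieved context
GENERAL = "general"        # true, but answered from general knowledge
FABRICATED = "fabricated"  # made up by the Agent

def hallucination_rate(labels: list[str]) -> float:
    return labels.count(FABRICATED) / len(labels)

def faithfulness(labels: list[str]) -> float:
    return labels.count(SUPPORTED) / len(labels)

# No fabrications, so Hallucination looks fine (rate 0.0),
# yet only one claim in four is grounded (faithfulness 0.25).
labels = [GENERAL, SUPPORTED, GENERAL, GENERAL]
```

This is exactly the case described above: a response can avoid fabrications entirely while still mostly ignoring the retrieved context.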


Example

Input: "How do I reset my password?"

Knowledge base context: "To reset your password, go to the login page and click 'Forgot Password'. Enter your email address and you'll receive a reset link within 5 minutes."

AI Response: "To reset your password, navigate to the login page and click on 'Forgot Password'. Enter the email address associated with your account, and you'll receive a password reset link within 5 minutes."

Score: 0.98

Result: Passed (threshold: 0.8)

The response closely follows the knowledge base content, with all claims traceable back to the source material.


Tips for Improving Scores

  • Add instructions in your system prompt to prioritize knowledge base content over general knowledge.
  • Ensure your knowledge base content is well-written and comprehensive — the Agent can only be faithful to what's available.
  • If the Agent provides accurate but unsourced information, consider adding that content to your knowledge base.
  • Review low-scoring cases to identify where the Agent is supplementing context with its own knowledge.
  • Use this metric alongside Hallucination for a complete picture of your Agent's grounding quality.