Contextual Recall

RAG Metric

Introduction

The Contextual Recall metric measures whether all the necessary context was retrieved from your knowledge base to answer the user's question. While Contextual Precision checks for noise, Contextual Recall checks for completeness — did the retrieval system find everything it needed?


When to Use This Metric

  • You want to ensure your knowledge base retrieval doesn't miss important information.
  • You're testing whether your Agent has access to all the content needed to provide complete answers.
  • You're diagnosing cases where the Agent gives partial or incomplete responses.
  • You need to validate that newly added content is being properly indexed and retrieved.
  • You're comparing how different chunking strategies affect retrieval completeness.

Configuration

Parameter     Type     Default  Required  Description
threshold     float    0.8      No        Score threshold for passing (0.0–1.0).
strict_mode   boolean  false    No        Rounds score to 1.0 or 0.0 based on threshold.
Info: This metric requires a trained knowledge base attached to your AI Agent. It also uses the expected AI response from the eval case as the reference for what a complete answer should contain.
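The two parameters can be expressed as a small configuration fragment. This is an illustrative sketch, not the exact field names of any specific SDK:

```python
# Hypothetical eval-case configuration for Contextual Recall.
# Field names are illustrative; consult your platform's API for the real ones.
contextual_recall_config = {
    "threshold": 0.8,      # minimum passing score (the default)
    "strict_mode": False,  # when True, the score is rounded to 1.0 or 0.0
}
```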


How It Works

  1. The AI Agent receives the input message and retrieves context chunks from the knowledge base.
  2. The testing LLM compares the retrieved chunks against the expected response defined in the eval case.
  3. It determines whether the retrieved context contains all the information needed to produce the expected response.
  4. The score reflects how much of the necessary information was successfully retrieved.
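The steps above can be sketched as a claim-level tally. In the real pipeline, step 3 is performed by the testing LLM; here a simple case-insensitive substring check stands in for that judgment, so this is a simplified sketch rather than the actual implementation:

```python
def contextual_recall(expected_claims: list[str], retrieved_chunks: list[str]) -> float:
    """Fraction of expected claims supported by at least one retrieved chunk.

    The production metric uses an LLM judge to decide whether a claim is
    supported; the substring test below is a stand-in for illustration only.
    """
    if not expected_claims:
        return 1.0  # nothing was required, so nothing is missing
    context = " ".join(retrieved_chunks).lower()
    supported = sum(1 for claim in expected_claims if claim.lower() in context)
    return supported / len(expected_claims)
```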

Scoring

  • Range: 0.0 to 1.0 (higher is better).
  • High score (close to 1.0): All necessary information was retrieved from the knowledge base.
  • Low score (close to 0.0): Important information was missing from the retrieved context.
  • Pass condition: The score must be greater than or equal to the configured threshold.
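The pass condition, including the strict_mode rounding described in the configuration table, amounts to the following (a sketch of the documented behavior, not the platform's source):

```python
def passes(score: float, threshold: float = 0.8, strict_mode: bool = False) -> bool:
    """Return True if the score meets the configured threshold."""
    if strict_mode:
        # strict_mode first collapses the score to 1.0 or 0.0
        score = 1.0 if score >= threshold else 0.0
    return score >= threshold
```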

Example

Input: "What payment methods do you accept?"

Expected response: "We accept Visa, Mastercard, American Express, PayPal, and bank transfers."

Retrieved chunks:

  1. "We accept major credit cards including Visa, Mastercard, and American Express."

Score: 0.60

Result: Failed (threshold: 0.8)

The retrieved context mentions credit cards but is missing information about PayPal and bank transfers, which are part of the expected complete answer.
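The 0.60 score is consistent with a simple claim-level tally, assuming each payment method counts as one claim (the actual judge may weigh claims differently):

```python
# Expected facts: Visa, Mastercard, American Express, PayPal, bank transfers -> 5
# Retrieved chunk covers: Visa, Mastercard, American Express -> 3
score = 3 / 5  # 0.6, below the 0.8 threshold, so the eval case fails
```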


Tips for Improving Scores

  • Add missing content to your knowledge base when recall scores are low — the reason text will help identify what's missing.
  • Make sure your expected AI response in the eval case accurately reflects what a complete answer looks like.
  • Check that documents are properly indexed after uploading — content that isn't processed won't be retrievable.
  • If information exists in your knowledge base but isn't being retrieved, consider adjusting your chunking strategy to keep related information together.
  • Use this metric alongside Contextual Precision for a complete view of your retrieval quality (precision measures noise, recall measures completeness).