
Contextual Precision

RAG Metric

Introduction

The Contextual Precision metric measures how much of the context retrieved from your knowledge base is actually relevant to answering the user's question. In other words, it checks whether the retrieved chunks carry useful information or just add noise.


When to Use This Metric

  • You want to verify that your knowledge base retrieval returns focused, relevant chunks.
  • You're diagnosing issues where your Agent's responses include irrelevant information.
  • You're optimizing your knowledge base chunking strategy and need to measure its impact.
  • You want to identify whether your knowledge base contains too much overlapping or redundant content.
  • You're testing whether changes to your knowledge base structure improve retrieval quality.

Configuration

Parameter    Type     Default  Required  Description
threshold    float    0.8      No        Score threshold for passing (0.0–1.0).
strict_mode  boolean  false    No        Rounds the score to 1.0 or 0.0 based on the threshold.
Info: This metric requires a trained knowledge base attached to your AI Agent.
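
To make the two parameters concrete, here is a minimal configuration sketch. The `ContextualPrecisionConfig` name and its validation logic are illustrative assumptions, not the platform's actual API.

```python
from dataclasses import dataclass

@dataclass
class ContextualPrecisionConfig:
    """Hypothetical config object mirroring the table above."""
    threshold: float = 0.8     # pass/fail cutoff, must stay within 0.0-1.0
    strict_mode: bool = False  # if True, the reported score is rounded to 1.0 or 0.0

    def __post_init__(self) -> None:
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0.0 and 1.0")
```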


How It Works

  1. The AI Agent receives the input message and retrieves context chunks from the knowledge base.
  2. The testing LLM evaluates each retrieved chunk for its relevance to answering the question.
  3. The ratio of relevant chunks to total retrieved chunks determines the precision score.
  4. A higher ratio means the retrieval system is returning more useful and less noisy content.
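
Step 3 reduces to a simple ratio. The sketch below shows that computation; the `judge_relevance` callback is an assumed stand-in for the testing LLM's per-chunk verdict, whose actual evaluation prompt is internal to the platform.

```python
from typing import Callable

def contextual_precision(
    question: str,
    chunks: list[str],
    judge_relevance: Callable[[str, str], bool],
) -> float:
    """Ratio of relevant chunks to total retrieved chunks (step 3 above).

    judge_relevance(question, chunk) stands in for the testing LLM and
    returns True when the chunk helps answer the question.
    """
    if not chunks:
        return 0.0  # assumed edge case: nothing retrieved, nothing relevant
    relevant = sum(1 for chunk in chunks if judge_relevance(question, chunk))
    return relevant / len(chunks)
```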

Scoring

  • Range: 0.0 to 1.0 (higher is better).
  • High score (close to 1.0): Most or all retrieved chunks are relevant to the query.
  • Low score (close to 0.0): Many retrieved chunks are irrelevant or contain noise.
  • Pass condition: The score must be greater than or equal to the configured threshold.
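
Combining the pass condition with strict_mode from the configuration table, the evaluation logic works out to something like the following sketch (inferred from the parameter descriptions above, not taken from the platform's source):

```python
def apply_scoring(raw_score: float, threshold: float = 0.8,
                  strict_mode: bool = False) -> tuple[float, bool]:
    """Return the reported score and whether the test passes."""
    score = raw_score
    if strict_mode:
        # strict_mode reports a binary score: 1.0 if it clears the threshold, else 0.0
        score = 1.0 if raw_score >= threshold else 0.0
    return score, score >= threshold
```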

Example

Input: "What is your refund policy?"

Retrieved chunks:

  1. "Refunds are processed within 5-7 business days after we receive the returned item." (relevant)
  2. "Our company was founded in 2015 and is headquartered in San Francisco." (not relevant)
  3. "Items can be returned within 30 days. Refunds are issued to the original payment method." (relevant)

Score: 0.67

Result: Failed (threshold: 0.8)

Two of the three retrieved chunks are relevant, so the score is 2/3 ≈ 0.67; the off-topic company-history chunk pulls it below the 0.8 threshold.
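
You can reproduce the numbers with a few lines of Python. The keyword-based judge below is only a toy stand-in for the testing LLM:

```python
question = "What is your refund policy?"
chunks = [
    "Refunds are processed within 5-7 business days after we receive the returned item.",
    "Our company was founded in 2015 and is headquartered in San Francisco.",
    "Items can be returned within 30 days. Refunds are issued to the original payment method.",
]

def judge(chunk: str) -> bool:
    # Toy relevance check: flags chunks that mention refunds or returns.
    # A real evaluation would ask the testing LLM instead.
    return any(word in chunk.lower() for word in ("refund", "return"))

score = sum(judge(c) for c in chunks) / len(chunks)
print(f"{score:.2f}")  # 0.67
print(score >= 0.8)    # False -> the test fails against the 0.8 threshold
```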


Tips for Improving Scores

  • Review your knowledge base for documents that contain mixed topics — consider splitting them into more focused chunks.
  • Remove outdated or duplicate content that may dilute search results.
  • Improve document titles and headings to help the retrieval system find the right content.
  • If unrelated chunks keep appearing, check whether your chunk size is too large; oversized chunks may mix relevant and irrelevant content (see the re-splitting sketch after this list).
  • Organize your knowledge base by topic so that chunks are semantically focused.
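
If you suspect oversized chunks, one quick experiment is to re-split your documents at a smaller size and re-run the metric. The splitter below is a naive character-budget illustration, not the platform's actual chunker:

```python
def split_into_chunks(text: str, max_chars: int = 500) -> list[str]:
    """Pack paragraphs into chunks of at most max_chars characters (naive)."""
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and len(current) + len(paragraph) + 2 > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = f"{current}\n\n{paragraph}" if current else paragraph
    if current:
        chunks.append(current)
    return chunks
```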