# Contextual Relevancy

*RAG Metric*

## Introduction
The Contextual Relevancy metric assesses how semantically relevant the retrieved knowledge base context is to the user's query. It evaluates the overall quality of the match between what was asked and what was retrieved, giving you insight into how well your retrieval system understands user intent.
## When to Use This Metric
- You want to measure the semantic quality of your knowledge base retrieval.
- You're testing whether user queries are being matched to the right content areas.
- You're diagnosing cases where the Agent retrieves tangentially related but not directly useful content.
- You need to evaluate how well your knowledge base content is structured for searchability.
- You're comparing retrieval quality before and after knowledge base reorganization.
## Configuration
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| `threshold` | float | 0.8 | No | Score threshold for passing (0.0–1.0). |
| `strict_mode` | boolean | false | No | Rounds the score to 1.0 or 0.0 based on the threshold. |
This metric requires a trained knowledge base attached to your AI Agent.
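The parameters above map naturally onto a small configuration object. A minimal sketch, assuming a hypothetical `ContextualRelevancyConfig` class; the names here are illustrative, not a documented API:

```python
# Hypothetical configuration sketch. The class and field names are
# illustrative stand-ins, not this product's actual API.
from dataclasses import dataclass

@dataclass
class ContextualRelevancyConfig:
    threshold: float = 0.8     # pass threshold, 0.0-1.0
    strict_mode: bool = False  # if True, round the score to 1.0 or 0.0

# Tighten the pass bar for a high-stakes knowledge base:
config = ContextualRelevancyConfig(threshold=0.9)
```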
## How It Works

1. The AI Agent receives the input message and retrieves context chunks from the knowledge base.
2. The testing LLM evaluates how semantically relevant each retrieved chunk is to the original input query.
3. The per-chunk judgments are combined into an assessment of the retrieval's overall quality and relevance.
4. A score is produced reflecting how well the retrieved context matches the user's intent.
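The flow above can be sketched in a few lines. Here `judge_relevance` stands in for the testing LLM (a crude lexical overlap is used purely so the example runs end to end), and the aggregation shown, a mean over per-chunk judgments, is one plausible scheme rather than the metric's exact formula:

```python
# Illustrative sketch of the evaluation flow. judge_relevance is a
# placeholder for the testing LLM; the mean aggregation is an assumption.
from typing import List

def judge_relevance(query: str, chunk: str) -> float:
    """Stand-in for an LLM relevance judgment; returns a score in [0, 1].

    Crude lexical overlap is used here only to make the sketch runnable.
    The real metric judges semantic relevance with an LLM.
    """
    q_terms = set(query.lower().split())
    c_terms = set(chunk.lower().split())
    return len(q_terms & c_terms) / max(len(q_terms), 1)

def contextual_relevancy(query: str, chunks: List[str]) -> float:
    """Mean relevance of all retrieved chunks to the query."""
    if not chunks:
        return 0.0
    return sum(judge_relevance(query, c) for c in chunks) / len(chunks)
```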
## Scoring
- Range: 0.0 to 1.0 (higher is better).
- High score (close to 1.0): The retrieved context is highly relevant to the user's query.
- Low score (close to 0.0): The retrieved context has little relevance to what was asked.
- Pass condition: The score must be greater than or equal to the configured threshold (see the sketch below).
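The pass check itself is simple. A sketch, assuming the `strict_mode` semantics from the configuration table (round the score to 1.0 or 0.0 before comparing):

```python
def passes(score: float, threshold: float = 0.8, strict_mode: bool = False) -> bool:
    # strict_mode rounds the score to 1.0 or 0.0 before the comparison.
    # For thresholds in (0, 1] this doesn't change the pass/fail outcome;
    # it mainly changes the score that gets reported.
    if strict_mode:
        score = 1.0 if score >= threshold else 0.0
    return score >= threshold

print(passes(0.85))                    # True: 0.85 >= 0.8
print(passes(0.75, strict_mode=True))  # False: rounded down to 0.0
```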
## Contextual Relevancy vs. Contextual Precision
These metrics are related but focus on different aspects:
- Contextual Precision measures the ratio of relevant to irrelevant chunks — it's about noise. "How much of what was retrieved is useful?"
- Contextual Relevancy evaluates the semantic quality of the match between the query and the context. "How well does the retrieved content relate to what was asked?"
A retrieval system could have high precision (only relevant chunks) but moderate relevancy (the chunks are related to the topic but don't directly address the specific question).
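To make the contrast concrete, here is a toy computation following the informal definitions above: precision as the share of chunks judged relevant at all, relevancy as the mean graded quality of each chunk's match to the query. Real implementations may weight or grade differently:

```python
# Toy contrast of the two metrics, per the informal definitions above.
from typing import List

def precision(relevant_flags: List[bool]) -> float:
    """Noise measure: share of retrieved chunks judged relevant at all."""
    return sum(relevant_flags) / len(relevant_flags) if relevant_flags else 0.0

def relevancy(graded_scores: List[float]) -> float:
    """Match-quality measure: mean graded relevance of each chunk."""
    return sum(graded_scores) / len(graded_scores) if graded_scores else 0.0

# Every chunk is on-topic (high precision), but each only partially
# addresses the specific question (moderate relevancy):
print(precision([True, True, True]))  # 1.0
print(relevancy([0.6, 0.7, 0.5]))     # ~0.6
```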
## Example
Input: "How do I integrate your API with a Node.js application?"
Retrieved chunks:
- "Our REST API supports integration with any programming language. Authentication is done via API keys."
- "For Node.js, install our SDK with
npm install our-sdk. Initialize with your API key and start making requests." - "Our API documentation is available at docs.example.com."
Score: 0.85
Result: Passed (threshold: 0.8)
The retrieved chunks are relevant to the query — they cover API integration and specifically mention Node.js, though the general API overview chunk is less directly relevant.
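If you codify worked examples like this one as regression tests, the shape might look like the following. The dictionary layout is illustrative, and the 0.85 score is taken from the example run above rather than computed here:

```python
# Hypothetical test-case shape for the worked example. The structure is
# illustrative; the score comes from the document's example run.
test_case = {
    "input": "How do I integrate your API with a Node.js application?",
    "retrieved_chunks": [
        "Our REST API supports integration with any programming language. "
        "Authentication is done via API keys.",
        "For Node.js, install our SDK with `npm install our-sdk`. "
        "Initialize with your API key and start making requests.",
        "Our API documentation is available at docs.example.com.",
    ],
    "threshold": 0.8,
}

score = 0.85  # produced by the testing LLM in the example run
assert score >= test_case["threshold"], "contextual relevancy below threshold"
print("Passed")
```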
## Tips for Improving Scores
- Structure your knowledge base content around the types of questions users are likely to ask.
- Use clear, descriptive headings and titles in your documents to improve semantic matching.
- If queries about specific topics consistently retrieve generic content, add more detailed, topic-specific documents.
- Consider reorganizing your knowledge base to group related content together rather than spreading it across many documents.
- Review low-scoring cases to identify patterns in what types of queries produce poor retrieval matches.