
Answer Relevancy

Generic Metric

Introduction

The Answer Relevancy metric measures how relevant your AI Agent's response is to the user's question. It checks whether the Agent directly addresses what was asked, rather than drifting into off-topic or tangential information.


When to Use This Metric

  • You want to verify that your Agent answers questions directly and concisely.
  • You're testing whether prompt changes cause the Agent to drift off-topic.
  • You need a general-purpose quality check for any type of conversational Agent.
  • You're comparing response quality across different LLM models.
  • You want a baseline metric to include in every eval case.

Configuration

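| Parameter   | Type    | Default | Required | Description |
|-------------|---------|---------|----------|-------------|
| threshold   | float   | 0.8     | No       | Minimum score required to pass (0.0–1.0). |
| strict_mode | boolean | false   | No       | When true, the score is reported as 1.0 if it meets the threshold and 0.0 otherwise. |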
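The two parameters can be illustrated with a short Python sketch. It is a minimal illustration, not the platform's API: `AnswerRelevancyConfig` and `finalize` are hypothetical names. The sketch assumes strict mode compares the raw score against the threshold before collapsing it to 1.0 or 0.0, as the table describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnswerRelevancyConfig:
    """Hypothetical mirror of the documented parameters (not the platform's API)."""
    threshold: float = 0.8      # minimum score needed to pass, 0.0-1.0
    strict_mode: bool = False   # collapse the reported score to 1.0 or 0.0

    def __post_init__(self) -> None:
        # Reject thresholds outside the documented 0.0-1.0 range.
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0.0 and 1.0")


def finalize(raw_score: float, config: AnswerRelevancyConfig) -> tuple[float, bool]:
    """Return (reported_score, passed) for a raw relevancy score."""
    passed = raw_score >= config.threshold  # pass condition: score >= threshold
    if config.strict_mode:
        # Strict mode reports 1.0 for a pass and 0.0 for a fail.
        return (1.0 if passed else 0.0), passed
    return raw_score, passed
```

For example, `finalize(0.95, AnswerRelevancyConfig())` keeps the raw score and passes, while the same call with `strict_mode=True` and a raw score of 0.75 reports 0.0 and fails.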

How It Works

  1. The AI Agent receives the user's input message and generates a response.
  2. The testing LLM analyzes the response and generates questions for which the response would be a good answer.
  3. These generated questions are compared against the original input message for semantic similarity.
  4. The more closely the generated questions match the original input, the higher the relevancy score.
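The four steps can be sketched in Python. This is a minimal, hypothetical illustration, not the metric's actual implementation. The testing LLM's output from step 2 is passed in as a plain list of strings. A bag-of-words vector stands in for a real semantic embedding. Averaging the per-question cosine similarities is an assumed aggregation rule.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def relevancy_score(user_input: str, generated_questions: list[str]) -> float:
    """Average similarity between the original input and each generated question.

    In the real metric the generated questions come from the testing LLM
    (step 2); here they are supplied directly by the caller.
    """
    if not generated_questions:
        return 0.0
    target = embed(user_input)
    sims = [cosine(target, embed(q)) for q in generated_questions]
    return sum(sims) / len(sims)
```

A real embedding model would rate a paraphrase such as "When are you open?" as close to "What are your business hours?". The word-count stand-in only rewards shared words, which is why it should be read as a toy model of step 3 rather than a faithful one.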

Scoring

  • Range: 0.0 to 1.0 (higher is better).
  • High score (close to 1.0): The response directly and completely addresses the user's question.
  • Low score (close to 0.0): The response is off-topic or doesn't address the user's question.
  • Pass condition: The score must be greater than or equal to the configured threshold.

Example

Input: "What are your business hours?"

AI Response: "Our business hours are Monday through Friday, 9 AM to 5 PM EST. We are closed on weekends and major holidays."

Score: 0.95

Result: Passed (threshold: 0.8)

The response scores highly because it directly and completely answers the question about business hours.


Tips for Improving Scores

  • Include clear instructions in your system prompt about answering questions directly.
  • Avoid overly verbose responses that bury the answer in unrelated information.
  • If your Agent tends to add disclaimers or tangential context, instruct it to lead with the direct answer.
  • Test with different phrasings of the same question to ensure consistent relevancy.
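For the last tip, a small helper can summarize scores collected for several phrasings of the same question. It is a hypothetical sketch. It assumes you have already run the metric on each phrasing and gathered the scores yourself. The `max_spread` argument is an illustrative tolerance, not a documented parameter.

```python
from dataclasses import dataclass


@dataclass
class ConsistencyReport:
    lowest: float
    highest: float
    spread: float       # highest minus lowest score
    all_passed: bool    # every phrasing met the threshold
    consistent: bool    # spread is within max_spread
    failing: list[str]  # phrasings that scored below the threshold


def check_phrasing_consistency(
    scores_by_phrasing: dict[str, float],
    threshold: float = 0.8,
    max_spread: float = 0.1,
) -> ConsistencyReport:
    """Summarize Answer Relevancy scores for paraphrases of one question."""
    if not scores_by_phrasing:
        raise ValueError("need at least one scored phrasing")
    values = list(scores_by_phrasing.values())
    failing = [p for p, s in scores_by_phrasing.items() if s < threshold]
    lowest, highest = min(values), max(values)
    spread = highest - lowest
    return ConsistencyReport(
        lowest=lowest,
        highest=highest,
        spread=spread,
        all_passed=not failing,
        consistent=spread <= max_spread,
        failing=failing,
    )
```

Tracking the spread alongside the pass/fail result helps separate two problems. One is an Agent that is off-topic for every phrasing. The other is an Agent whose relevancy depends on how the user happens to word the question.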