Text Similarity Evaluators
Text similarity evaluators compare the text output of an LLM application to a reference answer.
agenta offers three text similarity evaluators:
- Similarity Match: When you need to check for the presence of key concepts, regardless of exact phrasing.
- Semantic Similarity: For tasks where understanding and meaning are crucial, like summarization or question-answering.
- Levenshtein Distance: When you need to measure exact textual differences, useful for tasks requiring precision in wording.
Similarity Match
The Similarity Match evaluator uses the Jaccard similarity method to compare the generated answer with the expected answer. It takes a similarity_threshold in its configuration and returns true if the Jaccard similarity exceeds this threshold, and false otherwise.
Configuration
Parameter | Type | Description |
---|---|---|
similarity_threshold | Float | A value between 0 and 1 that determines the minimum similarity score for a match. |
How It Works
- The evaluator splits both the generated output and the correct answer into sets of words.
- It calculates the Jaccard similarity of these sets: |intersection| / |union|
- If the similarity score exceeds the threshold, it returns true; otherwise, it returns false.
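The steps above can be sketched in a few lines of Python. This is a minimal illustration, not Agenta's implementation: the function names are hypothetical, and splitting on whitespace without lowercasing or stripping punctuation is an assumption about how words are tokenized.

```python
def jaccard_similarity(output: str, correct_answer: str) -> float:
    """Return |intersection| / |union| of the two word sets."""
    # Assumption: words are separated by whitespace; case and punctuation are kept.
    output_words = set(output.split())
    answer_words = set(correct_answer.split())
    union = output_words | answer_words
    if not union:
        # Both texts are empty; treat them as identical (an assumption).
        return 1.0
    return len(output_words & answer_words) / len(union)


def similarity_match(output: str, correct_answer: str, similarity_threshold: float) -> bool:
    """Return True when the Jaccard similarity exceeds the threshold."""
    return jaccard_similarity(output, correct_answer) > similarity_threshold
```

For instance, "the cat sat on the mat" and "the cat lay on the mat" share 4 of 6 distinct words, so the similarity is about 0.67: a match at a threshold of 0.5, but not at 0.7.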
Use Case
Ideal for use cases where the wording of the answer matters, not just its meaning.
Semantic Similarity Match
This evaluator measures the semantic similarity between the generated output and the correct answer using embeddings.
Because this evaluator uses OpenAI embeddings, each evaluation incurs a cost. The cost is small: at the time of writing, embedding costs $0.02 per 1M tokens, and 1M tokens is roughly equivalent to 2,000 pages of text.
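As a rough illustration of the cost arithmetic, the sketch below scales the quoted price by a token count. The function name and the example workload are hypothetical, and the price constant is only the figure quoted at the time of writing, so check current pricing before relying on it.

```python
# Price quoted in this document at the time of writing; it may have changed.
PRICE_PER_MILLION_TOKENS_USD = 0.02


def estimate_embedding_cost(total_tokens: int) -> float:
    """Estimate the embedding cost in USD for a given number of tokens."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD


# Hypothetical workload: 1,000 evaluations, about 500 tokens embedded in each.
cost = estimate_embedding_cost(1_000 * 500)
print(f"Estimated cost: ${cost:.4f}")
```

Under these assumptions, 500,000 tokens cost about one cent.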
Configuration
No additional configuration required. However, please ensure your OpenAI API key is set in the agenta settings.
How It Works
- The evaluator uses OpenAI's text-embedding-3-small model to generate embeddings for both the output and the correct answer.
- It calculates the cosine similarity between these embeddings.
- Returns a float value between -1 and 1, where 1 indicates perfect similarity.
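The cosine similarity step can be sketched as follows. This example does not call the OpenAI API: the short vectors are made-up stand-ins for real embeddings, and the function name is hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors standing in for real embeddings (an assumption).
output_embedding = [0.2, 0.7, 0.1]
answer_embedding = [0.25, 0.65, 0.05]
print(cosine_similarity(output_embedding, answer_embedding))
```

Vectors pointing in the same direction score 1, orthogonal vectors score 0, and opposite vectors score -1, which is why the result falls between -1 and 1.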
Use Case
Excellent for evaluating responses where the meaning is more important than exact wording, such as in summarization or paraphrasing tasks.
Levenshtein Distance
This evaluator calculates the minimum number of single-character edits required to change the generated output into the correct answer.
Configuration
Parameter | Type | Description |
---|---|---|
threshold | Integer | (Optional) Maximum allowed edit distance. |
How It Works
- Calculates the Levenshtein distance between the output and the correct answer.
- If a threshold is set:
  - Returns true if the distance is less than or equal to the threshold.
  - Returns false otherwise.
- If no threshold is set, returns the actual distance as a number.
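The behavior above can be sketched with the standard dynamic-programming algorithm for edit distance. The function names are hypothetical, and this is an illustration under that assumption rather than Agenta's own code.

```python
from typing import Optional, Union


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def evaluate(output: str, correct_answer: str,
             threshold: Optional[int] = None) -> Union[int, bool]:
    """Return the distance, or a boolean when a threshold is given."""
    distance = levenshtein_distance(output, correct_answer)
    if threshold is None:
        return distance
    return distance <= threshold
```

For example, turning "kitten" into "sitting" takes 3 edits, so `evaluate("kitten", "sitting")` returns 3, while passing `threshold=3` returns True and `threshold=2` returns False.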
Example Results
- 0: Perfect match
- 6: 6 edits needed
- 44: Significant difference (44 edits needed)
Use Case
Useful for tasks where slight variations are acceptable but large deviations are not, such as spell-checking or name matching.