💡 Note: You can select up to three agents to be part of the same evaluation.
Upload a .csv file containing the test cases for your evaluation. You can also export an example dataset to see the expected format.
💡 Note: The input/test queries must be in the first column of your .csv file. Headers are not included or counted by default.
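Here is a minimal Python sketch of preparing a test-case file. The helpers write each query to the first column and then read the queries back. Several details are hypothetical: the helper names, the sample questions, and the second column holding an expected output. Export Airia's example dataset to confirm the real layout. The file is also written without a header row, which is an assumption based on the note above.

```python
import csv
import os
import tempfile


def write_test_cases(path, cases):
    """Write (query, expected_output) pairs, query first, with no header row.

    The expected-output column is a hypothetical layout for illustration.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for query, expected in cases:
            writer.writerow([query, expected])


def read_test_queries(path, has_header=False):
    """Return the non-empty first-column values, optionally skipping a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    if has_header:
        rows = rows[1:]
    return [row[0].strip() for row in rows if row and row[0].strip()]


# Usage with made-up sample data written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_cases.csv")
    write_test_cases(path, [
        ("Sample question 1", "Sample expected answer 1"),
        ("Sample question 2", "Sample expected answer 2"),
    ])
    queries = read_test_queries(path)

print(queries)  # ['Sample question 1', 'Sample question 2']
```

The `csv` module quotes any field that contains commas or line breaks, so multi-sentence queries still stay in the first column.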
💡 Note: If an evaluation is running, you can Stop it at any time. If you stop an evaluation and then Run it again, a new evaluation is created.
💡 Accuracy Score Calculation: Airia calculates a composite accuracy score using a weighted geometric mean of four factors: Hallucination, Correctness, Relevance, and Semantic Similarity. Each dimension captures a key aspect of agent performance: factual grounding, alignment with expected output, task focus, and semantic closeness. Because the factors are multiplied rather than averaged, a low score in any one dimension pulls the overall score down more than an arithmetic average would, so the final score reflects true execution quality. In the Evaluation summary tab, you can find a detailed explanation of each agent's run and insights into why the accuracy score was high or low.
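To show why a weighted geometric mean penalizes weak areas, here is a Python sketch that computes (∏ sᵢ^wᵢ)^(1/Σwᵢ) over the four factors. It is not Airia's implementation. The equal weights are hypothetical, since the actual weights are not documented here. The sketch also assumes every factor is already normalized to [0, 1] with higher meaning better, so the hallucination factor is 1.0 when no hallucination is detected.

```python
import math

# Hypothetical equal weights: the actual weights Airia uses are not documented here.
DEFAULT_WEIGHTS = {
    "hallucination": 1.0,
    "correctness": 1.0,
    "relevance": 1.0,
    "semantic_similarity": 1.0,
}


def weighted_geometric_mean(scores, weights=DEFAULT_WEIGHTS):
    """Combine per-dimension scores as (prod s_i ** w_i) ** (1 / sum w_i).

    Assumes each score is in [0, 1] with higher meaning better.
    """
    for name, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1], got {score}")
        if weights[name] < 0:
            raise ValueError(f"{name} weight must be non-negative")
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Zero-weight factors contribute s ** 0 == 1, so they are skipped.
    active = {name: s for name, s in scores.items() if weights[name] > 0}
    if any(s == 0.0 for s in active.values()):
        return 0.0  # a single zero factor drives the geometric mean to zero
    # Work in log space: log(mean) = sum(w_i * log s_i) / sum(w_i).
    log_sum = sum(weights[name] * math.log(s) for name, s in active.items())
    return math.exp(log_sum / total_weight)


balanced = {"hallucination": 0.8, "correctness": 0.8,
            "relevance": 0.8, "semantic_similarity": 0.8}
one_weak = {"hallucination": 0.9, "correctness": 0.9,
            "relevance": 0.9, "semantic_similarity": 0.3}

print(round(weighted_geometric_mean(balanced), 3))  # 0.8
print(round(weighted_geometric_mean(one_weak), 3))  # 0.684
print(round(sum(one_weak.values()) / len(one_weak), 3))  # 0.75 (arithmetic mean)
```

With one weak dimension, the geometric mean drops to about 0.68, while an arithmetic average of the same scores would report 0.75. This is the "penalizes weak areas" behavior described above.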