On March 20, 2025, Amazon announced the general availability of RAG (Retrieval-Augmented Generation) Evaluation on Amazon Bedrock. Users can assess RAG applications, whether built with Bedrock Knowledge Bases or as custom RAG pipelines, against a range of evaluation metrics. The evaluation uses an LLM-as-a-judge approach and offers metrics for context relevance, generation correctness, completeness, and hallucination detection, along with responsible AI indicators such as harmfulness and stereotyping. With this release, users can also bring their own input-output pairs and retrieved contexts for custom evaluations, and new citation-related metrics are available. The update adds flexibility and supports iterative development across RAG configurations.
2025-03-22