bottest.ai changelog

Improved Evaluation Accuracy


We've reworked how all Evaluations are performed across bottest.ai for improved accuracy and reliability. Now it's easier than ever to turn consistent, high-quality Evaluations into direct, actionable feedback for your team!

Introducing Frameworks

Success Criteria no longer drive Evaluations; they've been replaced with Evaluation Frameworks. These Frameworks can be modified in your Suite settings, giving you full transparency into the steps and process used to determine whether an Evaluation passes or fails. You can also provide feedback right in the editor to generate a new version of a Framework.

If an Evaluation passes or fails when it shouldn't, you now have the option to quickly provide feedback on the Framework (which generates a new version incorporating that feedback) and re-evaluate all other Evaluations in the Test or Suite Run against the new Framework.

Categorizing Suite Run Results

We've heard from customers that it's hard to quickly see the common failure reasons across large Suite Runs with hundreds of Evaluations. You can now view an intelligent grouping of results in the Overview of Results section of each Suite Run Report!

Coming soon, we'll add an integration with Linear for automatic Issue creation based on these groupings!