(Disclaimer: founder of OpenPipe). Thanks for the shout-out. Note that we're actively working on improved evaluations that will let you add more specific criteria as well as more evaluation types, like comparing field values to that of a golden dataset. This is definitely something that customers are asking for!