I think we already hit some kind of performance wall at the beginning of this year. It feels like models are now balancing between rule following, agentic use cases, and general-purpose tasks. E.g., Claude 4 Sonnet just feels better in Cursor and follows rules very well, yet at the same time it gets equal or worse benchmark scores compared to 3.7 Sonnet.