Interesting that you consider the most cutting edge technology in the category t...

marcus_holmes · 2026-02-13T03:57:18 1770955038

I think they've been gaming benchmarks.

I use Claude every day. I cannot get Gemini to do anything useful, at all. Every time I've tried to use it, it has just failed to do what was required.

asdff · 2026-02-13T07:22:23 1770967343

Three subthreads up you have someone saying Gemini did what Claude couldn't for them on some 14-year-old legacy code issue. Seems you can't really use other people's prior success with their problem as an estimate of what your success will be like with your problem and a tool.

arw0n · 2026-02-13T18:12:27 1771006347

People and benchmarks are using pretty specific, narrow tests to judge the quality of LLMs. People have biases, benchmarks get gamed. In my own experience, Gemini seems to be lazy and scatter-brained compared to Claude, but shows higher general-purpose reasoning abilities. Anthropic is also obviously massively focusing on making their models good at coding.

So it is reasonable that Claude might show significantly better coding ability for most tasks, while Gemini's better general reasoning ability proves useful in coding tasks that are complicated and obscure.