> Maybe: there is no measurable difference between 'real' reasoning, and 'fake' reasoning.
The point is, it's easier to teach an LLM to fake it than to make it. For example, LLMs get good at answering questions that overlap with their training data long before they start generalizing.
So on some epistemological level, your point is worth pondering; but more simply, it actually matters whether an LLM has learned to game a benchmark or to approximate human cognition. If it's the former, it might fail in weird ways when we least expect it.
It's like students studying for the test without really understanding the material. Or like regular people who don't always understand and just follow memorized steps. How often do we really understand, and how often do we just imitate?
I have a suspicion that humans often use abstractions or methods they don't understand. We frequently rely on heuristics, mental shortcuts, and received wisdom without grasping the underlying principles. To understand has many meanings: to predict, control, use, explain, discover, model and generalize. Some also add "to feel".
At one extreme we could say only a PhD in their area of expertise really understands, and the rest of us just fumble with concepts. I am sure rigorous causal reasoning is only possible through extended education; it is not the natural mode of operation of the brain.
> I am sure rigorous causal reasoning is only possible through extended education; it is not the natural mode of operation of the brain.
I'd say it's the other way around: education teaches you not to reason and instead to just follow the patterns you learned from the book. Most people reason a ton before they go to school, but then school beats that out of them.