
People routinely make up their own vague and ill-defined meanings of understanding and reasoning to disqualify LLMs. This is necessary because LLMs obviously reason and understand by any evaluation that can be carried out.

Seriously just watch. He's not actually going to be able to coherently define his "reasoning" in a way that can be tested.



> Seriously just watch. He's not actually going to be able to coherently define his "reasoning" in a way that can be tested.

Google gives the following definition of the verb "reason":

> think, understand, and form judgments by a process of logic.

LLMs do not think, they do not understand, and they do not form judgments. They do not come to their own conclusions. They do not have the physical capability. They are statistical models, nothing more.

> LLMs obviously reason and understand by any evaluation that can be carried out.

Uh-huh. Sure, Jan.


> LLMs do not think, they do not understand, and they do not form judgments. They do not come to their own conclusions. They do not have the physical capability. They are statistical models, nothing more.

"LLMs don't reason because they don't understand" is not the bastion of genius you think it is. It's a circular argument that relies on whatever bespoke interpretations you have cooked up.

They don't form judgments or conclusions? It sure looks like they do. So what's the difference?

What is GPT-4 doing, then, when it looks like it is reasoning correctly, and what's the difference between that and "real" understanding or reasoning?

If the difference is that huge, I should be able to test for it. I don't understand how you can tell me that what I'm seeing isn't real reasoning but fail to provide a way to empirically determine the difference.

> Uh-huh. Sure, Jan.

Yeah. https://arxiv.org/abs/2212.09196 There are many other evaluations to carry out.


New bar for people claiming LLMs can't reason: invent a specific, testable problem, representable in text, that many humans can solve and LLMs can't, and tell us what it is.


You know, François Chollet literally did this, and people don't listen. People should listen to Chollet more.

https://twitter.com/fchollet/status/1638643323748618240

https://arxiv.org/abs/1911.01547


That's not stuff an LLM can't do. It's just presented in a way that makes it difficult to do so.

First, the vision problems will require the equivalent of an artificial visual cortex, something we are seriously lacking in artificial intelligence at the moment. Image to text won't cut it here.

For the text, LLMs don't really have any problem with analogical reasoning https://arxiv.org/abs/2212.09196


Is ARC a benchmark that GPT-4 can be tested against today? I would be curious to see its results.


Yes. However,

The vision problems will require something much more than an image to text objective task. It will require the equivalent of an artificial visual cortex. We don't have that yet.

For abstract analogical reasoning, LLMs don't have a problem with that. https://arxiv.org/abs/2212.09196


Yes it is, and it was tested. GPT-4 can't solve any of the tests.


… and then perform a careful search of books and the whole internet to be sure what you think is novel hasn’t been thoroughly debated somewhere on stackexchange.


If it's a stochastic parrot, then merely randomizing proper nouns and filler text should be enough to prevent its abstraction ability.

If you're saying that we can't use a problem if any analog of that problem has ever been described, you seem to be arguing more strongly that it is a general intelligence than I am.


I’m saying that people in my circle have been asking what they think are novel questions and getting interesting answers, only to find out that very similar content exists on websites we know are in the training set.

That’s not intelligence; that’s computers having better memory than humans. Useful, certainly, but hardly skynet.


I don't think you're being clear about whether the questions were novel. If you discover your question was uncreative, surely e.g. some details, wording, facts, names, or numbers inside the question can be changed to defeat a model that is answering it from memory?

If you're saying that it is not possible to change the details enough to avoid the model being able to answer that type of question, I think you are admitting that the model has learned a generalized ability to answer questions of that class, and is not actually using its memory to answer at all.

I don't care about whether it learned that generalized ability from seeing examples of the question and answer, which it then deduced an algorithm for and generalized -- that's how most people learn most things.
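
The perturbation test described above can be sketched in a few lines. This is only an illustration: the template, the made-up names, and the `answer_fn` callable (a stand-in for whatever model is being queried) are all hypothetical and not from anyone's actual experiment.

```python
import random

# Hypothetical question template; names and numbers are placeholders.
TEMPLATE = "{a} has {n} apples and gives {k} to {b}. How many apples does {a} have left?"
NAMES = ["Zorvath", "Quillen", "Maribex", "Tolun", "Ysabeau"]

def make_variant(rng):
    """Return (question, expected_answer) with randomized names and numbers."""
    a, b = rng.sample(NAMES, 2)
    n = rng.randint(20, 99)
    k = rng.randint(1, n - 1)
    return TEMPLATE.format(a=a, b=b, n=n, k=k), n - k

def accuracy(answer_fn, trials=100, seed=0):
    """Fraction of randomized variants that answer_fn gets exactly right."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        question, expected = make_variant(rng)
        if answer_fn(question) == expected:
            correct += 1
    return correct / trials
```

If a model only recalls one memorized question and answer pair, its accuracy over the variants should collapse; if it stays high, it is answering the class of question rather than the instance.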


The asker thought they were. They were not. The internet is big and human memories are not.

As an aside, I’m really starting to hate these threads on here. People are constantly reading words that aren’t there in search of gotcha-it’s-skynet. It’s not. It’s just pattern matching and randomness with a giant amount of information encoded.


> As an aside, I’m really starting to hate these threads on here.

I'm not sure what to say, other than that if you'd like to have less frustrating conversations, you could do better than showing up with hearsay where someone asked a question they thought was unique, but it wasn't, and it can't be modified to be unique and then asked again, and you aren't willing to tell us what it was, and possibly don't know yourself.

It is not possible to have a serious conversation about your claim, and that's not because it is being intentionally misunderstood.

> skynet

You're the only person mentioning skynet. The conversation is about a ridiculous claim made up-thread that GPT-4 cannot reason or understand anything, which is disprovable within a few minutes of using it thoughtfully.


This is the exact type of unserious wish-reading I’m talking about. Do better.


We're survival machines, nothing more...

Complex behavior arises from simple systems all the time. You can't prove that these systems don't reason, no matter how loudly thou doth protest.


From the beginning of time, people have been overestimating the complexity of things like the human brain and attributing it to magical things (like a creator) far beyond our comprehension, but what seems to be happening now is that some people are underestimating it.


This is the problem with non-operational definitions, because now we need to know how you define "think" and "understand" and "form judgments", to move on.

Instead, could you operationally define "reason" in a way that a human is, say, 90% likely to pass the test and GPT is only 10% likely to pass?
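
To make the ask concrete, here is a minimal sketch of what such an operational criterion could look like once you have pass/fail results for both groups. The function names and the boolean-list input format are assumptions for illustration; the 90% and 10% thresholds are the ones suggested above.

```python
def pass_rate(results):
    """Fraction of True entries in a list of pass/fail outcomes."""
    return sum(results) / len(results)

def separates(human_results, model_results, human_min=0.9, model_max=0.1):
    """True if the test passes humans often enough and the model rarely enough."""
    return (pass_rate(human_results) >= human_min
            and pass_rate(model_results) <= model_max)
```

Any proposed definition of "reason" that cannot be turned into inputs for something like `separates` is not operational in the sense meant here.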


Yes, François Chollet released the ARC (Abstraction and Reasoning Corpus) benchmark for this in 2019, and the benchmark can be scored automatically. Humans solve 100% of tests, GPTs solve 0% of tests, and GPTs made exactly zero progress from 2019 to 2022.

https://twitter.com/fchollet/status/1631699463524986880

https://github.com/fchollet/ARC
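
For anyone wondering what "scored automatically" means in practice: ARC tasks in that repo are JSON objects with "train" and "test" lists of input/output grids, and a prediction counts only if the whole grid matches. A rough sketch follows, assuming a single attempt per test input (a simplification) and a hypothetical `solver(train_pairs, test_input)` callable. The toy task is made up for illustration and is not from the corpus.

```python
def score_task(task, solver):
    """Fraction of test pairs whose predicted grid exactly equals the expected grid."""
    pairs = task["test"]
    solved = sum(
        1 for pair in pairs
        if solver(task["train"], pair["input"]) == pair["output"]
    )
    return solved / len(pairs)

# Made-up toy task in the same shape as an ARC task: mirror each row.
TOY_TASK = {
    "train": [{"input": [[1, 0], [0, 0]], "output": [[0, 1], [0, 0]]}],
    "test": [{"input": [[0, 2, 0], [3, 0, 0]], "output": [[0, 2, 0], [0, 0, 3]]}],
}

def mirror_solver(train_pairs, grid):
    """Hypothetical solver that ignores the examples and flips each row."""
    return [list(reversed(row)) for row in grid]
```

There is no partial credit in this sketch: a grid that is off by one cell scores the same as a blank one.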


This seems more like an issue with the tokenizers, no? This doesn't seem more difficult than other problems it solves.


You're right that LLMs can solve abstract analogical reasoning problems https://arxiv.org/abs/2212.09196

Another issue is the vision side. The vast majority of multimodal models are working on essentially an image to text objective task. That won't cut it here. We need the equivalent of an artificial visual cortex. We don't have that yet.


Wow, this is a much more interesting answer than I expected. Thank you!


It is. Although I'd be careful with the conclusions.

The problems are presented in a way that makes them difficult to solve. The vision problems will require the equivalent of an artificial visual cortex, something we are seriously lacking in artificial intelligence at the moment. Image to text won't cut it here.

For the text, there could be tokenizer issues. LLMs don't really have any problem with abstract analogical reasoning https://arxiv.org/abs/2212.09196



