I've just created a new benchmark to see how top LLMs do on NYT Connections (https://www.nytimes.com/games/connections). 267 puzzles, 3 prompts for each, uppercase and lowercase.
GPT-4 Turbo: 31.0
Claude 3 Opus: 27.3
Mistral Large: 17.7
Mistral Medium: 15.3
Gemini Pro: 14.2
Qwen 1.5 72B Chat: 10.7
Claude 3 Sonnet: 7.6
GPT-3.5 Turbo: 4.2
Mixtral 8x7B Instruct: 4.2
Llama 2 70B Chat: 3.5
Qwen 1.5 14B: 3.1
Nous Hermes 2 Yi 34B: 1.5
Notes: 0-shot. Maximum possible is 100. Partial credit is given if the puzzle is not fully solved. There is only one attempt allowed per puzzle. In contrast, human players get 4 attempts and a hint when they are one step away from solving a group. Gemini Advanced is not yet available through the API.
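For anyone curious how the aggregation could work, here is a minimal sketch of partial-credit scoring. The exact scheme isn't spelled out above, so the per-group credit below is my assumption, not the benchmark's actual code:

```python
# Hypothetical partial-credit scoring for a Connections-style benchmark.
# Assumption: each of the 4 groups solved exactly earns 1/4 credit for that puzzle,
# and the final score is the mean per-puzzle credit scaled to 0-100.

def score_puzzle(predicted_groups, gold_groups):
    """Fraction of groups the model got exactly right (0.0 to 1.0)."""
    gold = {frozenset(g) for g in gold_groups}
    pred = {frozenset(g) for g in predicted_groups}
    return len(gold & pred) / len(gold)

def benchmark_score(results):
    """results: list of (predicted_groups, gold_groups) pairs, one per puzzle."""
    return 100 * sum(score_puzzle(p, g) for p, g in results) / len(results)

gold = [{"BASS", "PIKE", "SOLE", "CARP"}, {"JACK", "KING", "QUEEN", "ACE"},
        {"MARS", "VENUS", "PLUTO", "EARTH"}, {"DOE", "RAY", "ME", "FA"}]
pred = [{"BASS", "PIKE", "SOLE", "CARP"}, {"JACK", "KING", "QUEEN", "ACE"},
        {"MARS", "VENUS", "PLUTO", "DOE"}, {"EARTH", "RAY", "ME", "FA"}]
print(benchmark_score([(pred, gold)]))  # 50.0 -- two of four groups correct
```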
What I found interesting is how this benchmark reveals a large capabilities gap between the top large models and the rest, in contrast to existing over-optimized benchmarks.
Also these puzzles can be _really_ hard. As a French person who's lived 10+ years in English-speaking countries, I am often completely baffled. I am not sure humans would do a lot better with 0-shot.
It's probably somewhat g-loaded. I don't know how much, but someone could look at the curves (if they have access?) for a similar sub-section of an IQ test.
"no major AI technology breakthroughs in decades.everything we are seeing is larger compute scaling." This is false. Everything from the transformer to advancements in state space models have been foundational breakthroughs
People who say things like "major breakthroughs" often imagine a cliff/steep rise. The reality is that most "breakthroughs" are small, incremental, almost invisible steps in all aspects of the field, and potentially even in fields that are only tangentially related.
Then, one day, they hear about it in the news, because there's now some hype, or some event that makes it newsworthy. This makes it feel like the breakthrough was instantaneous or steep, but in fact has been in the making for decades or even more.
I have no doubt AGI is coming, but it will be gradual and slow. It will be the accumulation of more advances in everything, including hardware, as well as software. It might even include economic changes.
Going from nothing to ChatGPT being open to the public, with billions of students' lives relying on it within months: isn't this the biggest step in tech history?
It's a major societal failure that education has been reduced to turning in coherent enough strings of characters, that's what I've learned in the past year.
You mean the use of a writing system to share knowledge amongst ourselves?
I find that absolutely wonderful and it worked decently well for me (and possibly you.) Now we have a never-before-seen technology and society will adapt, that's it. No failure.
> You mean the use of a writing system to share knowledge amongst ourselves?
No. And what's with the "it is so bad yet you used it"? I am very much allowed and required to denounce a system even if I cannot escape it or if I could have somehow profited from it or chosen to use it.
I very much reached the point where I am despite the educational systems I was exposed to. And a system geared towards memorization and regurgitation of data in textual format where pupils can successfully use a chatbot to avoid doing work is certainly failing its goals of educating the youth.
I would point you to the complaints of American teachers about the reading and mathematics levels of students, if only because that is widely accessible. I did not grow up in the US and the school system in my home country is leagues behind the US.
excluding... antibiotics, electricity, refrigeration, the combustion engine, the digital computer...
like even restricted strictly to the domain of computation, I'd say it barely scratches the surface... like even if we ignore computer engineering ("the transistor," "silicon microprocessors", etc.)... foundational tech like "compilers" is more significant.
even restricted to modern applications, GPS is more useful and life-changing.
so, no. It's not the biggest step in tech history.
Curious about how many kids in India and China are relying on it.
I'll accept the prospect of hundreds of millions, generously.
And "relying on it" is a strong phrase. Using it as a curiosity, sure. The ones relying on it seem to keep ending up in the news because of how, well, unreliable it is.
There were hundreds of competitive, even SOTA LLMs before ChatGPT existed. You're basically just proving the parent comment right in how small of a leap ChatGPT is from t5-flan or BERT.
I beg to differ. Transformers are purely an optimization. It’s not exactly right to call everything “compute scaling” but we are still, at the end of the day, fitting polynomials.
And frankly, that’s probably not what our brains are doing.
It is a bit weird though… I mean, you could just as well say that the last breakthrough in all of computing was the transistor (Bell Labs, 1947) – all we've been doing ever since is just "scaling" and combining them in new ways.
AI has been a 'moving aim' as opposed to a moving target. It's a label slapped on whatever slice of software engineering seems most magic-like in any given decade.
Early computers in the late 40s were called 'electronic brains' by the media...
It's difficult to comprehend the impact of something that's widely adopted. Like, batch normalization alone was probably mind-blowing when it came out. Yet it seems so simple and self-explanatory now.
> And frankly, that’s probably not what our brains are doing.
I think that it is! It's much more likely to me that our brains are doing something big and simple than small and complicated. That's the way that nature tends to work. Fitting low-order million-dimensional polynomials would meet that description.
Our brain is, however, the very definition of small and complicated. It's the most complex organic object known to humans, and for all that any given normal brain handles in a given day, it uses just 0.3 kilowatt-hours (kWh) to do it. No computer we have comes close to handling what a brain handles with that power consumption.
ChatGPT by contrast, consumes roughly a gigawatt hour per day serving its users. Yes, to do this it's handling a colossal amount of queries, but that's all it's doing. Your brain handles everything in your body and consciousness, in ways we don't even fully understand, while also letting you think and communicate and reason as a conscious being with self direction.
Moreover, there is evidence that at least part of our brain's functions may be exactly as the other reply here mentions, weird, subatomic and deeply complex in ways that are difficult to get a clear grip on.
Everything from the double-slit experiment, to wave-particle duality, to the particle zoo of the 70s, to quantum chromodynamics, to asymptotic freedom, to more exotic theories like string theory, tells us the complete opposite. Every major discovery in physics in the past 150 years seems to disagree. Things are extremely weird and complicated when we get extremely tiny. Why would our brains be different?
If we lived at the quantum scale, then classical physics would be the weird one. Quantum chromodynamics is only confusing for two reasons: it differs from our everyday experience so we don't have an intuition for it, and because it has a large number of mutually-interacting (but basic) components.
Richard Feynman put it very well:
"The world is strange, the whole universe is very strange, but see when you look at the details then you find out that the rules are very simple, of the game, the mechanical rules by which you can figure out exactly what's going to happen when the situation is simple. It's again this chess game; if you're in just the corner with only a few pieces involved, you can work out exactly what's going to happen. And you can always do that when there's only a few pieces. And so you know you understand it. And yet, in the real game there's so many pieces you can't figure out what's going to happen.
"There's such a lot in the world, there's so much distance between the fundamental rules and the final phenomena that it's almost unbelievable that the final variety of phenomena can come from such a steady operation of such simple rules... But it is not complicated, it's just a lot of it."
One of the things I wonder about is whether “intelligence” can be linearly scaled or if it’s just a way of solving an optimization problem. In other words, humans have come pretty close to the peak of Mt. Smarts and therefore being 1000x as intelligent is more like the difference between 1 meter from the peak and a millimeter from the top. You’re both basically there.
In other words, maybe humans have basically solved the optimization problem for the environment we live in. At this point the only thing to compete on is speed and cost.
You don't see that many, or any really, von Neumanns walking around so there's probably still significant room to improve with all the benefits of having intelligence neatly packaged in a computer.
Yeah, imagine spinning up 100 von Neumanns to attack a problem. They can all instantly share their thoughts & new skills, coordinate, choose new exploration directions, and spend decades developing new tools -- all within moments after pressing 'Enter'.
Even if our AI systems have only a minute fraction of von Neumann's intellect, we still have no idea what tomorrow will be like. I'm terrified and excited.
Even if all the computers can do is ask the right questions and it takes a big research project to figure it out, that would be an improvement in productivity.
I actually think it will come from the other direction. That people will get better at asking questions, because there is an automated tool that will build systems to answer larger problems than a single person could quickly answer.
I don't think there is such a thing as general intelligence, there are only capabilities. What we call "general" intelligence is really just the set of capabilities that a human has, because we're self-centered.
If we had more intelligences around to compare with I think we'd find that some are "more intelligent" in that they have all of our capabilities, plus some. And that others are "less intelligent" in that we have all of the capabilities that they have, plus some. And then there would be the "differently intelligent" which have at least one capability that we don't and which lack at least one capability that we have.
Under this lens, I don't know if there's much utility in fine grained comparisons of intelligence re meters and millimeters. The space is discrete: subsets, not metrics.
I don't know if you could ever prove something like this (or maybe we just lack that capability). It seems more like an axiom-selecting notion than something to be argued. Anyhow, it's what my gut says.
I think it's an interesting thought. But for the sake of it, can you name or imagine some examples of problems, or forms of problems, that we as humans cannot solve? Or cannot solve efficiently?
If you can find none, is that not proof that our intelligence is general?
The first things to come to mind are the traveling salesman problem and the host of other unsolved math problems which we suspect may be unsolvable by us.
There's also problems of self reference. A Turing machine may be able to solve the halting problem for pushdown automata, but it can't solve the halting problem for Turing machines. Whether or not we're as capable as Turing machines, there's a halting problem for us and we can't solve it.
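For anyone who hasn't seen why the self-reference bites, here is the classic diagonalization sketch. `halts` is a hypothetical oracle; the point is precisely that it cannot exist:

```python
# Sketch of the standard halting-problem diagonalization.

def halts(program, argument) -> bool:
    """Hypothetical oracle: True iff program(argument) eventually halts.
    No general implementation can exist; that's what the argument shows."""
    raise NotImplementedError

def paradox(program):
    # Do the opposite of whatever the oracle predicts about running
    # `program` on its own source.
    if halts(program, program):
        while True:   # loop forever if the oracle says it halts
            pass
    return            # halt immediately if the oracle says it loops

# Asking halts(paradox, paradox) forces a contradiction: either answer the
# oracle gives is wrong, so no such oracle can be built.
```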
I'm restricted to mathy spaces here because how else would you construct a well defined question that you cannot answer? But I see no reason why there wouldn't be other perspectives that we're incapable of accessing, it's just that in these cases the ability to construct the question is just as out of reach as the ability to answer it.
You may have heard talk about known unknowns and unknown unknowns, but there are also known unknowables and unknown unknowables, and maybe even unknowable unknowables (I go into this in greater detail here: https://github.com/MatrixManAtYrService/Righting/blob/master...).
In any case, I don't think it's ever valid to take one's inability to find examples as proof of something unless you can also prove that the search was exhaustive.
Instead of AGI, we should call it AHI: artificial human intelligence, or SHI: super human intelligence. That would be much clearer and would sidestep the generality issue.
We already have technology which beats anything we would ever be capable of doing, and almost instantaneously.
If you want a concrete example, astronomical image processing would be one: impossible for humans without AI.
By that same logic, if we invent AGI and it then solves a problem for us, does that count for humanity? (And of course it does, but here we're talking about something that humans wouldn't solve without AGI.)
Yes we did, but there’s a difference between delegating a task (asking a computer to do it) and executing the task (running the calculation). Otherwise you might as well say humans can run 40mph because we can ride horses.
Also, no one person invented the calculator. The calculator is the culmination of hundreds or even thousands of years of invention. It’s not like the knowledge or creativity is in each of our brains and we could each build a calculator given the requisite materials. It took thousands of lifetimes of ingenuity. So there’s another answer to your question of things we aren’t efficient at solving: building a calculator.
Predictions aren't worth much without a bet, but I think the tech will plateau in the next decade, for several years or more, just like it has in the past
One main reason is that I think people underestimate how much work OUR brains are doing when we interact with LLMs. It seems like the initial "wow" has worn off for many people, but definitely not everybody.
For coding, people will get stuck in loops, trying to get LLMs to modify LLM-generated code
And I think the market will cool down, which seems inevitable considering Nvidia's stock price (I'm a shareholder), and the fact that they seem to be the only ones really making money
If you compare Google after 8 years (2004) to OpenAI after 8 years (2023), the business is uh very different
The problem with LLMs for code editing is that they generate new tokens instead of performing in-place context editing (IPCE). It probably shouldn't be that difficult to just add a final layer that is trained to perform IPCE. Then you would get a lot of the benefits of MemGPT without an external tool.
The LLM could go back and only rewrite a handful of lines.
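A poor man's version of this already works with plain prompting: ask the model to emit only line-range replacements and apply them locally. A sketch with a made-up edit format (not MemGPT's or any real tool's API); the model still generates new tokens, but far fewer, and the rest of the file is untouched:

```python
import re

# Made-up output format the model would be instructed to emit:
#   REPLACE 12-14
#   new line twelve
#   new line thirteen
#   END
EDIT_RE = re.compile(r"REPLACE (\d+)-(\d+)\n(.*?)\nEND", re.S)

def apply_edits(source: str, model_output: str) -> str:
    """Apply line-range replacements instead of accepting a full rewrite."""
    lines = source.splitlines()
    # Apply edits bottom-up so earlier line numbers stay valid.
    edits = sorted(EDIT_RE.findall(model_output), key=lambda e: int(e[0]), reverse=True)
    for start, end, body in edits:
        lines[int(start) - 1 : int(end)] = body.splitlines()
    return "\n".join(lines)
```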
I feel like the space has already plateaued. Lots of improvements up to GPT4 were genuine milestones, but that’s now a year ago and everything since was marginal.
I’m not invested in any sense in the space. I’m actually more frequently turning off Copilot in VSCode recently. I’d like to see further breakthroughs as much as anyone, but am not holding my breath. In fact, shorting NVIDIA seems like one of the better ideas currently.
But couldn’t you have said that 5 years ago and 10 years ago? I’d give it 3-4 years to actually call it.
But if you believe strongly, shorting AI-enhanced stocks is a great way to capitalize on your prediction. You could also use the short as a hedge for your expectations (either way you win something).
There's a real chance of rapid decline, as the advance we've seen since GPT-3 was largely due to OpenAI being able to efficiently train it on Common Crawl, but now this data body is getting poisoned by automatically generated content.
I don't see that as a threat just yet. It seems simpler: stock value prices expected future growth. Nvidia has already grown to a highly dominant position. I don't see how it can grow much more to fill expectations of the staggering stock price. I'm expecting more of a regression to the mean soon, with Nvidia losing a bit of their lead.
What if intelligence is a product of consciousness, and consciousness is a product of something that can never have a physical definition and is always ethereal... i.e. a "soul"?
If we can achieve AGI simply through more and more computation, no matter how novel it is, it's ultimately ifs, loops and arithmetic... then surely the human experience is ultimately just a 'wet LLM' (or whatever we end up calling the machine learning technology behind AGI).
What’s the difference between real consciousness and simulated in silico consciousness? If the fake consciousness is good enough there shouldn’t be any.
Is a soul made out of matrix multiplications and dot products worse than a soul made of neurons?
I like to call it carbon chauvinism. There are people who believe that no matter how high the simulation fidelity, no matter the level you simulate the neurons, you can't simulate consciousness. I think they can be safely ignored until after we've tried.
This is the kind of question that non-STEM people tend to struggle with. Given functions F and G: if for all inputs x, F(x) = G(x), then F and G are the same thing, whether viewed from the inside or the outside.
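What that comment is gesturing at is extensional equality of functions; in notation (a restatement of the point, not a claim about consciousness):

```latex
% Extensional equality: two functions that agree on every input are the same function.
\forall x \in X,\; F(x) = G(x) \;\Longrightarrow\; F = G
```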
This is kind of where I've been thinking lately. If I can speak of human bodies like computer hardware, there is a physical limit to our hardware and how much data it can process, and yet intelligence seems to be something beyond the hardware. Something that the living 'you' brings to your hardware.
Given that Harry Potter, Jedi Knights and souls are fantasy items from literature, your point with regards to testable silicon-based intelligence is what exactly?
Like humans, they have physical bodies and the breath of life in them. For the sake of what this person was talking about it seems like it would apply to their intelligence as well.
Uhh, again the "only humans have souls" argument. What is meant by soul? Please define this word, philosopher. Does a bird knowing language grammar indicate that a soul exists? Does a fish being able to do arithmetic up to 5 show that its soul exists? Does an elephant holding funerals for its dead fellows show that there is a soul? Can this human-centric ideology just go away?
The future is not known to us. But given how inefficient machine learning seems to be, algorithmic efficiency improvements may keep the scaling going for a while? Maybe that's not a "major breakthrough" but it's improvement nonetheless.
It's also going to take a while to learn to use the new toys we already have.
> given how inefficient machine learning seems to be
We emulate neurons mathematically but it is possible to build efficient analog circuits that emulate them physically.
I doubt any biological system can ever learn all the information GPT-4 has learned. The GPT-4 training may not have been power-efficient, but neither were the first airplanes compared to birds, yet today biological flight is rather limited compared to flight that uses technology.
Ever listen to Geoffrey Hinton speak about back propagation vs what biological systems use? Do you think he is wrong about this?
He isn't wrong, but with current GPU hardware back propagation is actually the superior algorithm compared to his latest forward forward algorithm. Swapping the training algorithm doesn't get around the fact that you need to perform lots and lots of forward passes and those need an insane amount of memory bandwidth. Back propagation isn't a significant bottleneck in computers and also not in terms of memory capacity. So the only benefit of the biologically plausible forward forward algorithm is that you could run it on a digital or analog NPU, but with Ryzen AI coming in Strix Point, every high end laptop is going to have 50 TOPS of AI compute and 200GB/s of memory bandwidth. Nothing except large datacenter GPUs or a hypothetical 5090 with 32 GB would be competitive against that anytime soon. Anything smaller and you will have to buy several 3090 on eBay for like $900 a pop. Analog circuits are far off for now.
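For readers who haven't looked at it, Hinton's forward-forward idea trains each layer locally: push a "goodness" score (e.g. the sum of squared activations) above a threshold for real data and below it for corrupted "negative" data, with no backward pass through the rest of the network. Here is a rough numpy sketch of the per-layer update, simplified relative to the paper (which also normalizes layer inputs and constructs negatives more carefully):

```python
import numpy as np

def goodness(h):
    # Goodness of a layer's activations: sum of squares per example.
    return (h ** 2).sum(axis=1)

def ff_layer_step(W, x_pos, x_neg, theta=2.0, lr=0.01):
    """One local forward-forward update for a single ReLU layer W (d x k).

    Raises goodness above `theta` for positive (real) batches and lowers it
    below `theta` for negative (corrupted) batches; no backprop across layers.
    """
    for x, label in ((x_pos, 1.0), (x_neg, 0.0)):
        h = np.maximum(x @ W, 0.0)                        # forward pass only
        p = 1.0 / (1.0 + np.exp(-(goodness(h) - theta)))  # "is this positive?" prob
        # (label - p) is the usual logistic error signal; chain it through
        # goodness = sum(h^2) and the ReLU to get the local weight gradient.
        grad_h = (label - p)[:, None] * 2.0 * h * (h > 0.0)
        W = W + lr * (x.T @ grad_h) / len(x)
    return W
```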
> I doubt any biological system can ever learn all the information gpt4 has learned
The fact that human brains can become competent at language with training datasets a tiny fraction of the size of Common Crawl points to how much more efficient they are.
Minimal as in the amount a person needs to learn the language, or minimal as in tiny compared to the English in Common Crawl but still orders of magnitude more than a person goes through for infant language acquisition?
It’s anecdotal, but here [1] is an example of AI learning to translate a new language faster than a human could learn to do it, from a similar amount of data, using Claude 3 Opus.
Translating a language isn’t quite the same as understanding it, but this is still impressively data-efficient.
"The author has later responded and apologized a flawed / biased methodology which led them to believe that Opus has no prior knowledge of Circassian (i.e. wasn't trained on it)
Apparently it was, and is able to speak Circassian just not perfectly"
The result is bad: that is just an image recognition model, and we know those can be trained with relatively few words and just a bunch of images. The model didn't learn grammar or sentences or relationships like a baby would. The only models we have that can do grammar and sentences and relationships are LLMs with massive amounts of data.
I'd love to see more algorithmic efficiency in ML. But hasn't ML been going in the other direction for at least a decade now? It seems to me that it's been going aggressively towards brute force algorithms. Specifically, figuring out how to do as much matrix multiplication as possible.
In life I tend to encounter two common patterns of intelligent people: those who had a good education and those who did not. I worry that when AGI comes it is going to be able to do all the things the smooth, fast-talking, wily folks can do, and none of the things the educated folks can do, and we'll accelerate not a slide into the singularity but a slide into inane banality.
How do you provoke a model into being wacky, challenging, and innovative?
I told it to be unfriendly and use black humour. It certainly is wacky, especially with that TTS voice. Running this locally made me realize that OpenAI's ChatGPT is way too uptight and the personality is way too bland. The LLM can already whip out "smooth talk" faster than me.
> we’ll accelerate not a slide into the singularity but a slide into inane banality
Half-joking response: Have you looked at all the SEO garbage that any Google search produces these days? We are already in the great age of "inane banality".
I can’t put my finger on it, but I think the intellectually challenging part is at the heart of the matter. A wily person can talk their way through a debate by saying things that sound compelling. An educated person has more ability to reason: they can challenge you when they know they are right, and explain why you are wrong.
It’s the difference between bluffing and sincerity, or dishonesty and truthfulness. Current LLMs are confident liars.
At the very least, it knows solved problems and can repeat the solutions. That's better than pretty much all humans, since humans can't know all known solutions.
This article does not debate the question in its title, makes ridiculous claims like “there hasn’t been any major breakthrough in AI in decades”, and does not offer any real argument.
I think "LLMs are using well-studied modeling techniques with overwhelming resource investment" is the most fundamental critique and why I've been skeptical of the future of this wave. That's not to say we won't (and haven't already) gotten useful tools! There's obviously a lot to do with human language interfaces and complex analysis. I'm just skeptical a whole new level is just around the corner.
I honestly don't see what the problem is. One can say the internet is just to "connect machines with wires and have a set of protocols allowing them to communicate". It's true, but the magic happens when simple ideas get scaled.
When you have to throw billions of dollars worth of compute at a problem to brute force it, you're not exactly 'scaling it' as much as scaling your costs for diminishing returns.
There's also the question of input data, though. Current large models have been trained on all the available human-created input. Trying to add more will lead to poisoning by AI-generated data and model collapse.
Model collapse is basically a myth and is a joke in the ML community. The assumptions of the model collapse paper do not hold in the real world, even when training on uncurated generated data. In fact, LLMs of equal size trained on newer web scrapes which include generated data have enhanced capabilities.
But in practice training data is curated, and synthetic (curated) generated training data is even better than human data. State-of-the-art LLMs like Phi-2 or the recent GPT-4 killer Claude 3 are trained entirely or mostly on generated data.
It's probable you can train the same system on the same data multiple times and still get an improvement. You could also train on universal sequence prediction data in between as well.
It's fine if it doesn't; current LLMs are already very helpful; we need them faster, smaller, and using fewer resources. If not AGI, let's run 50 personal assistants on my phone.
I think this is where people will be disappointed when “AI” is brought to mass consumer.
The level of results does not scale well down to mass consumer hardware.
And yes, I know people can buy an NVIDIA GPU and run these models, but the phone, like you said, is the most common computer and where this will be hardest to scale to.
It’s why I’m bearish on AI, and I think the pop will be due to being unable to scale down sufficiently
> The level of results does not scale well down to mass consumer hardware.
For now. I'm pretty bearish on all things "AI" but of all things one can say about the future, today's hardware is yesterday's news. And in this case, I'd say the same goes for algorithms.
> The level of results does not scale well down to mass consumer hardware.
I think that it does, though. A few years ago, running any remotely complex (and decent) generative AI on anything that wasn't a large compute server was out of the question. Nowadays, you can run a very respectable image or text generation algorithm on a middle-of-the-road gaming PC. Local models may not always be as good as the enormous things that companies run, but they're putting up a fight.
This isn't founded on research, but given how much people were able to scale all of it down, I feel like there's going to be more of this "fat" to trim - data that takes up a lot of space but isn't very important to the final result. Add onto that the constant improvement of hardware, and the lines are going to intersect eventually.
Strix Point isn't even out yet. Sure that is a laptop form factor and you won't be able to run those 130b models because of RAM constraints, but things like mixtral 8x7b will easily run at 20 tokens per second locally on your laptop within this year.
And what will 20t/s give the user as a user experience?
I don’t mean to sound combative, but my point is that the disconnect between what customers will get and what they see is very high. Will your metric meaningfully change that aspect?
I think it’s not really relevant though to the current timeframe.
The expectation of the current bubble is that something will be delivered soon (in the next year or two)
I don’t see mass consumer hardware scaling up quickly enough in that time to have a product that will match the hype that everyone is showing with cloud based tools.
The article claims as part of its argument that AI has not had algorithmic advances since the 80s. This is an exceedingly false premise and a common misconception among the ignorant. It would actually be fairer to say that every aspect of neural network training has had algorithmic advances than that no advances have been made.
Here is a quote from research related to this subject:
> Compared to 2012, it now takes 44 times less compute to train a neural network to the level of AlexNet (by contrast, Moore’s Law would yield an 11x cost improvement over this period). Our results suggest that for AI tasks with high levels of recent investment, algorithmic progress has yielded more gains than classical hardware efficiency.
When you apply the principle of charity you can make their claim increasingly vacuous and eventually true. We're still doing optimization - we're still in the same general structure. The thing is, it becomes absurd when you do that. It's not appropriate to take such a premise seriously. It would be like taking seriously the argument that we haven't had any advancement in software engineering since bubble sort, since we're still in the regime of trying to sort numbers when we sort numbers.
It's like, okay, sure, we're still sorting numbers, but it doesn't make the wider point it wants to make, and it's false even under the regime it wants to make the point under.
This isn't even the only issue that makes this premise wrong. For one, AI research in the 80s wasn't centered around neural networks. Hell even if you move forward to the 90s PAIP puts more emphasis on rule systems with programs like Eliza and Student than it does learning from data. So it isn't as if we're in a stagnation without advance; we moved off other techniques to the ones that worked. For another, it tries to narrow down AI research progress myopically to just particular instances of deep learning, but in reality there are a huge number of relevant advances which just don't happen to be in publicly available chat bots but which are already in the literature and force a broadening. These actually matter to LLMs too, because you can take the output of a game solver as conditioning data for an LLM. This was done in the Cicero paper. And the resulting AI has outperformed humans on conversational games as a consequence. So all those advancements are thereby advances relevant to the discussion, yet myopically removed from the context, despite being counterexamples. And in there we find even greater than 44x level algorithmic improvements. In some cases we find algorithmic improvements so great that they might as well be infinite as previous techniques could never work no matter how long they ran and now approximations can be computed practically.
It strikes me as amazing that we went from the general recognition that AGI wasn't anywhere soon to suddenly having this widespread idea that it was right around the corner.
Sort of reminds me of the late 90s super-proto-VR stuff where people thought any day now we'd be jacking into full immersion (tactile, smell and all) virtual reality.
Don't get me wrong, LLMs are useful tools. But ChatGPT ain't Neuromancer. Or even Wintermute. It's Clippy after a few years of community college.
I find a simple thought experiment answers this question. Imagine we trained an LLM using modern methods, and gave it infinite compute, on the entirety of human knowledge from 200,000 years ago. Would that AI then be able to create calculus, even if by another name obviously? I offer that as an example, because there's 0 need for knowledge of the physical world to derive calculus. All of mathematics is entirely an invention of the human mind.
I think the answer is quite obviously no. LLMs can recite their training, and recombine it in ways that correlates strongly to how a human might do so. But creating entirely new knowledge, that goes above and beyond recombinations of what is already known, remains entirely outside the domain of LLMs. An LLM trained on slow classical music is not going to create rap. And an LLM trained on rap is not going to create classical music. And those are trivial examples since it's not entirely new, but just taking a concept and using it a slightly different way than 'normal.' Math, by contrast, is literally creating something from nothing.
And this ability to create something from nothing is probably the most key indicator of intelligence. And we've yet to even step foot on the path towards creating software with this ability.
Your logical loophole is that you forgot to include "the process of creating things" in the subset of human activity, and this can be trained. Back in the day, AlphaGo made a magical novel move in the game that shocked every Go expert. People study how AI plays games to improve themselves nowadays.
LLMs nowadays solve insanely hard science problems and invent new algorithms that you don't see.
A game move, no matter how remarkable, is just a recombination within a strictly defined domain. And LLMs can do some awesome things in these very tight and well-defined domains, especially when there is a way for them to determine victory or score. But stepping outside of well-defined domains and creating knowledge that does not directly derive from what is already known is where intelligence shines. We somehow created mathematics out of absolutely nothing. We looked at the world around us and simply 'thought it up', as a child might say.
No we didn't. We are very good at parallel construction, to come up with the clean implementation of what the maths can be, but most of the examples given here were extensions of thinking about physical problems.
Remixing Jazz tokens might never generate classical music, but an evolutionary algorithm using musical notes and basing its cost function on what humans seem to enjoy might still discover something similar or better. A transformer could then generate infinite versions of that discovery.
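A toy version of that loop is easy to write; the hard part is the fitness function that comment hand-waves ("what humans seem to enjoy"), so the sketch below uses a placeholder that merely rewards stepwise melodic motion:

```python
import random

NOTES = list(range(60, 73))   # MIDI pitches C4..C5

def fitness(melody):
    # Placeholder for "what humans seem to enjoy": prefer small melodic leaps.
    return -sum(abs(a - b) for a, b in zip(melody, melody[1:]))

def mutate(melody):
    m = list(melody)
    m[random.randrange(len(m))] = random.choice(NOTES)
    return m

# Evolve 16-note melodies: keep the best 10, refill the population with mutants.
population = [[random.choice(NOTES) for _ in range(16)] for _ in range(50)]
for _ in range(200):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(40)]

population.sort(key=fitness, reverse=True)
print(population[0])   # the "fittest" melody under the toy fitness function
```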
I think this is provably false by looking at the various semi-isolated primitive tribes. They think about the world endlessly, yet many have yet to invent even basic mathematics. Many lack even the concept of numbers. For instance, one tribe [1] only has concepts for a few, more, and a lot. And in reality that's all you would ever really need for real-world usage, pre-mathematics. Mathematics largely creates the problems that it's used to solve.
More interesting is that these tribes also tend to lack the ability to recall even basic quantities of things if it's more than 2. You can give them 5 balls, but they will have extremely poor recollection of the number after just a few moments. There's these countless things humans need to simply invent, from nothing, over and over to advance to the next technological era. It only seems like a natural application of logic in hindsight, because it actually works!
I am not sure this supports the thesis that human intelligence is superior: it actually took millions of humans continuing to experiment with what they learned from the humans who came before them.
I am not saying it's not superior (I do believe it is), just that this does not really support it.
Basically, a single human acting on their own during a single lifetime has almost zero chance of inventing mathematics (as evidenced by the tribes you mention, but also by examples of children lost in the woods and raised by animals).
The point of the example is that math does not just logically derive from observation of the natural world. The reason we, in modern times, might intuitively think otherwise is because it so beautifully describes much of the universe around us. Yet these descriptions, or even the possibility of their existence, are impossible to even begin to imagine until you have developed math itself. And the problems we might think of it as solving aren't really problems until it's invented. As the tribes show you don't even need the ability to perform basic counting to live and thrive as a people.
And yes, it's going to be a slow process. Human intelligence is not about speed. The capability to do simple arithmetic rapidly correlates with intelligence in humans, yet a calculator can perform said calculations many millions of times faster than the fastest human. Yet that of course doesn't mean said calculator is thus millions of times more intelligent. The question is how long would it take an LLM to discover mathematics starting from the same empty baseline? And, using current methods, the answer is quite simple: never.
Mathematics is not a parlor trick. It's a purely cognitive and abstract system which can be infinitely recursed upon itself to learn ever more, and which, seemingly by good fortune, happens to map onto many things in the real world. Once one learns addition, the concept of multiplication comes naturally. As does subtraction, at which point you quickly end up in the eerie domain of negative numbers, and so on.
To see how silly these things are, imagine somebody claiming to have taught mathematics to one of these numberless tribes by training them to pick a larger group of items when it's blue, and pick a smaller group of items when it's yellow. Somebody claiming that this is teaching them math would obviously be looked at like an idiot. It's a circus trick - and a rather apt reflection of what passes for research nowadays.
Tribes who count one, one and that one, and string theory aren't the only two possibilities. There are the ancient Egyptians in between, who basically discovered fractions while dividing bread [1].
What's to say if you gave an LLM a pre-prompt conditioning it for things like:
Be curious,
Explore the domain you exist in,
Learn about your environment like your life depends on it,
You are a social creature so when you find out information you like to share it,
Being a social creature, the most practical tool for sharing ideas is language
(these notions, among many others, are similar pre-conditions for human behaviour, largely I imagine for survival, but in our case initially informed by emotion rather than language)
And then just feed it endless continuous video of the world from a persons perspective (ie, not just random jumps around scenes)
How is that any different to the way humans developed our own theories and understanding of the universe we found ourselves in? Obviously a video feed isn't quite the same as having a myriad of senses like we have for interfacing with the world. But there's no reason to say Newton's laws need touch or smell to be figured out (maybe that is a bad example for not requiring touch, being about mass and force, but you get the point). Learning language and understanding of the world from scratch might depend on having physical interaction with the world, sure, but that's again just different sets of input training data that a multimodal model can make connections between if there were pressure to do so.
Then you simply prompt the LLM to define the laws it has observed in a way that can use language to transmit the idea.
In the human equivalent, a "prompt" doesn't need to be (but can be) a direct question being asked. It could just be our urge to share ideas, an emotion perhaps that moves us to action.
I don't want to go too far on this thought as it is pure speculation without serious scientific backing, but for the sake of the thought experiment with all this in mind you could even say that brains are much like an LLM that is in a constant state of training & updating, but also being "prompted" from our environment and biological feedback loops being fed in. I don't feel like this idea is even that controversial or remotely original.
The difference is that models of today are already trained on the corpus of human knowledge so they already know all of this instead of having to figure it out. But I don't really see an argument for why an LLM wouldn't be able to crystallise observed patterns into mathematical formulae if it had the appropriate influences that would encourage it to do so. Its just that we don't care to train them inefficiently. I imagine this will be tried in the near future though.
FYI, there is "symbolic fitting" that crunches through combinations of equations to find the math equation for simple physics laws. That seems hackable to me, e.g. by inserting heuristics to speed up the process. Those heuristics could be generated by an LLM.
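"Symbolic fitting" is usually called symbolic regression: enumerate or evolve candidate expressions and keep whichever fits the data best. Here is a brute-force toy over a handful of primitives (real systems such as Eureqa or PySR search far larger spaces); the LLM-generated heuristic imagined above would simply prune or reorder the candidate list:

```python
import itertools
import numpy as np

def candidates():
    """Enumerate tiny expression trees of the form (a op b) over inputs x, y."""
    atoms = [("x", lambda x, y: x), ("y", lambda x, y: y)]
    ops = [("+", np.add), ("-", np.subtract), ("*", np.multiply)]
    for (na, fa), (no, fo), (nb, fb) in itertools.product(atoms, ops, atoms):
        yield f"({na} {no} {nb})", (lambda x, y, fa=fa, fo=fo, fb=fb: fo(fa(x, y), fb(x, y)))

def fit(x, y, target):
    """Return the candidate expression with the lowest mean squared error."""
    name, _ = min(candidates(), key=lambda c: np.mean((c[1](x, y) - target) ** 2))
    return name

# Recover F = m * a from noisy "measurements".
rng = np.random.default_rng(0)
m = rng.uniform(1, 10, 200)
a = rng.uniform(1, 10, 200)
force = m * a + rng.normal(0, 0.01, 200)
print(fit(m, a, force))   # expect "(x * y)"
```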
Another way of saying this point is that intelligence requires interaction with the world, or embodiment. Hip hop can’t be derived from European classical music without the context of 70s-80s NYC.
We could debate that, but I think it's easier to just refer to the primary example of math. I offered that as an issue that requires absolutely no external interaction whatsoever. It's entirely a mental creation. Like a child might say, we somehow just 'thought it up.'
I might be talking past you here, but: The idea that mathematics is entirely mental is a contentious statement and not one a lot of mathematicians would necessarily agree with.
I think your intuition is right but I think it’s more because of the embodiment of humans in the world. The machine wouldn’t invent calculus because it wouldn’t be in a physical world that needed to.
Again the nonsensical body-experience argument. Do you need to see an 11-dimensional world in order to invent the math to deal with string theory? Do physicists see those extra dimensions with their eyes?
The discussion here seems to be about the "invention" of mathematics. I am pretty sure math does not happen without human bodies in the physical world as it is. E.g. imagine life happening on a small, low-gravity object travelling through space at a speed approaching the speed of light: what does space exploration look like, and with what math?
11-dimensional vector calculus is a pretty clear example of recognizing the generality between the 1, 2 and 3 dimensions we can experience, and extending it to an arbitrary number.
While humans, in their mind, do construct things they did not experience, we develop tools to do that based on what we do (even things like axiom of parallelism: we experience "intersections" or lack of them, so we can ask ourselves "what if two parallel lines do intersect?").
Mathematics has really become abstract in the last 2 centuries, and was pretty tied to physical reality up to that point.
What you mean is just a dataset of records of interactions involving gravity as input to a system, and math as the output. There is no necessity for a "body" to be included here.
And by your body-experience-ism, you would be required to see 11 dimensions; you would not be allowed to extend linear algebra from 3D to arbitrary dimensions.
I think it's also interesting to ask: what if the model consumed all information up to the point that a great discovery was made? If a model has all the information that Einstein had, could it determine that E = mc^2?
This isn't what is happening though. Philosophers keep poking holes in AGI arguments, previously Strong AI, and techbros keep using a new term, each more ambiguous than the last. The hope, it seems, is to use this ambiguity to prevent pointed criticism that would prevent investment and adoption.
An article based on so many erroneous assumptions, I can't believe I'm seeing it on HN!
Most importantly, the authors don't know that all modern AI is based on back-propagation calculations just because they are easier to implement on cheap old hardware, while natural neurons work on forward propagation, which is orders of magnitude faster at inference.
Unfortunately, for FP we need different hardware, but that does not mean "reaching the limits of hardware scaling"; it just means scaling limits for CURRENT hardware, which is a totally different sense.
Sure, if people play blind and avoid seeing obvious things, we will have a new AI winter before somebody reconsiders FP technology.
> Billions to trillions of dollars will be poured into research over the next decade. More humans than ever are looking for breakthroughs. We have exponentially increased the parallel efforts. LLM architecture might be unable to deliver in its current state, but it has ignited monumental investments into research that might find other paths.
That's not really true, though. Neural network based approaches are funded, and among those mostly transformers and large language models. Real alternatives aren't funded that much, imo.
Is the next generation of large models to train several models with different specialties (small and large), and then have a front-end for task scheduling, which assigns tasks to different sub-models, obtaining strong capabilities and expertise while also controlling costs?
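That is roughly the "router plus specialists" pattern (mixture-of-experts at the system level). Here is a hypothetical sketch of the dispatch layer, with made-up model names and a trivial keyword classifier standing in for a learned router:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Specialist:
    name: str                      # made-up model name
    cost_per_1k_tokens: float      # rough relative cost
    run: Callable[[str], str]      # in practice, a call to a model endpoint

SPECIALISTS = {
    "code":    Specialist("small-code-model",    0.2, lambda p: f"[code model] {p}"),
    "math":    Specialist("small-math-model",    0.2, lambda p: f"[math model] {p}"),
    "general": Specialist("large-general-model", 2.0, lambda p: f"[general model] {p}"),
}

def route(prompt: str) -> Specialist:
    """Trivial keyword router; a real system would use a learned classifier."""
    lowered = prompt.lower()
    if any(k in lowered for k in ("def ", "class ", "bug", "stack trace")):
        return SPECIALISTS["code"]
    if any(k in lowered for k in ("integral", "prove", "equation")):
        return SPECIALISTS["math"]
    return SPECIALISTS["general"]   # fall back to the expensive generalist

prompt = "There's a bug in my parser, can you fix it?"
print(route(prompt).run(prompt))
```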
LLMs solve for the next word. Human intelligence solves for survival with many types of input, visual, audio etc. You can't create an AGI if you don't solve for the problems that created human GI.
Digital data is all 1s and 0s, whether it encodes words, sounds, or pictures. Why do you think transformers only work for predicting words, when they're already successfully being used for other applications as well?
I think that, much as a basic Turing-machine definition shows compute is possible on a variety of substrates, some kind of intelligence can be created with a whole class of implementations, transformers included. Indeed, the video and image input of LLMs is one of the most exciting use cases.
They’re trained to optimize guessing the next word. What they solve for to get this good at predicting the next word is an open question with answers hidden in plain sight in the weight blob.
Why not? For a hypothetical example - if we assume that simulating a human is AGI, and we have some hypothetical space-age magic tech bruteforce the problem by simulating every neuron and connection in the brain... why would being "embodied" factor into this?
Because I think intelligence formed in human beings is connected directly to embodiment and not some kind of abstraction that can be simulated. My guess is that the best AI developments will ultimately come from mimicking the processes of how humans learn from their environment, and not from merely simulating (or trying to simulate, as I don’t really buy the positivist approach) human brains.
Absolutely. The physical world is the input that creates the feedback loop for learning.
I would propose a definition of AGI: "A model capable of affecting the physical world through speech or physical action in a manner indistinguishable from a human."
What if "AGI" were another over-promised scam to sell stochastic parrots marketed as "intelligence", a product that not even its creators can understand when it goes badly wrong?
"Oh don't worry, AGI is coming soon and we'll solve that later" - AI founders
Yet they don't even know how long that is, since no one knows, or it may never happen. Mistakes in AI are very costly.
What if their startup fails before the time arrives because they still cannot make any money and need to constantly raise VC money every week or quarter?
Again, 90%-95% of these "AI" companies will fail, with the 5% to 10% still around including the incumbents.
This argument is, and always has been, utter bullshit.
All humans have the capacity to genuinely learn, create, and think, regardless of how their output appears to you in some subset of interactions with them.
"Some humans sometimes have trouble with critical thinking, or just regurgitate previously-memorized facts" is not in any way equivalent to "LLMs, by their fundamental nature, only have the capacity to produce various recombinations of their training data."
We cannot create life on even the simplest scale; experiments with the creation of life, such as the Miller experiment, have only produced the so-called building blocks, amino acids.
However, we are unable to create life in dead creatures that have all the building blocks in place.
What is happening is the belief that the laws of thermodynamics are probabilistic, like a law that can be broken.
Laws like gravity and thermodynamics are deterministic and the hubris of those who make claims of real intelligence in machines we create are going to be as disappointed as those who design perpetual motion machines.
I don't really understand: if the physics is right and deterministic, we already have thinking machines. They're called brains. As far as anyone can tell, thermodynamics works fine for them.
It is a law that cannot be broken like perpetual motion or the creation of life.
You may also argue that the universe is a perpetual motion machine and you would be wrong.
In the grand scale, human intelligence evolved over millions of years. We went from personal computers to LLMs in mere decades. I get that everyone wants Singularity now, so do I. But there’s too much over-promise and delusion.
I think we've way overdone the 'general intelligence' part of AI already; that is already 'super general intelligence'.
What's lacking is agency/autonomy. I have a bad feeling that even 'general autonomy' will take a fraction of the power we're already using, which means 'super autonomy'... is probably already possible.
Which means ASI soonish... which leads to uncontrolled ASI, either deliberately or accidentally... which means... well, it's out of our hands at that point. Anything can happen.
This is a nonsense statement in and of itself. It's like wondering why an orange fails to turn into a chicken.
There are SO many missing pieces an LLM just doesn't have. LLMs could certainly be a small part of some sort of AGI _system_, but they themselves can never be AGI.
Transformers. AGI means Artificial General Intelligence. The transformer architecture enables the transfer of knowledge to new domains in arbitrary ways, allowing for the solving of arbitrary problem domains.
We already have machines that are generally intelligent and require as much energy as a few light bulbs. Why wouldn’t it be eventually possible to replicate them in silicon?
Over a hundred years ago, Babbage might have said:
>We already have machines that are generally intelligent and require as much energy as a few coalgas lights. Why wouldn’t it be eventually possible to replicate them in brass and steam?
My understanding of the argument of this article is that the conceptual design that replicates intelligence is what the industry has failed to generate today. Simply stating that it might be possible to create is failing to engage; the massive increase in compute power from Babbage to McCarthy didn't give us AGI, because they didn't figure out the right design to reason about anything in even a hundred times a human's energy consumption. If from McCarthy to today we still haven't actually found the proper recipe it just might be worth considering the point the article's making in spite of our other advancements since then.
Because they aren't machines, we don't understand how they work, and it's very much an open question whether we ever will. OpenWorm tried and failed to get the fully documented behaviours of the C. elegans nematode (for which we know the complete connectome and the full lineage of every one of the 959 cells of its body) to emerge from simulations of its 302 neurons. We don't understand how an individual neuron works. We don't even understand what an individual neuron really does, and the brain contains 100 billion of them. The hubris involved in looking at biological nervous systems and thinking "how hard can it be? Give me a big data centre and 5 years!" is staggering.
I'm specifically speaking out against the big data center approach and only saying that it's demonstrably possible to do general intelligence on a small, low-power machine.
Probably shouldn’t have mentioned the silicon though.
These meat machines aren't necessarily computers, that may be only our best available metaphor (previous metaphors for the brain have I think included telephone exchanges, steam engines and plumbing), and although in principle silicon can simulate anything, in practice not so much: there have been various projects to simulate C. elegans, such as OpenWorm, and they fail because the worm depends on the physics of its environment (possibly including the environment inside the worm) and it's all more subtle than just its nervous system. So yeah, it's reasonable that it might eventually be possible to replicate the human brain in something, once we have the slightest clue how and what.
There are plenty of animals with brain. We weren’t able to use them for their cognitive abilities so far. So even if we would replicate the hardware, software might prove challenging.
It's exactly this 'generally intelligent' phrase that confuses minds. People believe ChatGPT has a consciousness, but it does not at all. It's just a statistical model. The first step toward AGI, first of all, is consciousness on silicon.
And with the current models, the only thing I see is: "let's add more neurons, give it more data to train on, and let's hope a mind will come out of these neurons." It helps us understand the human brain, and helps humanity and all, but I'm sure we are missing some 'theoretical ingredients' to get a recipe for a fully working AGI.
> Why wouldn’t it be eventually possible to replicate them in silicon?
I mean, if you simulate all the chemical processes in some being then yeah, you probably get close (assuming you get everything right); let's assume there is no unknown in how the atoms in our bodies interact.
Sounds like an expensive project.
If you are _not_ talking about a perfect emulation/simulation of such a machine, I will pose your question back to you: why _would_ it be possible to do something we don’t know what it is? Seems rather contrived to say “we can do this thing we don’t know what it is”.