Hacker News | dinobones's comments

why tho? it's just an alternate alphabet/set of symbols.


Because it's generally expected that models only work 'in distribution', i.e. they work on the kinds of things they have previously seen.

They almost certainly have never seen regular conversations in Base64 in their training set, so it's weird that it 'just works'.

Does that make sense?


If you do not properly MIME-decode email, you end up with at least some base64-encoded conversations.
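As a quick sketch of that point, using only Python's standard library (the message text here is invented for illustration): MIME's base64 Content-Transfer-Encoding wraps an ordinary conversation like this, and a scraper that skips the decoding step ingests the encoded string verbatim.

```python
import base64

# A plain-English conversation as it might appear in an email body.
message = "Hi Sam,\nThanks for the update. Let's sync tomorrow.\nBest, Alex"

# The base64 transfer encoding applied to the body; skipping the
# decode step leaves this opaque string in the scraped text.
encoded = base64.b64encode(message.encode("utf-8")).decode("ascii")
print(encoded)

# Properly decoding recovers the conversation exactly.
decoded = base64.b64decode(encoded).decode("utf-8")
assert decoded == message
```

So any large web/email scrape that mishandles transfer encodings will contain exactly this kind of Base64-wrapped conversational text.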


For all we know, AI tech companies could theoretically have converted all of the "acquired" (ahem!) training material into Base64 and trained on that as well, just as you would encode, say, Japanese romaji or Hebrew written in the English alphabet.


Unlikely that every company would have bothered to do this.


'Yes, I know we already trained on all that data, but now I want you to convert to base64 and train it again! at enormous cost!'


On the contrary, it could be a deliberate attempt to augment or diversify the dataset.


> They almost certainly have never seen regular conversations in Base64 in their training set, so its weird that it 'just works'.

People use Base64 to store payloads of many arbitrary things, including web pages and screenshots, both deliberately and erroneously. So the models have almost certainly seen regular conversations in Base64 in their 10 TB+ text training sets, scraped from billions of web pages, files, mangled emails, etc.


Yes, that's true.

But that points again to the main idea: The model has learnt to transform Base64 into a form it can already use in the 'regular' thinking structures.

The alternative is that there is an entire parallel structure just for Base64, which, based on my 'chats' with LLMs in that format, seems implausible; it acts like the regular model.

If there is a 'translation' organ in the model, why not math or emotion-processing organs? That's what I set out to find, and it's illustrated in the heatmaps.

Also, any writing tips from the Master blogger himself? Huge fan (squeal!)


> And given that in Austin they just reached parity with Waymo (i.e. completely unsupervised robotaxi service), they are not doing badly.

There is no unsupervised robotaxi service in Austin and there won't be, for years, if ever. Just like the way "FSD" is not fully self driving and likely never will be.


According to https://robotaxitracker.com/ there are 7 unsupervised robotaxis in Austin right now.


Are these the cars where the safety driver is in a car tailing the robotaxi, or do they actually run without the need for a safety driver?

https://electrek.co/2026/01/22/tesla-didnt-remove-the-robota...


It seems they run without a safety driver or follow car (mostly?).

However, the area it operates in is extremely small, and they are still only allowing Tesla bros to try it.


So in other words, like literally every other word out of Elon’s mouth for a decade now, it’s incredibly dishonest. He lies about everything, all the time, without any acknowledgment. Nothing is ever delivered on time, most of it isn’t delivered at all, and virtually every bit of promised capability is exaggerated.

Why does anyone want to do business with a person or company like that? I genuinely do not understand.


Nope, it is open to the public and covers a bigger area than Waymo. EDS is limiting the ability of a lot of people here to critically evaluate the current autonomous-auto rollout.


The unsupervised area is a tiny subset of the supervised area.


Any evidence of this? Even if it's true right now and they are being ultra cautious (they are hardly going to just dump 100k unsupervised Teslas in one week), it won't stay that way for long. They will overtake Waymo in a few months, then kill them by the end of the year.


https://www.reddit.com/r/SelfDrivingCars/comments/1qvf7hu/te...

It's unsurprising that someone so out of the loop on its true status is so hyped on its future...

If robotaxi was doing legit rides, Elon would be posting about it 20 times a day.


Reddit is extremely anti Musk and Tesla, that subreddit being a typical example. Let's just wait and see where we are in 6 months.


This is totally false. If there are any truly autonomous robotaxis in Austin (a big if, since Tesla has repeatedly lied and faked things like this in the past), it's only a handful, and they're limited to a tiny area. The "robotaxis" with a safety driver are the ones that have the bigger area, probably because Tesla sucks at actual self-driving. Still. After a decade of broken promises and shitty engineering practices.

Elon has been blatantly lying about FSD for years, and yet the fans still take whatever he says as gospel. And yet the skeptics are the ones with EDS? lol, ok.


> and there won't be, for years, if ever.

That is a lot of confidence. Do you work in the autonomous vehicle space?

What makes you so certain?


Because camera only simply won't be reliable enough with current technology.


Try to find a single ablation study of a sensor suite. Waymo is in a good position to do such a study, and the corporation would have benefited from showing that vision-only systems aren't viable (by demonstrating good will toward public safety and by making it harder for vision-only competitors), but no such study has come from them.

I guess they understand that computer vision is a fast-moving target and their paper might become obsolete the next day.


FSD and Robotaxi are plenty of evidence that vision-only isn't viable.


Read Electrek articles with a mouthful of salt. Fred Lambert’s “robotaxi is 10x worse than a human” estimate is based on his personal statistical reasoning, which somehow arrived at 200,000 miles per accident for humans. Minor accidents that Tesla reports for robotaxis (such as low-speed collisions with stationary objects) do not make it into publicly available statistics, so his estimate might be significantly off.


Not a single Waymo requires a "safety driver", and the self-driving never disengages the way it does on Teslas.


Waymo routinely uses safety drivers, sorry, "autonomous specialists", when expanding to new cities[1][2]. Waymo cars occasionally contact remote support. If support is not available, the cars just stay where they stopped[3].

Tesla has rolled out a small number of cars with no safety driver[4].

In short, you are either grossly misinformed or intentionally lying. Is it a political echo chamber you are stuck in?

[1] https://waymo.com/faq/ "Our vehicles are primarily driving autonomously, but you’ll sometimes notice that our cars have autonomous specialists riding in the driver’s seat. These specialists are there to monitor our autonomous driving technology and share important feedback to help us improve the Waymo experience."

[2] https://waymo.com/waymo-in-uk/ "Our autonomous specialists who are present in the vehicle during testing are highly trained professionals."

[3] https://www.bbc.com/news/articles/c36zdxl41jro

[4] https://youtu.be/03e5ixbXIa4


You are grossly misinformed. Waymo self driving never disengages the way Tesla FSD does. It is active at all times. In novel situations humans will provide instructions on what path to take, but this is relatively infrequent. Tesla Robotaxis are so bad they need a safety driver in every single car at all times, ready to take control when the car does something stupid. The small number of robotaxis without safety drivers are limited to a tiny area and not open to the public.

Waymo works while robotaxi doesn't.


> Waymo self driving never disengages the way Tesla FSD does. It is active at all times

The consumer version of FSD can park a car if the driver doesn't take control[1]. Waymo seems to require a remote command to initiate parking instead of just standing there with hazard lights on[2].

> Tesla Robotaxis are so bad they need a safety driver in every single car at all times ready to take control

Every single robotaxi in Austin doesn't have a driver behind the wheel. So a driver can't be ready to take control. Stop lying. I no longer believe that you are misinformed.

[1] https://youtu.be/VU3i1Pgk4M0?t=1460

[2] https://waymo.com/blog/2025/12/autonomously-navigating-the-r... "We directed our fleet to pull over and park appropriately"


"Every single robotaxi in Austin doesn't have a driver behind the wheel."

Almost ALL of them do. And NO Waymos need them.

https://electrek.co/2026/02/17/tesla-robotaxi-adds-5-more-cr...

Robotaxi is checkers compared to Waymo's chess.


Technology is moving fast.

When do you think it will be reliable enough?


Not for a very long time. Just think about how big an advantage lidar and radar are at night, or radar is in snow and rain.

If Tesla had been smart, they would have used regular cameras plus event-based cameras, where pixels send a signal whenever their brightness changes enough; these can have microsecond latency. And multispectral cameras. Combined, this would provide very rich data for neural networks.


Sounds like you’re an expert. Do you work in the autonomous vehicle space? In what capacity?


I'm not an expert, just someone who understands how these technologies work. Sensor fusion is a fascinating thing.


lol, it's running now and growing every day. The thing about Tesla's solution is it works globally, and the costs are much, much lower than Waymo will ever be able to achieve (given their reliance on third parties for most of the hardware). Waymo and Uber will be gone in a year.


> lol, it's running now and growing every day. The thing about Tesla's solution is it works globally, and the costs are much, much lower than Waymo will ever be able to achieve (given their reliance on third parties for most of the hardware). Waymo and Uber will be gone in a year.

A year? They'll be gone in two weeks!

Seriously, what portion of your financial and emotional net worth is tied up in TSLA?


None, it's just obvious to anyone who has a high school level of business knowledge.


> None, it's just obvious to anyone who has a high school level of business knowledge.

That's a highly ironic statement given your position on "cost per mile".

With a small amount of business acumen, you'd know that betting on technology staying expensive is a bad idea. This is seen in all industries, but especially electronics, where many competitors continuously optimize for cost. E.g., we're at the point now where an internet-enabled phone is basically disposable, costing people roughly a few hours of wages.

History has shown that technology costs decrease over time, and rapidly if it's a critically important technology. If you don't agree, share a counter example.


Phones were about $400-500 years ago; now they are over $1k, which is not 'a few hours of wages', well, not for most of us. I agree technology prices decrease over time, but Waymo is starting at 5x the cost; by the time a Waymo costs even the same as a Model Y, let alone a Cybercab, it will be too late. That's my prediction. I could be wrong though; maybe Elon and Tesla are lying, and so are all the users of the latest version of FSD.


> Phones were about $400-500 years ago; now they are over $1k

https://www.walmart.com/ip/ST-MOTOROLA-XT2413V-CDMA-LTE-BLUE...

Try to avoid cherry picking if you want to have a discussion where you or the other person learns something. All the Elon stans on this site that I've encountered are highly disingenuous; I'm starting to think that's not a coincidence.


Been hearing this for years now. But sure, any day now…


Can you blame them for existing during early globalization, before the over-financialization of everything? It's not like they actively took more than they "should have" from anyone directly; it's a consequence of their local economy and where it was at the time.


> It's not like they actively took more than they "should have" from anyone directly

And who do you think exactly contributed to the over financialization of everything? Every single thing, good or bad, is a direct result of the actions of the generation before. We can thank them for creating a world where women get to vote, but also criticize them for creating a world where everything costs a million dollars and all young people can earn is pennies. At any point they could've said "this may not be in my selfish interest, but it will ensure future generations can have the same life as I do" and pushed for policies accordingly. But that didn't happen.


Has any society ever behaved that way? It's already a push to get people to think of the middle/lower classes during the present.

I understand the desire to find an entity or group of people to blame, but they were acting in their own self-interest at a peak time; they didn't know the party would be over soon. For many of them, it still isn't.


> And who do you think exactly contributed to the over financialization of everything? Every single thing, good or bad, is a direct result of the actions of the generation before.

Some elements of the generation before. It is exceedingly unhelpful to blame an entire generation for the actions of a few. There were some elite people with a plan, many more who bought the propaganda they were served, and a lot who had nothing to do with any of it.

Also, it's worth noting (to help build empathy) that you and I have likely been suckered by propaganda for things that the next generation will curse us for, while we just think we're being sensible and informed.

The least you could do is blame an ideological faction of that generation (e.g. neoliberals) rather than the whole generation itself. Among other advantages, that names the problem in a way that makes it solvable.


> It is exceedingly unhelpful to blame an entire generation for the actions of a few.

The unfortunate reality is that every generation has the power to change things if they want to. Shifting the blame to the actions of a few is an easy way to absolve yourself. Who allows the few to take those actions? How did those few come into power to be able to take them? Once the actions were taken, why were they not corrected, if the entire generation disagreed with them?

Maybe future generations will blame my generation for a bunch of wrongs; even if I personally didn't contribute to them, I will still share the burden of not doing enough to prevent them.


> The unfortunate reality is that every generation has the power to change things if they want to.

That's an illusion. I think what you're really doing is putting unreasonable demands on the entire baby-boomer generation, then blaming them for not succeeding at an impossible task. I mean, seriously, do you really think some boomer factory worker in Ohio is to blame for not foreseeing the effects of some 1980s-era policy on 2026, or even 2006? They didn't have the benefit of the hindsight that we have.

It sounds like you're really holding tight onto blame, but what good does that do you? It solves no problems, and at best, alienates people from you.


Yes. The effects of 1980s policy were talked about endlessly and everywhere, to the point that childhood me understood what was coming. I used to joke to my parents that my generation was going to build old people's homes attached to factories to make them pay us back.


You _can_ blame them for several high-impact things they willingly did or at least supported, e.g. benefiting greatly from public spending yet successively voting to restrict it later on; f*cking over the real estate market and squeezing younger generations with extreme rents/prices; refusing any kind of social reforms while it has been obvious for decades that current models don't scale; decoupling of productivity from wages; and last but not least racking up huge carbon debt that later generations will pay dearly for.


They didn’t passively exist during it. They implemented it. They are culpable.


There are 67 million baby boomers in the US, roughly 20% of the population. How can you rationally blame them all?

Saying "boomers ruined everything" is not sophisticated. We can't move forward from a blame game; we have to diagnose the actions and the actors who carried them out, but of course this is much more challenging.

Anecdotally, I know plenty of poor boomers. Have you seen who works at a Dollar Tree lately?

The popular dialogue that boomer = rich and greedy, millennial = poor and exploited is not productive; it's a fabricated generational war that distracts us from the real issues.


My parents are poor boomers, but if they had to live as I do, they'd be rich boomers. They have no financial discipline and burned through cash like crazy. If they would have saved even a little bit in the 80s and 90s, they'd be in a much better situation.


It's not that hard to notice this; just google "{university} {degree} syllabus" and you can see all the courses the student will take.

In my case, I have a CS degree and work as a SWE, but I probably would've been fine with just my Data Structures & Algorithms course, as I already had programming experience.

Are computational theory, circuits 101, discrete math, logic 101, etc. necessary for being a good SWE? Probably not, but they probably do expand your mind a bit.


A brief history of programming:

1. Punch cards -> Assembly languages

2. Assembly languages -> Compiled languages

3. Compiled languages -> Interpreted languages

4. Interpreted languages -> Agentic LLM prompting

I've tried the latest and greatest agentic CLI and toolings with the public SOTA models.

I think this is a productivity jump equivalent to maybe punch cards -> compiled languages, and that's it. Something like a 40% increase, but nowhere close to exponential.


  1. Punch cards -> Assembly languages
Err, in my direct experience it was Punch cards -> FORTRAN.

Here, for example, is a punch card for a single FORTRAN statement: https://en.wikipedia.org/wiki/File:FortranCardPROJ039.agr.jp...

Punch cards were an input technology; they were in no way limited to either assembly languages or to FORTRAN.

You might be thinking of programming in assembly via switch flipping or plug jacking.


They're simply bluffing, and you called them on it. Thanks for your service. Too many people think they can just bullshit and bluff their way along and need to be taken down a peg, or for repeat offenders, shunned and ostracized.


That's a jump if you are a junior. It falls down hard for seniors doing more complex stuff.

I'm also reminded that we tried the whole "make it look like human language" thing with COBOL, and it turned out that language wasn't the bottleneck: the ability of people to specify exactly what they want was. Once you have an exact spec, even writing the code yourself isn't all that hard, but extracting that spec from stakeholders has always been the harder part of programming.


Except punch cards are a data storage format, not a language. Some of the earliest computers were programmed by plugboard ( https://en.wikipedia.org/wiki/Plugboard#Early_computers ), so that might be thought of as a precursor to machine language / assembly language.

And compiled and interpreted languages evolved alongside each other in the 1950s-1970s.


I used the early web. I miss forums, I miss the small webmaster, I miss making fun, small websites to share with friends.

And while you could make the argument that these forms of media were superior to TikTok, I’d also argue that this is mostly just taste.

While we have closed ecosystems now, they're much easier to make and share content on than the web of the past. It's much easier to get distribution and go viral. There's also a well-trodden path to monetization, so that if you craft great content people love, you can make a living from it.

Yeah quirky designs, guestbooks, affiliate badges, page counters, all that stuff. I miss it. But only ever a very small fraction of society was going to be able to make and consume that stuff.

This new internet is much more accessible and it occasionally produces diamonds of culture, you just have to know where to look.

So no, I don’t think any amount of decentralized protocols or tooling or any technology really can change this. I think this trend is set and will continue, and I’ve had to learn to be more open minded to how I perceive internet content.

No one is going to make personal websites or change their behavior in a major way.

Look, you can still sign up for free web hosting and make an HTML page and tell your friends. There are still people that do this. But it’s naturally eclipsed by these other methods of much easier content sharing.

The point is the content itself, not the packaging. Just get over the shape of the packaging and enjoy.


> I miss making fun, small websites to share with friends.

You can still do that right now. I highly recommend it.


Precisely. I have made my own e-cards to send to friends to commemorate holidays and outings. All HTML + CSS, responsive, and they look fine on all devices.


> I used the early web. I miss forums, I miss the small webmaster, I miss making fun, small websites to share with friends.

None of these things are gone. They're just not new anymore for a lot more people, and they probably have significantly less social impact and cachet. But that's all.


Yeah. It's actually the opposite: the original web, consisting of home pages and niche forums participated in exclusively by actual humans, is very much still there, even if sometimes in different places.

It's "web 2.0", consisting of centralised networks full of your friends and friends of friends posting photos, updates and invitations, that's being killed by those networks promoting "engagement bait", generated content and bot accounts.


You spent 3 months on this hacked together garbage when you probably could’ve just configured a pre-existing solution off the shelf with like 10 minutes of reading and understanding documentation.

This blog post reeks of “you can just do things” type of engineering. This is the quality of engineering I would expect from “TPOT” (that part of Twitter) where people talk about working 12 hour days. It’s cause they’re working 12 hours on bullshit like this.

Building some sweet custom codec or binary transportation algorithm was barely cute in like 1989. It definitely ain’t cute now.

How many of these AI and “agentic” companies are just misled engineers thinking they are cracked and writing needlessly complex solutions to problems that dont even exist?

Just burn it all down. Let it pop already.


Thanks! Exactly what I think about their work and their idea of having people watch AI agents code.


The year is 2034. Countless attempts at reproducing the sophisticated wetware of the brain have failed. Modeling research has proved unfruitful, with the curse of dimensionality afflicting every attempt at breaking the walls of general intelligence. With only a few million in capital left, and facing bankruptcy, they knew that only one option remained.

"Bring me the rats."


Douglas Adams would point out that this is just why the rats already trained us to play DOOM.


The mice, actually; the rats are never mentioned.


That I remember, but I have to work with the material I am given …


finally somebody gets it.


It's becoming challenging to really evaluate models.

The tests of intelligence that fit within a single prompt, the riddles, the puzzles, have all been solved or are mostly trivial for reasoners.

Now you have to drive a model for a few days to get a decent understanding of how good it really is. In my experience, while Sonnet/Opus may not have always led on benchmarks, they have always *felt* the best to me. It's hard to put into words why exactly, but I can just feel it.

The way you can just feel when someone you're having a conversation with is deeply understanding you, somewhat understanding you, or maybe not understanding at all. But you don't have a quantifiable metric for this.

This is a strange, weird territory, and I don't know the path forward. We know we're definitely not at AGI.

And we know if you use these models for long-horizon tasks they fail at some point and just go off the rails.

I've tried using Codex with max reasoning for doing PRs and gotten laughable results too many times, even though Codex with max reasoning is apparently near-SOTA on code. And to be fair, Claude Code/Opus is also sometimes equally bad at these "implement idea in big codebase, make changes to many files, still pass tests" types of tasks.

Is the solution that we start to evaluate LLMs on more long-horizon tasks? I think to some degree this was the spirit of SWE-bench Verified, right? But even that is being saturated now.


Totally agree. I just got a free trial month, I guess to try to bring me back to ChatGPT, but I don't really know what to ask it to see whether it is on par with Gemini.

I really have a sinking feeling right now, actually, about what an absolute giant waste of capital all this is.

I am glad for all the venture capital behind all this to subsidize my intellectual noodlings on a super computer but my god what have we done?

This is so much fun, but it doesn't feel like we are getting closer to "AGI" after using Gemini for about 100 hours or so now. The first day, maybe, but not now, when you see how off it can still be all the time.


The good old "benchmarks just keep saturating" problem.

Anthropic is genuinely one of the top companies in the field, and for good reason. Opus consistently punches above its weight, and only in part because it lacks OpenAI's atrocious personality tuning.

Yes, the next stop for AI is: increasing task length horizon, improving agentic behavior. The "raw general intelligence" component in bleeding edge LLMs is far outpacing the "executive function", clearly.


Shouldn't the next stop be to improve general accuracy, which is what these tools have struggled with since their inception? Until when are "AI" companies going to offload onto the user the responsibility of verifying the output of their tools?

Optimizing for benchmark scores, which are highly gamed to begin with, by throwing more resources at this problem is exceedingly tiring. Surely they must've noticed the performance plateau and diminishing returns of this approach by now, yet every new announcement is the same.


What "performance plateau"? The "plateau" disappears the moment you get harder unsaturated benchmarks.

It's getting more and more challenging to do that - just not because the models don't improve. Quite the opposite.

Framing "improve general accuracy" as "something no one is doing" is really weird too.

You need "general accuracy" for agentic behavior to work at all. If you have a simple ten-step plan, and each step has a 50% chance of an unrecoverable failure, then your plan is fucked, full stop. To advance on those benchmarks, the LLM has to fail less and recover better.

Hallucination is a "solvable but very hard to solve" problem. Considerable progress is being made on it, but if there's "this one weird trick" that deletes hallucinations, we sure haven't found it yet. Humans get a body of meta-knowledge for free, which lets them dodge hallucinations decently well (not perfectly) if they want to. LLMs get pathetic crumbs of meta-knowledge and little skill in using it. Room for improvement, but not trivial to achieve.
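The compounding-failure arithmetic can be made concrete with a quick sketch (the ten steps and the 50% figure are the comment's own hypothetical numbers, not measurements of any model):

```python
# End-to-end success of an n-step plan when each step must succeed
# independently with probability p_step: p_step ** n_steps.
def plan_success(p_step: float, n_steps: int) -> float:
    return p_step ** n_steps

# With a 50% unrecoverable-failure rate per step, a ten-step plan
# almost never finishes:
print(plan_success(0.5, 10))   # -> 0.0009765625

# Even 95% per-step reliability only completes ~60% of the time:
print(plan_success(0.95, 10))
```

This is why small per-step accuracy gains translate into large gains on long-horizon agentic benchmarks.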


I stopped listening to Lex Fridman after he tried to arbitrate a "peace agreement" between Russia and Ukraine and claimed he just wanted to make the world "love" each other more.

Then I found out he was a fraud that had no academic connection to MIT other than working there as an IC.


> I stopped listening to Lex Fridman after he tried to arbitrate a "peace agreement" between Russia and Ukraine...

Same here. I lost all respect for Lex after seeing him interview Zelensky of Ukraine. Lex grew up in Moscow. He sometimes shows a soft spot for Russia perhaps because of it.

