Hacker News | vlovich123's comments

Believe it or not, Apple has no say about this.

Apple recently had an issue with expired certs they had to remedy. That tends to be their bottleneck now.

Yeah, that just tripped me up trying to recommission a 2012 MacBook Pro.

Couldn't connect to wifi except through a password-less hotspot. Then I couldn't get online because nothing with SSL was working.

I didn't have a pen drive, so I had to FTP off another machine via my phone hotspot. We got there though!


Prediction: even if this requires surgery, unlocking inner thought will be used in criminal proceedings to establish guilt or attempt to be used to prove innocence. It will definitely be used unethically in military/intelligence interrogations until the law catches up.

I'm not sure if this would be able to detect the difference between truthful thoughts about actual memories, and intrusive thoughts that could give the entirely wrong impression.

Yet, they still do use lie detectors, even though the things they detect can be faked, or triggered out of personal alarm or offense. So it is entirely possible, regardless.


Intrusive thoughts is a big one. Most people report some variation of this phenomenon (myself included), and are often horrified by the thoughts or images their own mind produces, very much wanting them to go away. To be judged by that is unthinkably wrong.

Torture not being that effective has never stopped the US government before.

It depends on your classification of effective. If it is to gather accurate information, it is ineffective. If it is to gather the justification for what you were going to do anyway, it can be most effective.

Yeah why do that when the government can just “get” someone’s google search history?

The worst: ads.

Noooo. Makes me wonder how much money you'd need to buy up all the ad slots in the world and replace them with blanks.

"Hit him with this $5 wrench until he tells us the password." - XKCD 538

We normally do not accept people being hit with wrenches (or a contextual contemporary) in criminal justice trials.

I don't think brain surgery is accepted either.

Being hit with a wrench seems less invasive and even preferable compared to mind-reading brain surgery.

Thankfully we aren't forced to pick between them, "neither" is the current status quo and will do quite nicely for the foreseeable future.

Not yet.

My first dystopic thought was immigration counters at airports /s

For me, even when it was first released, I considered it obsolete enterprise shit. The sorry state of performance and security in that space since then has only reaffirmed that perception.

Something tells me that the inclusion of an HDD in the data set would have altered the interpretation of the data. Given that it's 30 for SSD and higher for remote disk, it sounds like either the default of 4 is wrong or the "what is the right value for SSD" question isn't being measured correctly.

Good idea. It's an interesting historical question - when we picked 4.0 as the default ~25 years ago, how close was it to the calculated value? I was asking myself that. Unfortunately I don't have a machine with a traditional HDD in my homelab anymore, but I'll see if I can run the test somewhere.

I wouldn't be all that surprised if this was (partially) due to Postgres being less optimized back then, which might have hidden some of the random vs. sequential differences. But that's just a wild guess.


But also if it was calculated 25 years ago, was it the same metric you’re using today?

To the best of my knowledge, yes. Unfortunately the details of how it was calculated in ~2000 seem to be lost, but the person who did it described doing it like this. It's possible we forgot some important details, of course, but the intent was to use the same formula. Which is why I carefully described and published the scripts, so that other engineers can point out thinkos and suggest changes.

Sure, but OpenAI is also being disingenuous here, pretending they're operating under the same principles Anthropic is. They're not, and the things they're comfortable doing are things Anthropic has said it won't do.

The hardware difference explains runtime performance differences, not task performance.

Speculation is that the frontier models are all below 200B parameters but a 2x size difference wouldn’t fully explain task performance differences


> Speculation is that the frontier models are all below 200B parameters

Some versions of some of the models are around that size, which you might hit, for example, with the ChatGPT auto-router.

But the frontier models are all over 1T parameters. Source: watch interviews with people who have left one of the big three labs and now work at the Chinese labs, talking about how to train 1T+ models.


> The hardware difference explains runtime performance differences, not task performance.

Yes it does.


Care to elaborate?

Certainly not Opus. That beast feels very heavy - the coherence of longer-form prose is usually a good marker, and it can spit out coherent 4,000-word short stories in a single shot.

He's running a 35B parameter model. Frontier models are well over a trillion parameters at this point. Parameters = smarts. There are 1T+ open source models (e.g. GLM5), and they're actually getting to the point of being comparable with the closed source models; but you cannot remotely run them on any hardware available to us.

Core speed/count and memory bandwidth determines your performance. Memory size determines your model size which determines your smarts. Broadly speaking.
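As a back-of-envelope sketch of the bandwidth point (all numbers below are illustrative assumptions, not measurements from the thread):

```javascript
// Decode speed is roughly bounded by memory bandwidth divided by the bytes
// touched per token. For a dense model that's every parameter, every token.
const bandwidthGBs = 400;   // assumed: a high-end consumer GPU/APU
const activeParamsB = 35;   // assumed: 35B parameters read per token
const bytesPerParam = 1;    // assumed: ~1 byte/param at 8-bit quantization

const bytesPerToken = activeParamsB * 1e9 * bytesPerParam;
const tokensPerSec = (bandwidthGBs * 1e9) / bytesPerToken;
console.log(tokensPerSec.toFixed(1), 'tokens/sec upper bound'); // ~11.4
```

Compute rarely matters for single-user decoding; this bandwidth ceiling usually binds first, which is the "memory bandwidth determines your performance" point above.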


The architecture is also important: there's a trade-off for MoE. There used to be a rough rule of thumb that a 35B-A3B model (35B total, ~3B active) would be equivalent in smarts to an 11B dense model, give or take, but that hasn't been accurate for a while.
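One folk formula behind numbers like that (an assumption on my part, not something the comment states) is the geometric mean of total and active parameters:

```javascript
// Heuristic: dense-equivalent size of an MoE ~ sqrt(total * active).
// Purely a rule of thumb, and as noted above no longer very accurate.
const totalB = 35;  // total parameters, in billions
const activeB = 3;  // active parameters per token, in billions
const denseEquivB = Math.sqrt(totalB * activeB);
console.log(denseEquivB.toFixed(1) + 'B dense-equivalent'); // ~10.2B
```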

> There are 1T+ open source models (e.g. GLM5),

GLM-5 is ~750B model.


Who would have thought ai labs with billions upon billions of r&d budget would have better models than a free alternative.

MoE is not suited for paging because it’s essentially a random expert per token. It only improves throughput because you reduce the memory bandwidth requirements for generating a token since 1/n of the weights are accessed per token (but a different 1/n on each loop).

Now, shrinking them, sure - but I've seen nothing that indicates you can just page weights in and out without cratering your performance, just like you would with a non-MoE model.


Not entirely true. It's random access within the relevant subset of experts, and since concepts are clustered, you actually have a much higher probability of repeatedly accessing the same subset of experts.

It's called mixture of experts, but concepts don't map cleanly or even roughly onto different experts. Otherwise you wouldn't get a new expert on every token. You have to remember these were designed to improve throughput in cloud deployments, where different GPUs each load an expert. There you precisely want tokens routed across experts roughly uniformly, to improve your GPU utilization rate. I have not heard of anyone training MoE models to suit local paging rather than sharding.
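For intuition, here's a toy top-k gating router (hypothetical scores, k=2, not any particular model's routing) showing how consecutive tokens can touch disjoint expert subsets, which is why paging experts from disk tends to thrash:

```javascript
// Toy MoE router: each token's gate scores select its top-k experts, so a
// different subset of weights is touched on every decode step.
function topKExperts(gateScores, k) {
  return gateScores
    .map((score, expert) => ({ score, expert }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((e) => e.expert);
}

// Two consecutive tokens hitting completely different expert sets:
console.log(topKExperts([0.1, 0.9, 0.3, 0.7], 2)); // [ 1, 3 ]
console.log(topKExperts([0.8, 0.2, 0.6, 0.1], 2)); // [ 0, 2 ]
```

If expert reuse across nearby tokens were high (the "concepts are clustered" claim above), an LRU cache of experts would work; if routing is near-uniform, it won't.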

is there anywhere good to read/follow to get operational clarity on this stuff?

my current system of looking for the 1-in-1000 posts on HN or 1-in-100 on r/locallama is tedious.


Ask any of the models to explain this to you

That’s a false analogy.

You have two parties who want to enter into a contract and a third party unrelated to the contract that doesn’t for whatever reason. Just based on contract law and common sense the unrelated party shouldn’t have standing. Now if there’s externalities to the contract that impact that unrelated party sure, but only insofar as to get those externalities addressed.

This is not the same as a robbery which involves no contract or a willing counterparty to the robbery.


Yeah, IME, if the guests of the rental acted exactly like locals, and the units were not removed from the local housing supply (not sure how that could be), or the local housing supply was in excess of the needs of the population (not sure where that is), it would be fine.

I don’t understand why the local housing supply is privileged in your scenario. And if the local housing supply is a problem it’s one the locals created themselves so…

You believe that the local area has no standing; that's incorrect. Laws and regulations are third parties intervening in contracts all the time. Libertarians may dislike this, but it's one problem with democracy - the majority makes decisions you don't like.

OP doesn't know what he's talking about. Creating an object per byte is insane if you care about performance. It's fine if you do 1000 objects once, or if this isn't particularly performance sensitive. But the GC running concurrently doesn't change anything about that. Not to mention that he's wrong: the scavenger phase for the young generation (which is typically where byte arrays processed like this end up) is stop-the-world. Certain phases of the old-generation collection are concurrent, but notably finalization (deleting all the objects) is also stop-the-world, as is compaction (rearranging where the objects live).

This whole approach carries orders of magnitude more overhead, and the GC can't do anything about it because you'd still be allocating the object, setting it up, etc. Your only hope is the JIT seeing through this kind of insanity and rewriting the code to elide those objects, but that's not something I'm aware of any AOT optimizer doing, let alone a JIT engine that has to balance time spent generating code against fully optimal output.

Don't take my word for it - write a simple benchmark to illustrate the problem. You can also see throughout the comment thread that OP is just completely combative with people who clearly know something and point out problems with his reasoning.
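A minimal Node.js benchmark along those lines (names and sizes are arbitrary) might look like:

```javascript
// Compare summing a byte buffer directly vs. wrapping every byte in a
// short-lived object first. Run with Node; timings are machine-dependent.
const N = 10_000_000;
const bytes = new Uint8Array(N).map(() => Math.floor(Math.random() * 256));

function sumDirect(arr) {
  let total = 0;
  for (let i = 0; i < arr.length; i++) total += arr[i];
  return total;
}

function sumBoxed(arr) {
  let total = 0;
  for (let i = 0; i < arr.length; i++) {
    const box = { value: arr[i], index: i }; // one object per byte
    total += box.value;
  }
  return total;
}

for (const [name, fn] of [['direct', sumDirect], ['boxed', sumBoxed]]) {
  const start = process.hrtime.bigint();
  const total = fn(bytes);
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(name, total, ms.toFixed(1) + 'ms');
}
```

One caveat: V8's escape analysis can sometimes elide a non-escaping wrapper like this toy one, which is exactly the "JIT seeing through it" scenario above; making the object escape (e.g. occasionally pushing it to an array) defeats that.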


Thanks for this. I was feeling similarly reading the original post.

I was trying to keep an open mind, it's easy to be wrong with all that's going on in the industry right now.

Thanks for clarifying some of the details back to what I was originally thinking.


Even if you stop the world while you sweep the infant generation, the whole point of the infant generation is that it's tiny. Most of the memory in use is going to be in the other generations and isn't going to be swept at all: the churn will be limited to the infant generation. That's why in real usage the GC overhead is I would say around 15% (and why the collections are spaced regularly and quick enough to not be noticeable).

I've been long on JS but never heard things like this - could you back up the _around 15%_ statement somehow, or at least point to evidence? Also, when you say _quick enough to not be noticeable_, what situation are you referring to? I thought GC overhead stacks until it eventually affects UI responsiveness under continuous IO or rendering loads. I recently did some perf work on such cases, and reducing the number of objects did make things better, and the console definitely showed some GC improvements. You've made me nervous enough to go back and check again.

Yeah, I mean, don't take my word for it - play around with it! Here's a simple JSFiddle that makes an iterator of 10,000,000 items, each with a step object that can't be optimized away except through efficient minor GC. Try using your browser's profiler to look at the cost of running it! My profiler says 40% of the time is spent inside `next()` and only 1% on minor GCs. (I used the Firefox profiler; Chrome was being weird and not showing me any data from inside the fiddle iframe.)

JSFiddle link missing.
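Since the link didn't survive, here's a guess at the kind of fiddle described - purely a reconstruction, not the original code:

```javascript
// Reconstruction (not the original fiddle): iterate 10,000,000 steps where
// every next() call allocates a fresh step object (plus the { value, done }
// iterator result). These die immediately, so the work lands on the minor GC.
function* steps(n) {
  for (let i = 0; i < n; i++) yield { step: i }; // short-lived object per step
}

let count = 0;
const start = Date.now();
for (const s of steps(10_000_000)) count += s.step & 1; // count odd steps
console.log(count, 'odd steps in', Date.now() - start, 'ms');
```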

