Prediction: even if this requires surgery, unlocking inner thought will be used in criminal proceedings to establish guilt, or in attempts to prove innocence. It will definitely be used unethically in military/intelligence interrogations until the law catches up.
I'm not sure if this would be able to detect the difference between truthful thoughts about actual memories, and intrusive thoughts that could give the entirely wrong impression.
Yet they still use lie detectors, even though the responses they detect can be faked, or triggered by personal alarm or offense. So this is entirely possible, regardless.
Intrusive thoughts are a big one. Most people report some variation of this phenomenon (myself included), and are often horrified by the thoughts or images their own mind produces, very much wanting them to go away. To be judged on those would be unthinkably wrong.
It depends on your definition of "effective." If the goal is to gather accurate information, it is ineffective. If the goal is to gather justification for what you were going to do anyway, it can be most effective.
For me, even when it was first released, I considered it obsolete enterprise shit. That view has only been reinforced by the sorry state of performance and security in that space.
Something tells me that including an HDD in the data set would have altered the interpretation of the data. Given that it's 30 for SSD and higher for remote disk, it sounds like either the default of 4 is wrong, or the "what is the right value for SSD" question isn't being measured correctly.
Good idea. It's an interesting historical question - when we picked 4.0 as the default ~25 years ago, how close was it to the calculated value? I was asking that myself. Unfortunately I don't have a machine with a traditional HDD in my homelab anymore, but I'll see if I can run the test somewhere.
I wouldn't be all that surprised if this was (partially) due to Postgres being less optimized back then, which might have hidden some of the random vs. sequential differences. But that's just a wild guess.
To the best of my knowledge, yes. Unfortunately the details of how it was calculated in ~2000 seem to be lost, but the person who did it described doing it like this. It's possible we forgot some important details, of course, but the intent was to use the same formula. Which is why I carefully described and published the scripts, so that other engineers can point out thinkos and suggest changes.
Sure, but OpenAI is also being disingenuous here, pretending they're operating under the same principles Anthropic is. They're not, and the things they're comfortable doing are things Anthropic has said it won't do.
> Speculation is that the frontier models are all below 200B parameters
Some versions of some of the models are around that size, which you might hit, for example, with the ChatGPT auto-router.
But the frontier models are all over 1T parameters. Source: interviews with people who have left one of the big three labs, now work at the Chinese labs, and talk about how to train 1T+ models.
Certainly not Opus. That beast feels very heavy - the coherence of longer-form prose is usually a good marker, and it can spit out coherent 4,000-word short stories in a single shot.
He's running a 35B parameter model. Frontier models are well over a trillion parameters at this point. Parameters = smarts. There are 1T+ open source models (e.g. GLM5), and they're actually getting to the point of being comparable with the closed source models; but you cannot remotely run them on any hardware available to us.
Core speed/count and memory bandwidth determine your performance. Memory size determines your maximum model size, which determines your smarts. Broadly speaking.
The architecture is also important: there's a trade-off for MoE. There used to be a rough rule of thumb that a 35B-A3B model (35B total parameters, 3B active per token) would be equivalent in smarts to an 11B dense model, give or take, but that hasn't been accurate for a while.
MoE is not suited for paging because it's essentially a random expert per token. It only improves throughput because it reduces the memory bandwidth needed to generate a token: only 1/n of the weights are accessed per token (but a different 1/n on each loop).

Shrinking the models, sure - but I've seen nothing indicating you can just page weights in and out without cratering your performance, exactly as you would with a non-MoE model.
Not entirely true: it's random access within the relevant subset of experts, and since concepts are clustered, you actually have a much higher probability of repeatedly hitting the same subset of experts.
It's called mixture of experts, but concepts don't map cleanly, or even roughly, onto different experts - otherwise you wouldn't get a new expert on every token. Remember that these were designed to improve throughput in cloud deployments, where different GPUs each load an expert. There you precisely want each expert to be hit uniformly at random, to improve your GPU utilization rate. I haven't heard of anyone training local MoE models to aid sharding.
You have two parties who want to enter into a contract, and a third party unrelated to the contract who, for whatever reason, doesn't want them to. Just based on contract law and common sense, the unrelated party shouldn't have standing. Now, if there are externalities to the contract that impact that unrelated party, sure - but only insofar as to get those externalities addressed.
This is not the same as a robbery, which involves no contract and no willing counterparty.
Yeah, IME, if the guests of the rental acted exactly like locals, and the units were not removed from the local housing supply (not sure how that could be), or the local housing supply was in excess of the needs of the population (not sure where that is), it would be fine.
I don’t understand why the local housing supply is privileged in your scenario. And if the local housing supply is a problem it’s one the locals created themselves so…
You believe the local area has no standing; that's incorrect. Laws and regulations are third parties impinging on contracts all the time. Libertarians may dislike this, but it's one problem with democracy - the majority makes decisions you don't like.
OP doesn't know what he's talking about. Creating an object per byte is insane if you care about performance. It's fine if you do 1,000 objects once, or if this isn't particularly performance-sensitive. But the GC running concurrently doesn't change anything about that. Not to mention he's wrong: the scavenger phase for the young generation (which is typically where byte arrays processed like this end up) is stop-the-world. Certain phases of the old-generation collection are concurrent, but notably finalization (deleting all the objects) is also stop-the-world, as is compaction (rearranging where the objects live).
This whole approach is going to add orders of magnitude of overhead, and the GC can't do anything about it, because you'd still be allocating the object, setting it up, etc. Your only hope is the JIT seeing through this kind of insanity and eliding those objects, but that's not something I'm aware any AOT optimizer can do, let alone a JIT engine that has to balance compilation time against fully optimal code.
Don't take my word for it - write a simple benchmark to illustrate the problem. You can also see throughout the comment thread that OP is just combative with people who clearly know something and point out problems with his reasoning.
Even if you stop the world while you sweep the infant generation, the whole point of the infant generation is that it's tiny. Most of the memory in use is going to be in the other generations and isn't going to be swept at all: the churn will be limited to the infant generation. That's why in real usage the GC overhead is, I would say, around 15% (and why the collections are spaced regularly and are quick enough not to be noticeable).
I've been long on JS but have never heard things like this. Could you demonstrate it somehow, or at least give some evidence for the _around 15%_ statement?
Also, by saying _quick enough to not be noticeable_, what situation are you referring to? I thought GC overhead stacks until it eventually affects UI responsiveness under continuous IO or rendering load. I recently did some perf work on cases like that, and reducing the number of objects did make things better - the console definitely showed GC improvements. You're making me nervous enough to go back and check again.
Yeah I mean don't take my word, play around with it! Here's a simple JSFiddle that makes an iterator of 10,000,000 items, each with a step object that cannot be optimized except through efficient minor GC. Try using your browser's profiler to look at the costs of running it! My profiler says 40% of the time is spent inside `next()` and only 1% of the time is spent on minor GCs. (I used the Firefox profiler. Chrome was being weird and not showing me any data from inside the fiddle iframe).