> at Q4_K_M, stock-style quantization is retaining ~99–99.8% of BF16 accuracy
That's a tall claim. By that measure, even NVIDIA's QAD[1], which AFAIK is currently SOTA for 4-bit quantization (albeit NVFP4 instead of INT4), would be worse than Q4_K_M RTN quantization. :D

[1] https://arxiv.org/abs/2601.20088
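For anyone unfamiliar with the jargon: "RTN" is round-to-nearest, the simplest possible quantization baseline. Here's a minimal Python sketch of symmetric per-group INT4 RTN; the group size of 32 and the signed [-8, 7] mapping are illustrative assumptions, and real schemes like Q4_K_M layer super-blocks, per-block offsets, and mixed precision on top of this:

```python
import numpy as np

def rtn_quantize_int4(w: np.ndarray, group_size: int = 32):
    """Round-to-nearest 4-bit quantization with per-group scales.

    Each group of `group_size` weights shares one scale; values are
    rounded to the nearest of the 16 signed INT4 levels in [-8, 7].
    """
    w = w.reshape(-1, group_size)
    # One scale per group, chosen so the max magnitude maps to 7.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero on all-zero groups
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Reconstruct approximate weights from INT4 codes and scales.
    return q.astype(np.float32) * scale

# Quick check of reconstruction error on random weights.
w = np.random.randn(4096).astype(np.float32)
q, s = rtn_quantize_int4(w)
err = np.abs(dequantize(q, s).ravel() - w).mean()
print(f"mean abs error: {err:.4f}")
```

The point of the comparison above: if plain RTN already kept ~99%+ of BF16 accuracy, the elaborate machinery in methods like QAD would buy almost nothing, which is why the claim is suspicious.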
The most salient thing about these models is that they're non-reasoning models. This makes them very token-efficient and particularly well suited for local inference, where decoding is usually slower than on datacenter GPUs.
One of the first bits of infosec advice I give to my non-technical friends and family, when they ask for it, is to turn off background location access for all apps on their phones.
Needless to say, I know plenty of technical people who don't care about it.
I've seen people get fired in BigTech for using the platform to stalk their exes. It's usually caught by an alert that goes off when an employee accesses internal dashboards for a certain profile too many times.
Yeah, OpenAI has been attaching C2PA manifests to all their generated images from the very beginning. Also, based on a small evaluation that I ran, modern ML-based AI-generated image detectors like OmniAID[1] seem to do quite well at detecting GPT-Image-2 generated images. I use both signals in an on-device AI-generated image detector that I built.
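The C2PA check can be done cheaply before running any ML detector. Below is a deliberately naive sketch of my own (a raw byte-string scan for the JUMBF/C2PA labels used in JPEG APP11 segments and the PNG caBX chunk). It only detects the *presence* of a manifest marker; it does not verify the signature chain or catch stripped or tampered manifests, which is what a real C2PA SDK is for:

```python
import sys

def has_c2pa_marker(path: str) -> bool:
    """Naive heuristic: scan raw bytes for C2PA/JUMBF labels.

    In JPEGs, C2PA manifests ride in APP11 segments as JUMBF boxes
    labeled "c2pa"; in PNGs, in a "caBX" chunk. This does NOT
    validate anything -- use a proper C2PA SDK for verification.
    """
    with open(path, "rb") as f:
        data = f.read()
    return b"c2pa" in data or b"caBX" in data

if __name__ == "__main__":
    for path in sys.argv[1:]:
        status = "C2PA marker found" if has_c2pa_marker(path) else "no marker"
        print(f"{path}: {status}")
```

Absence of a marker proves nothing (manifests survive neither screenshots nor most re-encodes), which is why pairing this with an ML detector makes sense.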
Exactly. I grew up playing with BC547s and BC337s (my father was an electronics engineer) and only later came across the 2N2222 and 2N3904; those were almost entirely unheard of in India.
Arguably much less successful, since jellyfish have been around for 700+ million years and it's not clear whether humans will make it through even the next couple thousand.
But the jury is still out on that one.