Hacker News | evilduck's comments

https://en.wikipedia.org/wiki/Psyonix

Rocket League was a sequel to Supersonic Acrobatic Rocket-Powered Battle-Cars, which was a totally new game but born from the studio building VehicleMod for Unreal Tournament.


We are talking about hit games. Mods previously made by people who released a hit game are out of scope.

We're talking about hit games created specifically as sequels to a hit mod of another game, where the developers told the mod's community that this is where they were going, and that players should move to the standalone game if they wanted to thank the developers for all the unpaid work they had put into the mod over the years.

Can a benchmark meant as a joke not use a fun interpretation of results? The Qwen result has far better style points. Fun sunglasses, a shadow, a better ground, a better sky, clouds, flowers, etc.

If we want to get nitty-gritty about the details of a joke, a flamingo probably couldn't physically sit on a unicycle's seat and still reach the pedals anyway.


I just wanted to express gratitude to you guys; you do great work. However, it is a little annoying to have to redownload big models, and keeping up with the AI news and community sentiment is a full-time job. I wish there were some mechanism somewhere (on your site or Hugging Face or something) for displaying feedback or confidence in a model being "ready for general use" before kicking off 100+ GB model downloads.

Hey, thanks - yes, agreed - for now we do:

1. Split metadata into shard 0 for huge models, so chat template fixes only require re-downloading shard 0 - however, sometimes fixes cause a recalculation of the imatrix, which means all quants have to be re-made

2. Add HF discussion posts on each model talking about what changed, and post on our Reddit and Twitter

3. Hugging Face XET now has de-duplicated downloading of shards, so generally redownloading 100GB models again should be much faster - it splits the 100GB into small chunks and hashes them, and only downloads the chunks which have changed
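The chunk-and-hash idea behind that de-duplication can be sketched as a toy (fixed-size chunks and SHA-256 are assumptions here purely for illustration; the real XET system uses content-defined chunking so insertions don't shift every chunk boundary):

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # toy fixed-size chunks; real XET uses content-defined chunking


def chunk_hashes(data: bytes) -> list[str]:
    """Split a blob into chunks and hash each one."""
    return [
        hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(data), CHUNK_SIZE)
    ]


def changed_chunks(old: bytes, new: bytes) -> list[int]:
    """Indices of chunks that differ and would need re-downloading."""
    old_h, new_h = chunk_hashes(old), chunk_hashes(new)
    return [
        i for i, h in enumerate(new_h)
        if i >= len(old_h) or old_h[i] != h
    ]


old = b"a" * (3 * CHUNK_SIZE)
new = b"a" * CHUNK_SIZE + b"b" * CHUNK_SIZE + b"a" * CHUNK_SIZE
print(changed_chunks(old, new))  # -> [1]: only the middle chunk must be re-fetched
```

With this scheme, a chat-template fix that only touches metadata leaves almost every chunk hash unchanged, so the re-download is tiny.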


If you happen to know - is this also why LM Studio and Ollama model downloads often fail with a signature mismatch error?

Probably yes

Ah thanks, I wasn't aware of #3, that should be a huge boon.

Oh yes! This only applies if one uses hf download / snapshot_download - other normal download methods sadly won't have XET
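For reference, a minimal sketch of the XET-aware path through huggingface_hub (the repo id and file pattern below are illustrative assumptions, not a recommendation of a specific quant):

```python
# Downloading via huggingface_hub so Hugging Face's XET chunk-level
# de-duplication applies on re-downloads; only changed chunks are fetched.
from huggingface_hub import snapshot_download


def fetch_quant(repo_id: str = "unsloth/gpt-oss-120b-GGUF",
                pattern: str = "*Q4_K_M*") -> str:
    """Download (or update) one quant variant; unchanged data is reused."""
    return snapshot_download(repo_id=repo_id, allow_patterns=[pattern])


if __name__ == "__main__":
    print(fetch_quant())  # prints the local snapshot directory
```

The equivalent CLI route (`hf download`) goes through the same machinery; tools that fetch GGUFs over plain HTTP don't benefit.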

Best policy is to just wait a couple of weeks after a major model is released. It's frustrating to have to re-download tens or hundreds of GB every few days, but the quant producers have no choice but to release early and often if they want to maintain their reputation.

Ideally the labs releasing the open models would work with Unsloth and the llama.cpp maintainers in advance to work out the bugs up front. That does sometimes happen, but not always.


Yep, agreed - at least 1 week is a good idea :)

We do get early access to nearly all models, and we do find the most pressing issues sometimes. But sadly some issues are really hard to find and diagnose :(


They include their auditor's reports in their document, around page 100: https://www.apple.com/environment/pdf/Apple_Environmental_Pr...

Do you also distrust those?


> Do you also distrust those?

I suspect the OP made a mistake and forgot the word “not” in “I'm accusing Apple of lying, but I'd like to get more context than” (otherwise the “but” makes little sense).

I expect they are asking in good faith if there are audits, not accusing the auditors of being corrupt.


It really comes down to whether we trust Apple to do the work; auditors can be found who will certify anything you need, even short of the fraud levels of Arthur Andersen.

And this kind of thing can be hard to independently verify.

Given Apple’s track record I suspect they actually do care about this internally and spend the effort to make sure it is “real”.


I don't expect absolute perfection from Apple but I think they are putting in good faith effort towards these improvements and are just proud of their accomplishments.

If it were strictly a feel-good PR effort, it would have the complete opposite effect if their environmental claims were found to be fabricated, and it would take just one whistleblower anywhere in their own staff, their auditing teams, or their global supply chain to bring down the whole facade.


I agree, it seems to be something Tim Apple personally cares about, and he's not likely to want the smoke blown.

Apple has independently clamped down on suppliers without being forced to, iirc.


You also have to consider the outside intervention forcibly imposed upon Germany, after being defeated in war both times, and how the first round of that contributed directly to WWII. It's not exactly a playbook to copy verbatim.


I'm on the record that America needs a third Reconstruction era.


That describes someone with maybe an irresponsible but manageable gambling habit, not a gambling addict.

Maybe it's because of pay-at-the-pump popularity now, but have you never seen someone standing off to the side of the main gas station counter surrounded by a pile of scratch-offs? People exist who will drop their entire paycheck on them in a single day. I've also seen people buy irresponsibly large stacks of Powerball tickets - not just the "oh, I like to fantasize about winning, so I buy a ticket each week since you can't win if you don't play" kind. It's gambling all the same.


To be fair to your field, that advancement seems expected, no? We can do things to LLMs that we can't ethically or practically do to humans.


I'm still impressed by the progress in interpretability; I remember being quite pessimistic that we'd achieve even what we have today (and I recall that being the consensus among ML researchers at the time). In other words, while capabilities have advanced at about the pace I expected from the GPT-2/3 days, mechanistic interpretability has advanced even faster than I'd hoped for (though in some ways we are still very far from completely understanding how LLMs work).


You can buy a full day's worth of energy storage with an array of LiFePO4 batteries for less than the typical 3% estimate of annual home improvement and maintenance costs you should be budgeting for as a homeowner. The cost problem usually comes from the labor, and from every solar installation company seemingly being run by scam artists.
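As a rough sanity check of that comparison (every figure below is an illustrative assumption: a $400k home, ~30 kWh/day of household usage, ~$150/kWh for DIY-grade LiFePO4 packs):

```python
home_value = 400_000                        # assumed home value (USD)
maintenance_budget = 0.03 * home_value      # the 3% annual maintenance rule cited above

daily_use_kwh = 30                          # assumed average daily household consumption
pack_cost_per_kwh = 150                     # assumed LiFePO4 pack price (USD/kWh, cells only)

battery_cost = daily_use_kwh * pack_cost_per_kwh

print(f"annual maintenance budget: ${maintenance_budget:,.0f}")  # $12,000
print(f"one day of storage:        ${battery_cost:,.0f}")        # $4,500
```

Even with generous usage assumptions the raw battery cost stays well under one year's maintenance budget; it's the installation labor that moves the total.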


In terms of ability, maybe, in terms of speed, it's not even close. Check out the Prompt Processing speeds between them: https://kyuz0.github.io/amd-strix-halo-toolboxes/

gpt-oss-120b is over 600 tokens/s PP for all but one backend.

nemotron-3-super is at best 260 tokens/s PP.

Comparing token generation, it's again about 50 tokens/sec vs 15 tokens/sec.

That really bogs down agentic tooling. Something needs to be categorically better to justify cutting output speed to a third, not just playing in the margins.
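A quick back-of-the-envelope using the speeds above makes the gap concrete (the prompt and output sizes are assumed for illustration):

```python
prompt_tokens = 20_000   # assumed agentic context size
output_tokens = 1_000    # assumed response length


def turn_seconds(pp_tok_s: float, tg_tok_s: float) -> float:
    """Wall-clock time for one turn: prompt processing plus generation."""
    return prompt_tokens / pp_tok_s + output_tokens / tg_tok_s


gpt_oss = turn_seconds(600, 50)    # ~53 s per turn
nemotron = turn_seconds(260, 15)   # ~144 s per turn

print(f"{gpt_oss:.0f}s vs {nemotron:.0f}s per turn")
```

At these benchmark numbers, every agent turn takes roughly 2.5-3x longer on the slower model, and that multiplier compounds across a multi-turn session.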


In my case with vLLM on dual RTX Pro 6000:

gpt-oss-120b: (unknown prefill), ~175 tok/s generation. I don't remember the prefill speed, but it certainly was below 10k.

Nemotron-3-Super: 14070 tok/s prefill, ~194.5 tok/s generation. (Tested fresh after reload, no caching, I have a screenshot.)

Nemotron-3-Super using NVFP4 and speculative decoding via MTP, 5 tokens at a time, as described in the Nvidia cookbook: https://docs.nvidia.com/nemotron/nightly/usage-cookbook/Nemo...


Hmm you might be able to tweak the settings further. Under llama.cpp on one RTX 6000 Pro I get ~215 tok/s generation speed. The key for me was setting min_p greater than 0. My settings:

```
#!/bin/bash

llama-server \
  -hf ggml-org/gpt-oss-120b-GGUF \
  -c 0 \
  -np 1 \
  --jinja \
  --no-mmap \
  --temp 1.0 \
  --top-p 1.0 \
  --min-p 0.001 \
  --chat-template-kwargs '{"reasoning_effort": "high"}' \
  --host 0.0.0.0
```


This is not even the first closed weights Qwen model.

