I was looking into self-hosting DeepSeek v4 pro, since frankly cache reads are an absolute scam and they're 90% of the cost. But then I looked at the ROI, and it will never pay off fast enough: the hardware will become obsolete before it pays for itself, even if you were running 10 token generation streams 24/7.
The napkin math came out to renting being around 27 times cheaper than owning (not including power). I think we're really screwed when it comes to owned access to AI unless Intel comes out swinging with a C-series card that has 128GB of VRAM, so we could run these models in a 4x128GB configuration, but that seems unlikely since Nvidia has a large stake in them.
This was calculated assuming around 30 tok/s; of course you can get 2-5 tok/s much, much cheaper, but that's unusable for my workflow.
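The rent-vs-own comparison can be sketched as below. Every number here is a placeholder assumption for illustration (hardware cost, lifespan, API price), not the commenter's actual inputs, which reportedly came out to ~27x:

```python
# Napkin math: amortize hardware over its useful life and compare to API pricing.
# All inputs are assumed placeholders except the 30 tok/s and 10 streams above.
hardware_cost_usd = 300_000          # assumed rig able to serve ~30 tok/s per stream
useful_life_years = 3                # assumed lifespan before obsolescence
tok_per_sec_per_stream = 30
streams = 10                         # 10 generation streams running 24/7
api_price_per_mtok = 1.00            # assumed blended API price, $ / M tok

lifetime_seconds = useful_life_years * 365 * 24 * 3600
lifetime_mtok = tok_per_sec_per_stream * streams * lifetime_seconds / 1e6
owned_price_per_mtok = hardware_cost_usd / lifetime_mtok  # excludes power

print(f"owned ${owned_price_per_mtok:.2f}/M tok vs rented ${api_price_per_mtok:.2f}/M tok")
```

Even with generous assumptions, fully-saturated self-hosting stays an order of magnitude more expensive per token than renting, which is the shape of the argument above.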
Ironically, the one provider not scamming you on cache reads is DeepSeek.
Everyone else charges a ridiculous amount, but DeepSeek's API is $0.003625 / M tok.
I'm surprised no one talks about this given how significant it is. GPT 5.5, for example, costs a ridiculous $0.50 / M tok cached. DeepSeek is literally almost 140 times cheaper, which matters a lot for tool calls.
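The ~140x figure checks out directly from the two cache-read prices quoted above:

```python
# Ratio of the two cache-read prices quoted above, $ / M tok.
deepseek_cached = 0.003625
gpt_cached = 0.50
ratio = gpt_cached / deepseek_cached
print(round(ratio))  # ~138, i.e. "almost 140 times cheaper"
```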
I used to use this but have opted for good ol' hand-rolled deployment scripts instead: you will run into some niche use case that k3sup can't express and end up having to write a post-install script anyway.
The photo depicts "Tank Man", taken on June 5, 1989, during the Tiananmen Square protests. v4-pro and v4-flash answer roughly the same way on OpenRouter.
It's just a 92kWh battery. There are many cars with 100kWh or more on the market already. And that's only a fraction of the energy stored in an average gas tank (upwards of 500kWh). A combustion car just loses most of that energy to heat from actual explosions. From a physics perspective, a normal car is a much bigger bomb than even the longest range EV.
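A rough check of the tank-vs-pack comparison above; the tank size is an assumption, and the energy density figure is approximate:

```python
# Chemical energy in a gas tank vs. the 92 kWh pack mentioned above.
GASOLINE_KWH_PER_LITRE = 9.5   # ~34 MJ/L, approximate energy density of gasoline
tank_litres = 55               # assumed mid-size tank
tank_kwh = GASOLINE_KWH_PER_LITRE * tank_litres
battery_kwh = 92
print(tank_kwh, round(tank_kwh / battery_kwh, 1))  # a tank stores several times more energy
```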
Batteries are much less powerful bombs than fuel tanks, because they cannot produce nearly as great a volume of gas.
Batteries are dangerous mainly as sources of fire that is difficult to extinguish. For instance, extinguishing with water may actually cause an explosion from gas produced by the decomposition of water.
Most lithium-based batteries are more dangerous than other batteries not because they are batteries, but because they use an organic electrolyte instead of a water-based electrolyte. So their electrolyte is a fuel, which may explode when the battery catches fire.
However, there is much less electrolyte in a battery than fuel in a fuel tank, so the volume of expanding gas during an explosion is much less.
The technology OpenAI sells is actually not that good for kill-bots; we have Boston Dynamics for that. I mean, to be real here, they're already better than human soldiers: deploying 100 of the doggies and letting them run loose could wipe out any fortified group.
Especially if you include things that are not normally acceptable such as suicide bombers, poison gas, etc.
Also, real modern warfare has shown that cheap drones dominate. So we'd need a kill-bot that can withstand explosives while staying lightweight and operable with a good KD (drones trade at 1.0 or less); since a kill-bot would cost on the order of 100x a cheap drone, it would need a KD of around 100 just to break even.
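The break-even claim rests on a cost-exchange assumption; a minimal sketch with made-up prices (both figures are invented for illustration, not sourced):

```python
# Cost-exchange math: how many kills per loss a robot needs to match cheap drones.
drone_cost = 1_000        # assumed cheap FPV drone, $
killbot_cost = 100_000    # assumed armed ground robot, $
breakeven_kd = killbot_cost / drone_cost  # kills per loss to match a drone at KD 1.0
print(breakeven_kd)
```

At a 100:1 cost ratio, a KD of 100 is exactly the break-even point claimed above; any cheaper robot lowers the bar proportionally.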
Seems completely backwards to me. This is like judging Formula 1 just by the raw power of the engine. The rest of the car has just as much engineering, if not more.
Maybe there is a complex relationship between harness, model, and the emergent perceived intelligence that we just can't access by isolating the model alone to evaluate "raw intelligence". I don't think it's absurd to imagine a model that by itself wouldn't be that impressive but would outperform other models given the right harness. It's also not absurd to think of a model that has incredible raw intelligence but would not scale much across different harnesses. Model performance in different scenarios depends a LOT on the dataset and training strategies, so we need to account for these complex relationships; otherwise, measuring "raw intelligence" would become the next AI benchmark that is purely for show.
It can; it just has to be within the same 'session'. But it's mostly limited to scratch notes AFAIK, since there's no Python or bash. Yeah, if there's no way to execute code, there's no real way to build a harness.
I run agents en masse and they've deleted my database at least a dozen times. I just don't really care, since I always run agents on a snapshot basis: agents work on a snapshot of the database that later needs to be reconciled, which often makes the agent realize "wait, that would delete all of the data".
Telling the agent what a (sensitive) action will result in is how you avoid such issues, but you shouldn't be running agents against production data anyway.
But because people will continue to do so, explaining to the agent what the command will do is the way forward.
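The snapshot workflow described above can be sketched in a few lines; the file names and toy schema here are made up for illustration, and the reconciliation step is reduced to a comment:

```python
import os
import shutil
import sqlite3

# Start clean so the sketch is repeatable.
for f in ("prod.db", "agent_snapshot.db"):
    if os.path.exists(f):
        os.remove(f)

# Stand-in "production" database.
prod = sqlite3.connect("prod.db")
prod.execute("CREATE TABLE users (id INTEGER, name TEXT)")
prod.execute("INSERT INTO users VALUES (1, 'alice')")
prod.commit()
prod.close()

# The agent only ever sees a snapshot copy.
shutil.copy("prod.db", "agent_snapshot.db")
snap = sqlite3.connect("agent_snapshot.db")
snap.execute("DELETE FROM users")   # destructive action hits the snapshot only
snap.commit()
snap.close()

# Reconciliation: production is untouched; the diff between snapshot and prod
# gets reviewed before anything is replayed against the real database.
prod = sqlite3.connect("prod.db")
rows = prod.execute("SELECT COUNT(*) FROM users").fetchone()[0]
prod.close()
print(rows)  # production row count is unchanged
```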