Gemini 2.5 and 3 can code, but they're also dumb: they don't model the world well, which makes them hard to use for programming tasks.
I haven't tried grok4.2 or grok4.3 for coding yet; grok wasn't up to the challenge as an agent before. It looks like grok4.3 shifted its training and now always operates as an agent first, judging from some web usage. Musk knows grok is behind and states it publicly. Now that grok4.3 is released, I do plan to try it again to see if it is suitable.
Gemini's weakness is coding, but it will go toe to toe with 5.5 on science, (classic) engineering, finance, basically anything that isn't programming. It also does it while using about 1/4 the tokens.
Elon has publicly stated that he cares a great deal about safety. He has stated that the only safe models are those which align most closely with truth, with what is in reality. On that score, xAI has lived up to it, as grok has proved to hallucinate the least (or close to the least) in benchmarks.
If you read that quote again, he is asking "how can you quantify safety in a card?"
For model cards in general, I have a suspicion that grok's training includes a fair amount of distillation off their competitors' models. That should be disclosed in a model card, and that's likely one of the reasons they don't want to release one.
‘Savitt asked Musk if his artificial intelligence company, xAI, had ever “distilled” technology from OpenAI. Distillation is a way of using one A.I. technology to create another, and it is not allowed by OpenAI’s terms of service.
“Generally A.I. companies distill other A.I. companies,” Musk answered.
“Is that a ‘yes’?” Savitt asked. Musk answered, “Partly.”
Distillation has become an increasingly important issue as companies like OpenAI and Anthropic have complained that Chinese companies are distilling their systems.’
> Elon has publicly stated that he cares a great deal about safety.
Elon lies more often than he tells the truth; why would you believe anything he says, especially when what he is saying indicates concern for anybody else's well-being? He doesn't care about other people and is likely incapable of doing so.
Yes. GLM 5.1 is that good. I don't think it is as good as Claude was in January or February of this year, but it is similar to how Claude runs now, perhaps better, because I feel like its performance is more consistent.
The federal one was introduced by Democrat Josh Gottheimer (D-NJ) and cosponsored by Republican Elise Stefanik (R-NY). This push is extremely bipartisan.
> Republicans may not like porn, but they put the onus where it belongs, on the operator, not on the OS.
While that might be true, I can't agree with the implication that this is better in any way. Putting the onus on the operator forces you to send some form of verification to every such operator you want to visit, and operators have repeatedly shown they are NOT capable of handling that information securely and privately.
The difference isn't really in the politicians, it's in the base, and how they will react to acts like this. Democrat voters will shame them, endlessly. They may not have alternatives to vote for, but they won't change their opinion to match whatever dweeb they were forced to vote for. Republican voters will always be on board with whatever they're told to be on board with.
I continue to use Gerrit explicitly because I cannot stand GitHub reviews. Yes, in theory, make changes small. But if I'm doing larger work (like updating a vendored dep that I still review), reviewing files in GitHub is... not great.
Can these tools, e.g., do per-commit review? I mean, it's not the UI that's the problem (though it's not ideal), it's the whole idea of commenting on the entire PR at once, partly ignoring the fact that the code in it changes as more commits are pushed.
Phabricator and even Gerrit are significantly nicer.
Unless you have an “every commit must build” rule, why would you review commits independently? The entire PR is the change set - what’s problematic about reviewing it as such?
There's a certain set of changes which are just easier to review as stacked independent commits.
Like, you can do a change that introduces a new API and one that updates all usages.
It's just easier to review those independently.
Or, you may have workflows where you have different versions of schemas and you always keep the old ones. Then you can do two commits (copy X to X+1; update X+1) where the change is obvious, rather than seeing a single diff which is just a huge new file.
I'm sure there are more cases. It's not super common, but it is convenient; a sketch of the API case is below.
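A minimal sketch of that API case in Go (User, FetchUser, and GetUser are invented names, not from any particular codebase): commit 1 adds the new API plus a compatibility wrapper so the tree still builds, commit 2 migrates callers and deletes the wrapper.

```go
package user

import "context"

type User struct {
	ID   int
	Name string
}

// --- commit 1: introduce the new, context-aware API ---

// FetchUser is the new API; it can be reviewed in isolation.
func FetchUser(ctx context.Context, id int) (*User, error) {
	return &User{ID: id}, nil // real lookup elided
}

// GetUser stays as a thin wrapper so every existing caller still
// compiles; commit 1 builds and is reviewable on its own.
func GetUser(id int) (*User, error) {
	return FetchUser(context.Background(), id)
}

// --- commit 2: mechanically switch every caller to FetchUser and
// delete GetUser; the diff is large but trivial to review ---
```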
Squash merge is an artifact of PRs encouraging you to add commits instead of amending them, because GitHub can't show you proper interdiffs and makes comments disappear when the diff at that line changes. In that context, when you add fixup commits, sure, squashing makes sense. But the stacked-diffs approach encourages you to create commits that already look the way you want them to, instead of requiring you to roll them up at the end.
> Unless you have an “every commit must build” rule, why would you review commits independently?
Security. Imagine commit #1 introduces the feature along with a security vulnerability (a backdoor). Then #2 introduces a non-obvious, harmless bug and closes the vulnerability introduced in #1 [0]. At some point the bug will surface, and rolling back commit #2 will be an easy fix, re-introducing the vulnerability.
Alternatively, one of the earlier commits might, for example, contain credential-dumping code. Once that commit is mainlined, CI might either run on it automatically or be able to be run on it, since it's no longer marked as an unsafe PR.
[0] Think something like: #1 introduces an array access and #2 adds a bounds check in a function a layer above. A reviewer with the whole context will see the bounds check and (possibly) consider it fine, but to someone rolling back a commit, the necessity will not be obvious.
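A toy Go version of that footnote, with invented names, just to make the rollback hazard concrete:

```go
package demo

// commit #1: the feature, with an unchecked access buried inside it.
func nthToken(tokens []string, n int) string {
	return tokens[n] // out-of-range n panics (in C, it reads out of bounds)
}

// commit #2: a "harmless" tweak a layer above that also happens to be
// the only bounds check. Reviewed with full context it looks fine;
// revert it in isolation and commit #1's unchecked access is exposed again.
func handleRequest(tokens []string, n int) string {
	if n < 0 || n >= len(tokens) {
		return ""
	}
	return nthToken(tokens, n)
}
```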
Boy do I hate Github/Lab/Bucket style code reviews with a burning passion. Who the hell loses code review history? A record of the very thing that made my code better? The "why" of it all, that I am guaranteed to forget tomorrow morning.
Nobody would be using `--force` or `--force-with-lease` as a normal part of development workflow, of their own volition, if they had read that part of the git-push manpage and been horrified (as one should be).
The magit key sequence for this abominable operation is `P "f-u"`. And every single time I am forced to do it, I read "f-u" as it ought to be read.
Rebase-push is the way to do it (patch sets in Gerrit).
Rebase-force-push is absolutely not.
You see, any development workflow inevitably has to integrate changes from at least one other branch (typically the latest develop or master), without destroying change history or review history. Gerrit makes this trivial.
It's a bit difficult to convey exactly why I'm so rah-rah Gerrit, because it is a matter of day-to-day experience of:
- Well, a single commit of a few lines to maybe a hundred lines *is* the correct unit of code review, rebase, revert, etc. Manually "sizing PRs" to that review context size is utter BS. I have better things to do in life than to book-keep PR sizes. Make a single well-contained, revertible commit. Then keep making those. And now you have a commit history that is clean, that you can merge, bisect, and bulk-revert at will. Octopus merges are a good thing. `git-log` is *designed* to let us view changes in any sequence we wish, *including* the so-called "linear" history. `git log --oneline`.
- Trivial for the committer to send up review-preserving rebase-push responses to commit reviews (NO force-push, ever; that's an "admin" action to *evict*, i.e. permanently wipe out, disaster scenarios such as someone accidentally committing and pushing a plaintext secret or a giant compiled-binary blob, etc.).
- Fast-for-the-reviewer, per-commit, diff-based, inline-commenting code reviews.
- The years-apart experience of being able to dig into any part of one's (immutable) software change history to offer a teaching moment to someone new to the team.
This is in Go, exposes both WebDAV and SFTP servers, and has user and admin web interfaces. You can configure remotes, then compose each user's space from various locations, some local, others remote.
In general I love Postgres. There are two problems with PostgreSQL in my book: the wire protocol (v3) and no great way to directly query using a different language.
The protocol has no direct in-protocol cancellation like TDS has. TDS is a framed protocol, so it can cancel a query at the application-protocol level; Postgres instead makes you open a separate connection and send a CancelRequest with the backend's key. The Postgres protocol also has two formats (text and binary), can cause fragmentation, and at the query and protocol level only supports positional parameters, no named parameters.
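A sketch of what that out-of-band cancellation looks like from a client, using pgx (the connection string from the environment and the one-second timing are just stand-ins for the example):

```go
package main

import (
	"context"
	"log"
	"os"
	"time"

	"github.com/jackc/pgx/v5"
)

func main() {
	ctx := context.Background()
	conn, err := pgx.Connect(ctx, os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(ctx)

	// Because the protocol has no in-band cancel frame, cancelling means
	// opening a *second* TCP connection and sending the separate
	// CancelRequest message with the backend's secret key.
	go func() {
		time.Sleep(time.Second)
		if err := conn.PgConn().CancelRequest(context.Background()); err != nil {
			log.Println("cancel:", err)
		}
	}()

	_, err = conn.Exec(ctx, "SELECT pg_sleep(60)")
	log.Println(err) // expect: canceling statement due to user request
}
```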
Once a query is on the server, there's no way to act directly in a language mode. I don't want to go into SQL mode and create a PL/pgSQL proc; I just want to send PL/pgSQL directly. You can't (really) do that well. Directly returning multiple result sets (e.g. for a matrix: separate rows, columns, and fields queries) or related queries in a single round trip is technically possible, but hard to do. So frustrating.
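For what it's worth, driver-level batching does get you the single round trip today; a sketch with pgx v5 (the table and column names here are invented for the example):

```go
package matrixdemo

import (
	"context"

	"github.com/jackc/pgx/v5"
)

// fetchMatrix pulls rows, cols, and fields in one network round trip.
func fetchMatrix(ctx context.Context, conn *pgx.Conn, matrixID int) error {
	b := &pgx.Batch{}
	b.Queue("SELECT id FROM rows_dim WHERE matrix_id = $1", matrixID)
	b.Queue("SELECT id FROM cols_dim WHERE matrix_id = $1", matrixID)
	b.Queue("SELECT r, c, v FROM fields WHERE matrix_id = $1", matrixID)

	br := conn.SendBatch(ctx, b) // all three queries, one round trip
	defer br.Close()

	// Results come back in queue order, one result set per queued query.
	for i := 0; i < 3; i++ {
		rows, err := br.Query()
		if err != nil {
			return err
		}
		for rows.Next() {
			// scan the per-query columns here
		}
		rows.Close()
	}
	return nil
}
```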
thanks! FUSE is actually a really cool idea, hadn't thought about that. would basically let you mount a repo as a filesystem backed by postgres. server side branches and change sets are interesting too, postgres already handles concurrent access well so that could work nicely. definitely adding these to the ideas list!
I've already spun up Claude to make a POC for this.
I like Gerrit, but the server is such a pain to handle (Java plus filesystem state). PG would be the only server-side component required, though you could have an optional review server that would act as a PG client as well.
FUSE would be extremely nice for CI/CD: instant cloning with a local resource cache, which is much harder to do with FS-based git.
The FUSE angle is what got me. Our monorepo takes about 90 seconds just to clone in CI, and most jobs only touch two or three packages. Shallow clone helps with history but you basically still pull the entire working tree. Something that could mount the tree and fetch files on demand would cut that to almost nothing for most pipeline steps.
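A rough sketch of the read side with bazil.org/fuse (everything here is hypothetical: the mountpoint, the flat namespace, and fetchBlob standing in for a SELECT against the Postgres-backed repo):

```go
package main

import (
	"context"
	"log"
	"os"

	"bazil.org/fuse"
	"bazil.org/fuse/fs"
)

// fetchBlob stands in for the on-demand fetch; the real thing would query
// the Postgres-backed repo tables (hypothetical, not implemented here).
func fetchBlob(path string) ([]byte, error) {
	return []byte("contents of " + path + "\n"), nil
}

type FS struct{}

func (FS) Root() (fs.Node, error) { return Dir{}, nil }

type Dir struct{}

func (Dir) Attr(ctx context.Context, a *fuse.Attr) error {
	a.Inode = 1
	a.Mode = os.ModeDir | 0o555
	return nil
}

// Lookup resolves any name to a lazily fetched file; a real version would
// also implement directory listing via ReadDirAll.
func (Dir) Lookup(ctx context.Context, name string) (fs.Node, error) {
	return File{path: name}, nil
}

type File struct{ path string }

func (f File) Attr(ctx context.Context, a *fuse.Attr) error {
	a.Mode = 0o444
	b, _ := fetchBlob(f.path)
	a.Size = uint64(len(b))
	return nil
}

// ReadAll only runs when a job actually opens the file, which is the whole
// point: files untouched by the pipeline step are never fetched.
func (f File) ReadAll(ctx context.Context) ([]byte, error) {
	return fetchBlob(f.path)
}

func main() {
	c, err := fuse.Mount("/mnt/repo", fuse.FSName("pgfs"), fuse.ReadOnly())
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()
	if err := fs.Serve(c, FS{}); err != nil {
		log.Fatal(err)
	}
}
```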
I would say the same for dec128. I would love a standard TYPE for dec128, with maybe zero-cost std lib helpers to transform it into a mutable uint128, or a zero-cost conversion to struct{uint64, int64}.
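As a hedged sketch of the shape in Go (names mine; a real proposal would pin down the IEEE 754 decimal128 bit layout):

```go
package dec

// Dec128 is a hypothetical 128-bit decimal, laid out so the
// struct{uint64, int64} view from the comment is the type itself.
type Dec128 struct {
	Lo uint64 // low 64 bits of the coefficient
	Hi int64  // sign plus the high bits of coefficient/exponent
}

// Pair is the wished-for zero-cost conversion: nothing to transform,
// just the two machine words the struct already is.
func (d Dec128) Pair() (uint64, int64) { return d.Lo, d.Hi }
```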