Gemini 2.5 and 3 can code, but they're also dumb: they don't model the world well, which makes them hard to use for programming tasks.
I haven't tried grok4.2 or grok4.3 for coding yet; grok wasn't up to the challenge as an agent before. It looks like grok4.3 shifted its training and now always operates as an agent first, judging from some web usage. Musk knows grok is behind and states it publicly. Now that grok4.3 is released, I do plan to try it again to see if it is suitable.
Gemini's weakness is coding, but it will go toe to toe with 5.5 on science, (classic) engineering, finance, basically anything that isn't programming. It also does it while using about 1/4 the tokens.
Elon has publicly stated that he cares a great deal about safety. He has stated that the only safe models are those which align most closely with truth, with what is in reality. On that score, xAI has lived up to it, as grok has proved to hallucinate the least (or close to the least) in benchmarks.
If you read that quote again, he is asking "how can you quantify safety in a card?"
For model cards in general, I have a suspicion that grok's training includes a fair amount of distillation off their competitors' models. That should be disclosed in a model card, and that's likely one of the reasons they don't want to release one.
‘Savitt asked Musk if his artificial intelligence company, xAI, had ever “distilled” technology from OpenAI. Distillation is a way of using one A.I. technology to create another, and it is not allowed by OpenAI’s terms of service.
“Generally A.I. companies distill other A.I. companies,” Musk answered.
“Is that a ‘yes’?” Savitt asked. Musk answered, “Partly.”
Distillation has become an increasingly important issue as companies like OpenAI and Anthropic have complained that Chinese companies are distilling their systems.’
> Elon has publicly stated that he cares a great deal about safety.
Elon lies more often than he tells the truth; why would you believe anything he says, especially when what he is saying indicates concern for anybody else's well-being? He doesn't care about other people and is likely incapable of doing so.
Yes. GLM 5.1 is that good. I don't think it is as good as Claude was in January or February of this year, but it is similar to how Claude runs now, perhaps better, because I feel like its performance is more consistent.
The federal one was introduced by Democrat Josh Gottheimer (D-NJ) and cosponsored by Republican Elise Stefanik (R-NY). This push is extremely bipartisan.
> Republicans may not like porn, but they put the onus where it belongs, on the operator, not on the OS.
While that might be true, I can't agree with the implication that this is better in any way. Putting the onus on the operator forces you to send some form of verification to every such operator you want to visit, and operators have repeatedly shown they are NOT capable of handling that information securely and privately.
The difference isn't really in the politicians, it's in the base, and how they will react to acts like this. Democrat voters will shame them, endlessly. They may not have alternatives to vote for, but they won't change their opinion to match whatever dweeb they were forced to vote for. Republican voters will always be on board with whatever they're told to be on board with.
I continue to use Gerrit explicitly because I cannot stand GitHub reviews. Yes, in theory, make changes small. But if I'm doing larger work (like updating a vendored dep that I still review), reviewing files in GitHub is... not great.
Can these tools, e.g., do per-commit review? I mean, it's not the UI that's the problem (though it's not ideal), it's the whole idea of commenting on the entire PR at once, partly ignoring the fact that the code in it changes as more commits are pushed.
Phabricator and even Gerrit are significantly nicer.
Unless you have an “every commit must build” rule, why would you review commits independently? The entire PR is the change set - what’s problematic about reviewing it as such?
There's a certain set of changes which are just easier to review as stacked independent commits.
Like, you can do a change that introduces a new API and one that updates all usages.
It's just easier to review those independently.
Or, you may have workflows where you have different versions of schemas and you always keep the old ones. Then you can do two commits (copy X to X+1; update X+1) where the change is obvious, rather than seeing a single diff which is just a huge new file.
I'm sure there are more cases. It's not super common, but it is convenient; a sketch of the API case is below.
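A minimal sketch of that API case in Go (User, FetchUser, and GetUser are invented names, not from any particular codebase): commit 1 adds the new API plus a compatibility wrapper so the tree still builds, commit 2 migrates callers and deletes the wrapper.

```go
package user

import "context"

type User struct {
	ID   int
	Name string
}

// --- commit 1: introduce the new, context-aware API ---

// FetchUser is the new API; it can be reviewed in isolation.
func FetchUser(ctx context.Context, id int) (*User, error) {
	return &User{ID: id}, nil // real lookup elided
}

// GetUser stays as a thin wrapper so every existing caller still
// compiles; commit 1 builds and is reviewable on its own.
func GetUser(id int) (*User, error) {
	return FetchUser(context.Background(), id)
}

// --- commit 2: mechanically switch every caller to FetchUser and
// delete GetUser; the diff is large but trivial to review ---
```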
Squash merge is an artifact of PRs encouraging you to add commits instead of amending them, because GitHub can't show you proper interdiffs and makes comments disappear when the diff at that line changes. In that context, when you add fixup commits, sure, squashing makes sense. But the stacked-diffs approach encourages you to create commits that already look the way you want them to, instead of requiring you to roll them up at the end.
> Unless you have an “every commit must build” rule, why would you review commits independently?
Security. Imagine commit #1 introduces the feature along with a security vulnerability (a backdoor). Then #2 introduces a non-obvious, harmless bug and closes the vulnerability introduced in #1 [0]. At some point the bug will surface, and rolling back commit #2 will be an easy fix, re-introducing the vulnerability.
Alternatively, one of the earlier commits might, for example, contain credential-dumping code. Once that commit is mainlined, CI might either run on it automatically or be able to be run on it, since it's no longer marked as an unsafe PR.
[0] Think something like: #1 introduces an array access and #2 adds a bounds check in a function a layer above. A reviewer with the whole context will see the bounds check and (possibly) consider it fine, but to someone rolling back a commit, the necessity will not be obvious.
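A toy Go version of that footnote, with invented names, just to make the rollback hazard concrete:

```go
package demo

// commit #1: the feature, with an unchecked access buried inside it.
func nthToken(tokens []string, n int) string {
	return tokens[n] // out-of-range n panics (in C, it reads out of bounds)
}

// commit #2: a "harmless" tweak a layer above that also happens to be
// the only bounds check. Reviewed with full context it looks fine;
// revert it in isolation and commit #1's unchecked access is exposed again.
func handleRequest(tokens []string, n int) string {
	if n < 0 || n >= len(tokens) {
		return ""
	}
	return nthToken(tokens, n)
}
```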
Boy do I hate Github/Lab/Bucket style code reviews with a burning passion. Who the hell loses code review history? A record of the very thing that made my code better? The "why" of it all, that I am guaranteed to forget tomorrow morning.
Nobody would be using `--force` or `--force-with-lease` as a normal part of development workflow, of their own volition, if they had read that part of the git-push manpage and been horrified (as one should be).
The magit key sequence for this abominable operation is `P "f-u"`. And every single time I am forced to do it, I read "f-u" as it ought to be read.
Rebase-push is the way to do it (patch sets in Gerrit).
Rebase-force-push is absolutely not.
You see, any development workflow inevitably has to integrate changes from at least one other branch (typically the latest develop or master), without destroying change history or review history. Gerrit makes this trivial.
It's a bit difficult to convey exactly why I'm so rah-rah Gerrit, because it is a matter of day-to-day experience of:
- Well, a single commit of a few lines to maybe a hundred lines *is* the correct unit of code review, rebase, revert, etc. Manually "sizing PRs" to that review context size is utter BS. I have better things to do in life than to book-keep PR sizes. Make a single well-contained, revertible commit. Then keep making those. And now you have a commit history that is clean, that you can merge, bisect, and bulk-revert at will. Octopus merges are a good thing. `git-log` is *designed* to let us view changes in any sequence we wish, *including* the so-called "linear" history. `git log --oneline`.
- Trivial for the committer to send up review-preserving rebase-push responses to commit reviews (NO force-push, ever; that's an "admin" action to *evict*, i.e. permanently wipe out, disaster scenarios such as someone accidentally committing and pushing a plaintext secret or a giant compiled-binary blob, etc.).
- Fast-for-the-reviewer, per-commit, diff-based, inline-commenting code reviews.
- The years-apart experience of being able to dig into any part of one's (immutable) software change history to offer a teaching moment to someone new to the team.
This is in Go, exposes both WebDAV and SFTP servers, and has user and admin web interfaces. You can configure remotes, then compose each user's space from various locations, some local, others remote.
In general I love Postgres. There are two problems with PostgreSQL in my book: the wire protocol (v3) and no great way to directly query using a different language.
The protocol has no direct in-protocol cancellation like TDS has. TDS is a framed protocol, so it can cancel a query at the application-protocol level; Postgres instead makes you open a separate connection and send a CancelRequest with the backend's key. The Postgres protocol also has two formats (text and binary), can cause fragmentation, and at the query and protocol level only supports positional parameters, no named parameters.
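A sketch of what that out-of-band cancellation looks like from a client, using pgx (the connection string from the environment and the one-second timing are just stand-ins for the example):

```go
package main

import (
	"context"
	"log"
	"os"
	"time"

	"github.com/jackc/pgx/v5"
)

func main() {
	ctx := context.Background()
	conn, err := pgx.Connect(ctx, os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(ctx)

	// Because the protocol has no in-band cancel frame, cancelling means
	// opening a *second* TCP connection and sending the separate
	// CancelRequest message with the backend's secret key.
	go func() {
		time.Sleep(time.Second)
		if err := conn.PgConn().CancelRequest(context.Background()); err != nil {
			log.Println("cancel:", err)
		}
	}()

	_, err = conn.Exec(ctx, "SELECT pg_sleep(60)")
	log.Println(err) // expect: canceling statement due to user request
}
```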
Once a query is on the server, there's no way to act directly in a language mode. I don't want to go into SQL mode and create a PL/pgSQL proc; I just want to send PL/pgSQL directly. You can't (really) do that well. Directly returning multiple result sets (e.g. for a matrix: separate rows, columns, and fields queries) or related queries in a single round trip is technically possible, but hard to do. So frustrating.
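For what it's worth, driver-level batching does get you the single round trip today; a sketch with pgx v5 (the table and column names here are invented for the example):

```go
package matrixdemo

import (
	"context"

	"github.com/jackc/pgx/v5"
)

// fetchMatrix pulls rows, cols, and fields in one network round trip.
func fetchMatrix(ctx context.Context, conn *pgx.Conn, matrixID int) error {
	b := &pgx.Batch{}
	b.Queue("SELECT id FROM rows_dim WHERE matrix_id = $1", matrixID)
	b.Queue("SELECT id FROM cols_dim WHERE matrix_id = $1", matrixID)
	b.Queue("SELECT r, c, v FROM fields WHERE matrix_id = $1", matrixID)

	br := conn.SendBatch(ctx, b) // all three queries, one round trip
	defer br.Close()

	// Results come back in queue order, one result set per queued query.
	for i := 0; i < 3; i++ {
		rows, err := br.Query()
		if err != nil {
			return err
		}
		for rows.Next() {
			// scan the per-query columns here
		}
		rows.Close()
	}
	return nil
}
```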
thanks! FUSE is actually a really cool idea, hadn't thought about that. would basically let you mount a repo as a filesystem backed by postgres. server side branches and change sets are interesting too, postgres already handles concurrent access well so that could work nicely. definitely adding these to the ideas list!
I've already spun up Claude to make a POC for this.
I like Gerrit, but the server is such a pain to handle (Java plus filesystem state). PG would be the only server-side component required, though you could have an optional review server that would act as a PG client as well.
FUSE would be extremely nice for CI/CD: instant cloning with a local resource cache, which is much harder to do with FS-based git.
The FUSE angle is what got me. Our monorepo takes about 90 seconds just to clone in CI, and most jobs only touch two or three packages. Shallow clone helps with history but you basically still pull the entire working tree. Something that could mount the tree and fetch files on demand would cut that to almost nothing for most pipeline steps.
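A rough sketch of the read side with bazil.org/fuse (everything here is hypothetical: the mountpoint, the flat namespace, and fetchBlob standing in for a SELECT against the Postgres-backed repo):

```go
package main

import (
	"context"
	"log"
	"os"

	"bazil.org/fuse"
	"bazil.org/fuse/fs"
)

// fetchBlob stands in for the on-demand fetch; the real thing would query
// the Postgres-backed repo tables (hypothetical, not implemented here).
func fetchBlob(path string) ([]byte, error) {
	return []byte("contents of " + path + "\n"), nil
}

type FS struct{}

func (FS) Root() (fs.Node, error) { return Dir{}, nil }

type Dir struct{}

func (Dir) Attr(ctx context.Context, a *fuse.Attr) error {
	a.Inode = 1
	a.Mode = os.ModeDir | 0o555
	return nil
}

// Lookup resolves any name to a lazily fetched file; a real version would
// also implement directory listing via ReadDirAll.
func (Dir) Lookup(ctx context.Context, name string) (fs.Node, error) {
	return File{path: name}, nil
}

type File struct{ path string }

func (f File) Attr(ctx context.Context, a *fuse.Attr) error {
	a.Mode = 0o444
	b, _ := fetchBlob(f.path)
	a.Size = uint64(len(b))
	return nil
}

// ReadAll only runs when a job actually opens the file, which is the whole
// point: files untouched by the pipeline step are never fetched.
func (f File) ReadAll(ctx context.Context) ([]byte, error) {
	return fetchBlob(f.path)
}

func main() {
	c, err := fuse.Mount("/mnt/repo", fuse.FSName("pgfs"), fuse.ReadOnly())
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()
	if err := fs.Serve(c, FS{}); err != nil {
		log.Fatal(err)
	}
}
```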
I would say the same for dec128. I would love a standard TYPE for dec128, with maybe zero-cost std lib helpers to transform it into a mutable uint128, or a zero-cost conversion to struct{uint64, int64}.
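As a hedged sketch of the shape in Go (names mine; a real proposal would pin down the IEEE 754 decimal128 bit layout):

```go
package dec

// Dec128 is a hypothetical 128-bit decimal, laid out so the
// struct{uint64, int64} view from the comment is the type itself.
type Dec128 struct {
	Lo uint64 // low 64 bits of the coefficient
	Hi int64  // sign plus the high bits of coefficient/exponent
}

// Pair is the wished-for zero-cost conversion: nothing to transform,
// just the two machine words the struct already is.
func (d Dec128) Pair() (uint64, int64) { return d.Lo, d.Hi }
```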