A company wanting to hire essentially has two options: (1) hire from the pool of fresh candidates coming out of the universities, or (2) hire people who are already employed.
This means that to inflate the number of software engineers on the market you also have only two options: (1) have the universities somehow start producing exponentially more software engineers than the market can absorb, or (2) let go a substantial number of software engineers who now (between 2020 and 2025) all of a sudden cannot find a new job anymore.
(1) is nonsense, and for (2) to take place the market needs to stagnate, which is what is happening. The reasons are manifold.
It makes sense when you find out that there are only so many humans who want to do, and are capable of doing, that type of work. Also, robots are cheaper, and they are the opposite of humans: they are not emotional or lazy, and they don't feel fatigue or any other part of the spectrum of feelings or behavior ...
They all look the same, i.e. like sh*t IMO, so although the above statement sounds overgeneralized, that doesn't mean it doesn't generalize well. IME it does.
That looks interesting but it seems inefficient to put an LLM directly into the compilation pipeline, not to mention that it introduces nondeterministic behavior.
It has different limitations but inefficiency doesn't seem likely to be one of them. Did you read the Experimental Results section?
> Figure 2 shows the experimental results, and GenDB outperforms all baselines on every query in both benchmarks. On TPC-H, GenDB achieves a total execution time of 214 ms across five representative queries.
> This result is 2.8× faster than DuckDB (594 ms) and Umbra (590 ms), which are the two fastest baselines, and 11.2× faster than ClickHouse.
> On SEC-EDGAR, GenDB achieves 328 ms, which is 5.0× faster than DuckDB and 3.9× faster than Umbra.
> The performance gap increases with query complexity. For example, on TPC-H Q9, which is a five-way join with a LIKE filter, GenDB completes in 38 ms, which is 6.1× faster than DuckDB. GenDB uses iterative optimization with early stopping criteria.
> On TPC-H, Q6 reaches a near-optimal time of 17 ms at iteration 0 with zone-map pruning and a branchless scan, and does not require further optimization. In contrast, Q18 starts at 12,147 ms and decreases to 74 ms by iteration 1, which is a 163× improvement. This gain comes from replacing a cache-thrashing hash aggregation with an index-aware sequential scan.
> On SEC-EDGAR, Q4 decreases from 1,410 ms to 106 ms over three iterations, which is a 13.3× improvement, and Q6 decreases from 1,121 ms to 88 ms over four iterations, which is a 12.7× improvement. In Q6, the optimizer gradually fuses scan, compact, and merge operations into a single OpenMP parallel region, which removes three thread-spawn overheads. By iteration 1, GenDB already outperforms all baselines.
That's all great, but sadly impractical.
I looked at one of the first statements:
> GenDB is an LLM-powered agentic system that decomposes the complex end-to-end query processing and optimization task into a sequence of smaller and well-defined steps, where each step is handled by a dedicated LLM agent.
And knowing typical LLM latency, it's outside the realm of OLTP and probably even OLAP. You can't wait tens of seconds to minutes for an LLM to generate some optimal code that you then compile and execute.
Considering it's just a single PhD student doing this work, I don't believe such a task can realistically be accomplished, even as a PoC / research.
Why not? Even without LLMs it is technically feasible to build a custom database engine that performs much better than general-purpose database kernels. And we see this happening all the time, with time series, BLOBs, documents, OLTP, OLAP, logging, etc.
The catch is obviously that the development is way too expensive and that it takes a lot of technical capability, which isn't really all that common. The novelty this paper presents is that these two barriers might have come to an end: we can use LLMs and agents to build custom database engines for ourselves™ and our™ specific workloads, very quickly and for a tiny fraction of the development price.
If you look into the results, you will see that they are able to execute 5 TPC-H queries in ~200 ms (total). The dataset is not large, it is rather small (10 GB), but nonetheless, you wouldn't be able to run 5 queries in such a small amount of time if you had to analyze the workload, generate the code, build indices, start the agents/engine, and retrieve the results. I didn't read the whole paper, but this is why I think your understanding is wrong.
If they count only query execution time, not everything else, it would make sense though. It could also be practical if your system runs just a few predefined and highly optimized queries.
> Because this is what "mimics" index scan (without prefetch) on cold data. More or less.
This is an interesting observation but does it really mimic the index scan? This would be essentially a worst case scenario. Submitting IO requests one by one would be a very inefficient way to handle scans, no?
True. Unfortunately it's what index scans in Postgres do right now - it's the last "major" scan type not supporting some sort of prefetch (posix_fadvise or AIO). We're working on it, hopefully it'll get into PG19.
So basically you get hired with 10-15 years of experience and you start with nothing but earning trust by fixing small problems, for how long? That sounds like a great way to get into "does not meet expectations" territory very quickly.
Anthropic has been doing these things independent of what the US admin has publicly asked for, even before Hegseth started breathing down their neck. They were already taking DoD contracts, just like the rest of them. Hegseth, with the skill all schoolyard bullies have, simply smells their weakness and is going for the jugular now.
They also have never had any guarantees they wouldn't f*ck around with non-US citizens, for surveillance and "security", because like most US tech companies they consider us to be second/lower class human beings of no relevance, even when we pay them money.
At least Google, in its early days, attempted a modest and naive "internationalism" and tried to keep their hands clean (in the early days) of US foreign policy things... inheriting a kind of naive 1990s techno-libertarian ethos (which they threw away during the time I worked there, anyways). I mean, they only kinda did, but whatever.
Anthropic has been high on its own supply since its founding, just like OpenAI. And just as hypocritical.
Ah, I see. Do I understand correctly that this means that for a given instance of a polymorphic object I can switch between static polymorphism and dynamic dispatch, and use them both simultaneously? How is this useful in practical terms, i.e. why would I want to do it?
Sort of. Given an instance (it can even be a primitive) you can obtain a dyn reference to a trait it implements simply by casting it:
let a: i32 = 12;
let b = &a as &dyn std::string::ToString; // i32 implements the ToString trait
let c = a.to_string(); // Static dispatch
let d = b.to_string(); // Dynamic dispatch through dyn reference
Note that there aren't really any polymorphic objects in Rust. All polymorphism in this case goes through the dyn reference, which contains a pointer to a vtable for the specific trait.
Additionally, going from a dyn reference to a type-specific reference is not easy. Also, certain methods and traits are not dyn-compatible, mostly due to generic parameters.
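To illustrate that first point: the one sanctioned escape hatch back from a type-erased reference to a concrete type is the std::any::Any trait, which supports checked downcasting. A minimal sketch, with a describe function of my own invention just for illustration:

```rust
use std::any::Any;

// Any is implemented for every 'static type, so almost anything can be
// passed here as &dyn Any once the concrete type has been erased.
fn describe(value: &dyn Any) -> String {
    // downcast_ref returns Some only if the erased type matches exactly.
    if let Some(n) = value.downcast_ref::<i32>() {
        format!("an i32: {}", n)
    } else if let Some(s) = value.downcast_ref::<String>() {
        format!("a String: {}", s)
    } else {
        "something else".to_string()
    }
}

fn main() {
    assert_eq!(describe(&42_i32), "an i32: 42");
    assert_eq!(describe(&String::from("hi")), "a String: hi");
    // An f64 matches neither branch above.
    assert_eq!(describe(&1.5_f64), "something else");
    println!("ok");
}
```

Note this only works through Any specifically; for an arbitrary trait there is no built-in way to recover the concrete type from a dyn reference.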
The main use comes in with various libraries. Doing dynamic dispatch on a specific type is not very useful, but your library might expose a trait on which you then call some methods. If you accept a generic parameter (e.g. impl Trait), each such invocation will cause monomorphization (the function body is compiled separately for each combination of generic types). This can obviously bloat compile times.
Using a dyn reference in your API will result in only a single version being compiled. The downside is the inability to inline or optimize based on the type.
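A rough sketch of the two API styles side by side (the Greet trait and World type are made up for illustration): the generic function gets a separate copy compiled per concrete type it is called with, while the dyn version compiles once and dispatches through the vtable:

```rust
trait Greet {
    fn name(&self) -> String;
}

struct World;

impl Greet for World {
    fn name(&self) -> String {
        "world".to_string()
    }
}

// Monomorphized: the compiler emits one copy of this body for every
// concrete T it is instantiated with, which enables inlining of name().
fn greet_generic<T: Greet>(g: &T) -> String {
    format!("hello, {}", g.name())
}

// Type-erased: exactly one copy is compiled; the name() call goes
// through the vtable behind the dyn reference, so it cannot be inlined.
fn greet_dyn(g: &dyn Greet) -> String {
    format!("hello, {}", g.name())
}

fn main() {
    assert_eq!(greet_generic(&World), "hello, world"); // static dispatch
    assert_eq!(greet_dyn(&World), "hello, world");     // dynamic dispatch
    println!("ok");
}
```

Both calls produce the same result; the difference is only in how many copies of the function the compiler emits and how the call is resolved.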
One additional use I found is that you can sometimes get around the divergent expression type in match expressions. Say you need to print out some values of different types:
let value: &dyn Display = match &foo { // match on a reference so the borrows outlive the match
    A(numeric_id) => numeric_id,
    B(string_name) => string_name,
    C => &"static str",
};
This would not work without dyn as each value has a different type.
Ah, I see. Thanks for the example, I think I understand now. In C++ the problem of monomorphization, or potential bloat due to excessive template instantiations, is normally handled at the linker level, but it can also be controlled at the code level, either by rewriting the code with some other type-erasure technique or simply by extracting bits of not-quite-generic code into smaller non-generic entities (usually a non-templated base class).
Does this mean that the Rust frontend spits out the intermediate representation in a way that doesn't allow deduplication at the linking phase? I see that Rust has its own linker as of Sep '25, but it still works with the normal linkers used in C and C++ too: GNU ld, lld, mold, ...
There's no custom Rust linker just yet. The change in September was to switch from GNU ld to lld for performance on Linux. There are some Rust linker projects (like wild), but these tend to be aimed at speed (and/or incremental linking) rather than executable size.
I'm not sure how useful deduplication at the linker level is in practice. Though I don't think Rust does anything different here than C++. The main issue I imagine is that the types used in generic code have different sizes and layouts. This seems to me like it would prevent deduplication for most functions.
I think the question is, do you know at compile time what the concrete type is? In situations where you do, use static. (I'm not sure I'd call that "polymorphism". If you know the static type it's just a function on a type, and who cares that other types have functions with the same name?) But if you don't know the concrete type at compile time, then you must use dynamic dispatch.
And you can use each approach with the same type at different points in the code, even for the same function. It just depends on your local knowledge of the concrete type.
That's polymorphism 101, and not quite what I was asking. From my understanding, what Rust has is something different from what C++ offers. In C++ you opt in to either static or dynamic dispatch. In Rust it seems you can mix both for the same object and convert between the two at runtime. This appears to be true according to the example from dminik in the comment above, but the actual purpose is still not quite evident to me. It seems to address the problem of excessive template instantiations, called monomorphizations in Rust as far as I can see. In C++ this is normally and mostly handled through linker optimizations, which may suggest that Rust doesn't have those implemented yet, or that there are more useful cases for it.
> It seems that it tries to solve the problem of excessive template instantiations
No, I don't think the way Rust implements dynamic dispatch has much, if anything, to do with trying to avoid code bloat. It's just a different way to implement dynamic dispatch with its own set of tradeoffs.
For every one such engineer there are hundreds or thousands of others executing the idea into work, so your premise that "code is a very small part of the overall picture" is obviously very wrong. You wouldn't need to hire that many people if that was remotely true.
Different types of "coding skills" and different types of complexity make these two impossible to put into the same bucket of "still easy". You've probably never done the latter, so you're under the impression that it is easy. I assure you it is not. Grasping the concept of a new framework vs. doing algorithmic and state-of-the-art improvements are two totally different and incomparable things. In the ~30M population of software engineers around the globe, only a handful do the latter, and there's a reason for it: it's much, much more complicated.
You are conflating problem solving and the ability to write code. Web dev has its own challenges, especially at scale. There are not a lot of people writing web servers, designing distributed protocols, and resolving sandboxing issues either.
I'm not conflating one with the other; I am saying that "coding skill" when dealing with difficult topics is not just a coding skill anymore. It's part of the problem.
Not knowing C after a course on operating systems will block you from working on FreeBSD. Knowing C without a grasp on operating systems will prevent you from understanding the problem the code is solving.
Both are needed to do practical work, but they are orthogonal.
Exactly, but they are not as orthogonal as you try to make them. That's just trivializing things too much and ignoring all the nuance. You sound like my uncle, who spent a career in IT but never really touched programming, yet nevertheless has a strong opinion on how easy and trivial programming really is, and on how it was never super interesting to him because it is work done by some other, unimportant folks. In reality, you know, he just cannot admit that he was not committed enough, or shall I say likely not capable enough, to end up in that domain, and instead he ended up writing test specifications or whatnot. A classic example of the Dunning-Kruger effect.
There is a nuance in what you say. You say it is "still easy" but it is not. It is not enough to take a course on operating systems and learn C to start contributing to an operating system kernel in an impactful way. Apart from the other software "courses" you need to take, such as algorithms, advanced data structures, concurrency, lock-free algorithms, probably compilers, etc., the one that is really significant and not purely in the software domain is understanding the hardware. And this is a big one.
You cannot write efficient algorithms if you don't know the intricacies of the hardware and don't know how to get the best out of your compiler. This cannot be taught out of context as you suggest, so in reality all of these skills are actually intertwined and not quite orthogonal to each other.
I do agree with you that there's a skill tree for any practical work to be done, and the nodes can be simple or hard. But even if there are dependencies between them, the nodes are clearly separated from each other, and some are shared between skill sets.
If you take the skill tree you need to be a kernel contributor, it does not take much to jump over to database systems development, or to writing GUIs. You may argue that the barrier to entry for web dev is lower, but that's because of all the foundational work that has been done to add guardrails. In kernel work, they are too expensive, so there's no hand-holding there. But in web dev, often enough, you'll have to go past the secure boundary of those guardrails, and the same skill nodes, like advanced data structures and concurrency, will be helpful there.
Kernel dev is not some mythical land full of dragons. A lot of the required knowledge can be learned while working in another domain (or if you're curious enough).
No, it's not mythical, but it is vastly more difficult and more complex than the majority of other software engineering roles. That the barrier to entry is lower elsewhere is not something I would argue against at all; it's common sense, unless you're completely delusional. While there are a lot of skills you can translate from the systems programming domain elsewhere, there are not a lot of skills you can translate vice versa.