The sort of hypothetical security vulnerability here is likely to depend on undefined behaviour (buffer over-runs, subverting parsers, etc etc). Just another reason to continue moving over to safe languages, especially for the lower level bits of our stacks. HTTP is big and complicated, I'm much happier exposing Rust/Go/C#/... to it than I am exposing C to it.
In safe languages, backdoors must be far more explicit, so we close off the likely scenario posited here.
"Safe" languages make it harder to write some classes of bugs; this is good. I wonder, though, whether it's not a better return on investment to focus on sandboxing wherever possible? I can run curl in firejail right now with no code changes whatsoever. On supported systems, pledge() and SELinux rules can mitigate attacks with minimal effort. And we get to keep existing programs without investing the man-years to rewrite something that works.
These things are excellent, but lots of stuff gets pulled in via libs. Security is layered and a spectrum; both techniques are valid, valuable and needed.
We need to find a way to extend sandboxing to dynamic libraries in a way that is descriptive at the ops level. Something that can be applied as a system management rule, so that it can be distributed to end users now without waiting for the safe versions to get disseminated.
Would it work to make sure that every piece of software has its libraries packaged/signed and in the same folder as the program? It would minimize reliance on common shared libs and on updates to them. Though disk space would be wasted, that should still be acceptable for many use cases.
You are conflating a "new and cool language" with a "language that eliminates a ton of possible bugs from the get go".
These two terms are far from identical.
You might be masking your conservative approach to new languages by hiding behind "C is mature". No it's not. Right now somebody on the planet is introducing a buffer overflow without knowing it, while coding in the "mature C".
Get real already, please. It's high time.
Random example: I dislike Go's error handling, but the explicit nature of it has saved me while working half-asleep 50+ times already. Another one: one meager if/else in a supervised Elixir worker saved a server from endlessly repeating a bugged task that would otherwise keep crashing forever. There are others, lots of them. I am sure people can give plenty of examples for Rust as well.
I think you're overreacting. The GP wasn't being conservative about new languages for new projects. S/he was merely warning that rewrites carry their own risks, which might outweigh the benefits of better languages (or for that matter other infrastructure). If you avoid 100 bugs in the new version but add 101 because you didn't completely understand the old code and the environment it runs in, you haven't come out ahead. This phenomenon has been too well known for too long to be blithely ignored.
> I do believe most of Linux userland has to be rewritten though
Why? I'm sympathetic to the argument that 2017 computing shouldn't be based on 1970s UNIX limitations and mindset, but changing that would require a lot more than just rewriting the userland applications, and would require a bigger re-think.
But assuming that the shell's functionality is OK as it is, what's to be gained in a re-write?
For one thing, piping being mostly text-friendly is limiting in many scenarios I've stumbled upon. A modern shell should allow arbitrary objects to be piped and processed, much like in the functional programming paradigm. The UNIX idea was and still is wonderful, but we're past the text-only thing.
In any case, I feel (and I don't have tens of facts, I admit) that we've been dragged down by the past for far too long. Others have documented their gripes with the current incarnation of Ops / sysadmin problems much better than I could, here on HN (though I think it was years ago).
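To make the object-piping idea a bit more concrete, here is a minimal Python sketch. It assumes nothing about any real shell: `Process`, `list_processes`, `where` and `select` are hypothetical names invented for the example, and the process list is hard-coded sample data. The point is only that each stage hands structured records to the next, so nothing downstream has to re-parse text.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator


@dataclass
class Process:
    # Hypothetical record type; a real object shell would define its own.
    pid: int
    name: str
    rss_kb: int


def list_processes() -> Iterator[Process]:
    # Hard-coded sample data standing in for a real process listing.
    yield Process(1, "init", 1200)
    yield Process(42, "curl", 8400)
    yield Process(77, "sshd", 5300)


def where(pred: Callable[[Any], bool], items: Iterable[Any]) -> Iterator[Any]:
    # Filter stage: passes along only the objects matching the predicate.
    return (item for item in items if pred(item))


def select(field: str, items: Iterable[Any]) -> Iterator[Any]:
    # Projection stage: pulls one attribute out of each object.
    return (getattr(item, field) for item in items)


# Rough equivalent of "ps | filter by memory | print the name column",
# but without any text parsing between the stages.
big = list(select("name", where(lambda p: p.rss_kb > 5000, list_processes())))
print(big)
```

With text pipes, the middle stage would have to know which column holds the memory figure and how it is formatted; here it just reads `rss_kb`.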
That makes no sense. There has been a ton of research. C, needing to be backwards-compatible, can only implement a subset of that research. More modern languages, not having this burden, are free to implement the entirety of this research. Therefore, C can only be at most equal to new languages, and overwhelmingly likely worse.
Makes a lot of sense. You have 2 scenarios: 1) an old code base in C, to be rewritten in the language du jour; 2) new functionality in either C or spiffy lang 2.0 (SPIFFY).
For scenario 1, all that functionality has to be duplicated, including "defects". Are you sure that the functionality is 100% there, or did you miss any use cases? That argues for leaving it as it is.
Scenario 2, new widget. Most likely, coding in SPIFFY will be a better choice.
Run the old, anomalous traffic on the old codebase, which can handle the corner cases. Run the new traffic over the safe code that covers 80% of the use cases.
Slowly expand the percentage of traffic coverable by the new safe code.
The 80 percent case can be written in a small amount of time. In curl's case, it is fetching a file over HTTP/1.1 using an encoding, possibly following some redirects. Then POST and PUT requests, then chunked uploads, downloads, then?
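The split described above could look roughly like this. It is only a sketch under assumptions: the request is a plain dict, and `handle_new`, `handle_old`, `supported_by_new` and `SUPPORTED_METHODS` are hypothetical names, not anything from curl. Anything the new code does not explicitly claim falls through to the old implementation.

```python
from typing import Dict, Tuple

Request = Dict[str, str]  # assumed shape: {"scheme": ..., "method": ..., "url": ...}

# Hypothetical coverage list; grow it as the new code learns more cases.
SUPPORTED_SCHEMES = {"http", "https"}
SUPPORTED_METHODS = {"GET"}


def handle_new(request: Request) -> Tuple[str, str]:
    # Stand-in for the rewritten, memory-safe implementation.
    return ("new", request["url"])


def handle_old(request: Request) -> Tuple[str, str]:
    # Stand-in for the legacy implementation that knows the corner cases.
    return ("old", request["url"])


def supported_by_new(request: Request) -> bool:
    # Only claim a request when every relevant property is on the list.
    return (
        request.get("scheme") in SUPPORTED_SCHEMES
        and request.get("method") in SUPPORTED_METHODS
    )


def dispatch(request: Request) -> Tuple[str, str]:
    # Default to the old code path; the new one must opt in per request.
    if supported_by_new(request):
        return handle_new(request)
    return handle_old(request)


print(dispatch({"scheme": "http", "method": "GET", "url": "http://example.com/"}))
print(dispatch({"scheme": "ftp", "method": "GET", "url": "ftp://example.com/f"}))
```

Expanding coverage is then a matter of adding, say, `"POST"` to `SUPPORTED_METHODS` once the new code handles it, while everything else keeps going to the old path.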
If you had a list of all the corner cases from the start, wouldn't it be easy to just cover them in the new code? I thought the point of corner cases was that they introduce bugs because we don't think of them when we design the system.
Since we're talking about backdoors, how about compiler ones?
With C, there are several routes to bootstrapping your compiler of choice – there are countless implementations that can be used as intermediates (both closed and open source, for all sorts of architectures, with decades worth of binaries and sources available), and diverse double compilation is a thing.
Rust? Unless you want to go back to the original OCaml version and build hundreds of snapshots (and provided you actually trust your OCaml environment), you've got no choice but to put your faith in a blob.
I'm not against Rust as a language, but it seems counterintuitive to use a language that only has one proper implementation and requires a blob to bootstrap, as a defense against backdoors.
You're referring to trusting-trust backdoors, but I suspect that those should be low on the threat model: they seem like they'd be hard to weaponise in a way that they're maintained through years of very large changes (in the case of Rust). Just a normal backdoor of a malicious piece of code snuck in seems more likely, and a full bootstrap isn't necessary, nor does it actually help at all, to stop that. (But it's still true that a single implementation is more risky in that respect.)
This is something I've been thinking about quite a bit. It feels like there have to be two kinds of compilers and VMs (if necessary), with different strengths.
One kind of compiler should be like current compilers, with a focus on speed, resource consumption, optimization. Most actual commercial applications would use this compiler, because it provides the fastest and most efficient software.
But beyond that, it might be beneficial to implement compilers with a focus on simplicity and a minimum of dependencies. For example, implement a compiler on an ARM CPU in assembler. The translation step to run this code on an actual CPU is too small and simple to be backdoor'd, and the CPU should be simple or even open.
Such a simplicity-oriented compiler could provide a source of truth, if all components are too simple to be backdoor'd.
In cryptography, there's the concept of a "nothing up my sleeve number". The idea is that if an arbitrary constant has to be put into an algorithm, the author should describe exactly how they picked the number (it's the digits of pi, etc) in a way that leaves no obvious room for maliciousness. (In some algorithms, a specially-crafted constant can create a backdoor.)
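A well-known instance is SHA-256, whose eight initial hash values are specified as the first 32 bits of the fractional parts of the square roots of the first eight primes. The short Python sketch below re-derives them; the helper names `first_primes` and `sqrt_fraction_bits` are made up for this example, and it needs Python 3.8+ for `math.isqrt`.

```python
import math
from typing import List


def first_primes(n: int) -> List[int]:
    # Simple trial division; fine for a handful of small primes.
    primes: List[int] = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def sqrt_fraction_bits(p: int) -> int:
    # isqrt(p * 2**64) equals floor(sqrt(p) * 2**32), computed exactly.
    scaled = math.isqrt(p << 64)
    # Dropping everything above the low 32 bits removes the integer part
    # of sqrt(p) and leaves the first 32 bits of its fractional part.
    return scaled & 0xFFFFFFFF


constants = [sqrt_fraction_bits(p) for p in first_primes(8)]
print([f"{c:08x}" for c in constants])
```

The first value printed is `6a09e667`, which is the familiar first word of the SHA-256 initial state. Anyone can rerun the derivation, which is exactly what leaves no room for a hand-picked constant.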
I'm rapidly coming to think of safe languages in the same way as "nothing up my sleeve numbers". With code written in a safe language, it is much easier to verify that the author hasn't put in any intentional or unintentional backdoors.
Well, if it was an intentional/coerced backdoor, then it doesn't matter how good of a C programmer the author is. Actually, you could argue the better they are, the higher the risk is that they'd be successful in hiding the backdoor.
In safe languages, backdoors must be far more explicit, so we close off the likely scenario posited here.