I see a lot of people saying things like "this is why package signing is important" and "we need to know who the developers are" and "we need to audit everything." Some of that is true to some degree, but let me ask you this: why do we consider it acceptable that code you install through a package manager implicitly gets to do anything to your system that you can do? That seems silly! Surely we can do better than that?
Put simply: in many cases, the dependencies you install don't need nearly as much authority as we give them right now. Maybe some of these packages need network access (I see a few named "logger" which might be shipping logs remotely) but do they need unrestricted filesystem access? Probably not! (They don't necessarily even need unrestricted network access either; what they're communicating with is likely pretty well-known.)
Java's security manager system is not in effect for the majority of use cases.
While Maven/Gradle dependencies can't run code on installation, once the application is run or tested, it generally runs with full user permissions, not under a security manager.
The security manager is an additional layer of security that most languages don't have; however, Java applets showed it to be full of holes and generally unsuitable for running untrusted code.
The applet security posture contributed a great deal to negative opinion of the language; applets probably would have been better off never having existed.
There have been hundreds, maybe thousands of local privilege escalation vulnerabilities on Linux. People still find bugs in basic programs like sudo that have been there for decades. Still, nobody would ever suggest that Linux should have the same security approach as Windows 95 or that it's generally unsuitable for running code you didn't write!
Sandboxing code is hard, regardless of what language, runtime or operating system approach you use.
The Sun JVM, as originally implemented, can express operations that are not valid for Java objects. There are parts of the JVM that attempt to constrain opcode sequences to only those produced by "valid Java compilers operating on Java objects".
In 1996, Java was being overwhelmed by exploits because the mapping of the language to the VM was not well matched. There was a Java summit with lots of interesting people. This summit was also when Sun got confirmation that Microsoft had quite a few engineers working on an independently implemented runtime. To Sun's credit, they did get rather more serious about Java security -- but they had already created a rocky foundation.
It is my opinion that the business model Sun had "in mind" for Java was a free runtime for everyone that they were in control of, while making money from selling an "official" Java compiler suite.
I do not believe that the Sun Java JVM was created with security in mind.
I believe that Deno (the "successor" to Node being written by Ryan Dahl) is supposed to fix this for server-side JavaScript/TypeScript. It doesn't grant any permissions to anything unless you specifically give them out (so you can say that only a specific module gets access to the filesystem, for instance, and on top of that it can only access /srv and not /etc).
This looks like it's... getting there, but still too coarse-grained. It looks like those permissions are granted to the whole Deno process? So if your program needed both access to sensitive data on the filesystem and network access, and it used a malicious dependency, that dependency could take advantage of those permissions and exfiltrate that data.
I think it could be something like: only the root module could import net, fs, os, etc. Then, in order for other modules to access those things, the root module would need to pass them in explicitly. Of course, if you don't import the module at all, there is no access.
Of course JS isn't a great language for this. A malicious program could spider the object graph looking for something valuable. You would have to be very careful to keep these objects hidden. And a container library would have huge amounts of access when it needs none (for example, if you want to store a hashmap of open sockets).
A more strongly typed language like Rust or Haskell could do better, as your container library can be prevented from casting T to File. However, even that is not enough, as you can just manually cast a pointer if you somehow know its type. (And there is a small amount of reflection that can do this even in safe code.)
> Of course JS isn't a great language for this. A malicious program could spider the object graph looking for something valuable.
Deno can provide extra syntax or annotations for imports to allow the dev to explicitly allow permission per-import. These can be in the source code, or in a config file.
How would this work exactly? How do you control which module's permissions to use for any IO?
For example, what if you have a callback library that calls a function that does IO? What if you pass an IO function directly as a callback (for example, File.close)? If permissions attach to the file where the call is textually written, how do you handle dynamic calls? (Or are they forbidden?)
I think the capability model is probably the right one here.
Whether controls are coarse or fine (all the way down to function level, or even line by line), you still need to audit the source code to see whether a package is going to abuse the permissions you grant it. Right?
Not to nearly the same extent; the key is to not grant unnecessary authority in the first place.
Let's say I'm using a `left-pad` function that someone else wrote, and I'm using a system in which modules aren't granted any authority except what you give them. If I then call
left-pad('foo', 5)
...I don't really have to worry that it'll go rummaging around my filesystem for my private keys and exfiltrate them somewhere. After all, I didn't give it access to my filesystem or the network! (Side-channel attacks notwithstanding, things get thorny real quick at that point.)
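A minimal Java sketch of that idea: a pure left-pad's only inputs are its arguments, so in a capability-based system there is simply no file or network handle for it to abuse. (The code is illustrative, not any particular npm package.)

```java
// A pure utility function: its only inputs are its arguments. In an
// object-capability system a module like this is trivially safe to call,
// because no filesystem or network capability was ever passed to it.
public class LeftPad {
    public static String leftPad(String s, int width) {
        StringBuilder sb = new StringBuilder();
        while (sb.length() + s.length() < width) {
            sb.append(' '); // pad on the left until the target width is reached
        }
        return sb.append(s).toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + leftPad("foo", 5) + "]");
    }
}
```

The worst it can do is return a wrong string or loop forever; exfiltration is structurally off the table.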
Now, you still have to worry about the function not being correct - it might return an empty string, it might go into an infinite loop, etc. - but you've tremendously reduced the scope of what this module could do if the developer were malicious.
Yes, while it goes in the right direction in theory, in practice it provides almost no additional security.
The only good differentiator right now that could definitely be implemented in Node.js directly is the flag `--allow-net=<domain>`.
It could prevent data exfiltration, but it requires the whole stack to run under this flag.
I think this is critical. The actual runtime of any code needs to do way more than what it’s doing now.
Simply relying on package signing and the like permits trusted-but-malicious actors. With Deno packages configured well, you can really lock down and limit a ton of attack vectors.
Tech solutions are the best solutions when they work! Fighting with your spouse over who does the dishes? Buy a dishwasher! Don’t want your ISP snooping on traffic? Use https / a VPN!
Unfortunately, package signing does nothing to protect against the threat vector presented here. The authentication system in npm is working fine. The problem is we put too much trust in software from the internet.
...Hence my classification of it as a human problem. I apologize, this is a quirk of my personal vernacular. This is a problem that emergently arises out of the way human beings interact with each other socially, even before tool use comes into the picture.
Alice has a thing.
Bob has a thing that Alice figures will make her life easier, so she integrates it without looking too hard at it.
Alice didn't realize that by adding Bob's thing, something she wanted kept private no longer was, even though her primary use case was solved.
The technical solution is making Alice's thing include a really onerous-to-configure permissions framework, which takes the work of getting a thing set up and expands the task list from "program thing" to "program thing and configure permissions for thing".
The human solution is to realize you don't know Bob from Adam, or his motivations, and to observe what Bob's thing actually does. Then, depending on criticality, remake something similar, or actually take the time to get to know Bob and see if he can make what you want for you under some sort of agreement that facilitates good business and trust all around. You can't be sampling for malicious changes in real time, so it's all about risk management.

The issue in our case is that a lot of these projects are essentially gifts, with no active attention paid to them after a certain point. It's a variant of cargo cults. You want this thing? Go here, get that, presto. Businesses, developers (and their exploiters) like that. The price, though, is that once a project is abandoned and the rights transferred to someone you don't know, you have to rerun your risk management calculation again.
The thing people should be worried about is all the PHB's (pointy-haired bosses) who just got ammo for their NMIH (Not-Made-In-House) cannons now that supply chain attacks are becoming increasingly visible vectors for attack.
In a rare self-reply, this feeds into the reason why I scratch my head at the whole license based IP distribution thing.
By bringing licenses into it, you push for a business relationship first, but discourage further toolmaking. Programs are math. Rederivation and application should really be the norm, but can't be if we're drawing boxes around arrangements of symbols and saying "Do not cross."
It's the weird contradiction at the core of what we do as software people that still keeps me scratching my head. We all run to make a hydrant to mark, then try to build rent-extracting businesses around it instead of maximizing the number of variants of hopefully practical and efficient ways to allow everyone else to solve their own problems.
I'm not against people being able to make a living doing what they love, but the incentive structure seems all out of step with what I understood to be the overall goal.
Or something. Still wrapping my head around it I guess.
What about reviews and review certificates then?
If you review the package foo@1.0, you could publicly certify that it is not malicious and maybe earn some money doing so.
In turn, you back your claim with a financial security that you forfeit if the package actually does contain malicious code.
That's a great idea - but in a centralized system like npm or cargo you don't need certificates to implement that. (Certs might be a nice implementation, though.)
So yeah, there might be a "trusted security reviews with payments" shaped technical solution. I'd love to see someone flesh that out - that sounds like a potential solution to this problem (unlike developer-signed packages).
This is so obviously what needs to happen, it's really surprising it's not a feature in all major languages by now. I bet that in 10 years' time, giving dependencies complete control will seem crazy.
Indeed, being able to apply capabilities on a package level would be great, but I don't know many languages/environments that implement this as a first-class feature.
The WASM ecosystem is exploring this through the use of what they call "nanoprocesses" wherein libraries are wrapped into modules and provided access to nothing by default [1]. This seems to be more of a pattern and consequence of how WASM works than a specific feature.
Yeah. JavaScript is probably the closest to being there (with things like SES[0], LavaMoat[1], etc.) but we're not quite there yet. It's just shocking that this sort of thing is as seemingly obscure as it is; it's like the whole industry has collectively thrown up their hands and said code execution is unavoidably radioactively dangerous. (While simultaneously using package managers that... well.) But it doesn't have to be!
Java does. Of course, it's never been used systematically and it has received precious little attention to DevOps ergonomics, but the infrastructure is there.
Safe Haskell is one in this vein (it's lower level and you would apply a capability layer on top), although like other past efforts on this front it's mostly languished in obscurity even among the Haskell community and is used by very few people.
Could you solve this in Java using the SecurityManager stuff that was used to sandbox applets, or is all that considered broken these days? (I'm not sure if you can have different SecurityManagers for different parts of the app, though.)
That's how the web/application server containers worked (probably still do, but I've been disconnected). The server classes have different permissions from the application code classes (loaded from the .war/etc files). If an application code method calls into a system class, the permissions which apply are those of the application, since that method is in the calling stack frame.
I wrote this support into several Java web container and J2EE application server products back in the day. AFAIK, all that still works great today in Java.
I'm not familiar enough with Java to have a strong opinion on this, but this HN comment from the linked article mentions that you can only have one SecurityManager per app, so sounds like that's still too coarse-grained: https://news.ycombinator.com/item?id=18599365
In my experience, the biggest problem with the Java SecurityManager approach is that it's thought of as too difficult to understand / cumbersome to configure (and I'm not saying this belief is wrong), and so most apps either run with no SecurityManager explicitly configured or configure things the "simplest possible way" which usually winds up being approximately equivalent to "anybody can do anything".
Oracle’s own secure coding guidelines for Java [1] actually now recommend adopting a capability-based approach rather than relying on SecurityManager:
> FUNDAMENTALS-5: Minimise the number of permission checks
> Java is primarily an object-capability language. SecurityManager checks should be considered a last resort.
(Note: quite a lot of Java’s standard library is not designed along object-capability lines so you should take this advice with a pinch of salt).
They are not in tension. The Java security architecture is a mix of capability and module-level security.
It's probably worth posting a quick refresher. The system is old but people don't use it much these days, and the documentation isn't that good. At one point I wrote a small JavaFX PDF viewer that sandboxed the PDF rendering code, to learn the system. I lost the source code apparently, but the hard part wasn't coding it (only a small bit of code was required), it was learning how to configure and use it. I tested the sandbox by opening a PDF that contained an exploit for an old, patched security bug and by using an old, vulnerable version of the PDFbox library. The sandbox successfully stopped the exploit.
Fortunately, the Java team still maintains the sandbox and, via new technology like the module system and GraalVM, is reinforcing it. In fact, GraalVM introduces a new sandboxing technology as well that's simpler to use than the SecurityManager; however, it's also probably less appropriate for the case of blocking supply chain attacks.
Java's internal security is based on two key ideas:
1. Code that can protect its private state. When the SecurityManager is enabled and a module is sandboxed, it isn't allowed to use reflection to override field or method visibility.
2. Stack walks.
Let's tackle these backwards. Imagine it's time to do something privileged, like open a file. The module containing the file API will be highly privileged as it must be able to access native code. It will have a method called read() or something like that. Inside that method the code will create a new permission object that represents the permission to open files under a certain path. Then it will use AccessController, like this:
FilePermission perm = new FilePermission("/temp/testFile", "read");
AccessController.checkPermission(perm);
The checkPermission call will then do a stack walk to identify the defining module of every method on the stack. Each module has its own set of granted permissions; the access controller will intersect them to determine what permissions the calling code should have. Note: intersection. That means if any unprivileged code is on the stack at all, the access check fails and checkPermission will throw an exception. For example, if an unprivileged module registers a callback with a highly privileged module, that doesn't work: the low-privileged module will be on the stack and so the privilege is dropped.
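The intersection rule above can be shown with a toy model (a standalone sketch, not the real JVM API): each module on the call stack carries a permission set, and the effective permission set is the intersection of all of them.

```java
import java.util.*;

// Toy model of stack-walk permission intersection. Each "module" on the
// call stack has a set of granted permissions; the effective permissions
// are the intersection, so one unprivileged caller anywhere on the stack
// drops every privilege. Names here are illustrative, not the JVM's own.
public class StackWalkModel {
    static Set<String> effectivePermissions(List<Set<String>> stackModulePerms) {
        Set<String> effective = new HashSet<>(stackModulePerms.get(0));
        for (Set<String> perms : stackModulePerms) {
            effective.retainAll(perms); // intersect with each frame's module
        }
        return effective;
    }

    public static void main(String[] args) {
        Set<String> fileModule = Set.of("file.read", "file.write", "native");
        Set<String> appModule  = Set.of("file.read");
        Set<String> plugin     = Set.of(); // unprivileged

        // App calls the file module directly: file.read survives.
        System.out.println(effectivePermissions(List.of(fileModule, appModule)));
        // The unprivileged plugin is also on the stack: everything is dropped.
        System.out.println(effectivePermissions(List.of(fileModule, appModule, plugin)));
    }
}
```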
Access control contexts are themselves reified as objects, so instead of doing a permission check immediately you can 'snapshot' the permissions available at a certain point and use it later from somewhere else. And, starting a thread copies the permissions available at that point into the new thread context. So you cannot, in the simple case, elevate privilege.
Stack walking and permission intersection is slow. It was optimised a lot in Java 9 and 10 so the performance impact of enabling sandboxing is much less than it once was, but it's clearly not zero overhead. Therefore the JVM provides other techniques. One is the notion of a capability, known from many other systems. Instead of doing a permission check on every single file read (slow), do it once and then create a File object. The File object allows reading of the underlying native file via its private fields. Whoever has a pointer to the File object can thus read from it. Because pointers cannot be forged in Java, this is secure as long as you don't accidentally lose your pointer or pass it to code that shouldn't have it.
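A rough sketch of that capability idea, with hypothetical names (this is not the JDK's File class): the expensive check conceptually happens once at construction, and afterwards holding the reference is the permission.

```java
// Sketch of the capability pattern described above: check once, then hand
// out an object whose private state grants the access. Whoever holds the
// reference can read; nobody can forge one, because pointers in Java are
// unforgeable and the constructor is private.
public class FileCapabilityDemo {
    public static final class ReadableFile {
        private final String contents; // stands in for a native file handle

        private ReadableFile(String contents) { this.contents = contents; }

        public String read() { return contents; }
    }

    // Privileged factory: imagine the one-time permission check happening here.
    public static ReadableFile open(String contents) {
        return new ReadableFile(contents);
    }

    public static void main(String[] args) {
        ReadableFile f = open("secret-config");
        // Unprivileged code given `f` can read this file, and only this file.
        System.out.println(f.read());
    }
}
```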
Sometimes you need to wrap a privileged operation to "dilute" it somehow. For example, imagine you have a module that allows arbitrary socket access. You also have an HTTP client. You would like the HTTP client to have network access, but for it to be usable by other modules that should only be able to contact specific hosts. Given what I've described so far that wouldn't work: the highly privileged code that can do native calls would do a stack walk, discover the unprivileged module on the stack and throw an exception. But there's a fix: AccessController.doPrivileged. This is kind of like sudo. It takes a lambda and truncates the stack that's examined for access checks at the point of use. Therefore it allows a module to use its own assigned permissions regardless of who is calling it. Of course, that is powerful and must be used carefully. In this case the HTTP client would itself check a different, HTTP specific permission. If that permission passed, then it would assert its own power to make arbitrary network connections and go ahead and use the lower level API.
There are a few more pieces, but they aren't core. One is the class called SecurityManager. This is the most famous part of the API but in fact, it's no longer really needed. SecurityManager simply delegates to AccessController now. Its API is slightly more convenient for the set of built-in permissions. For the purposes of understanding the design you can effectively ignore it. The SecurityManager needs to be activated using a system property as otherwise, for performance reasons, permission checks are skipped entirely at the check sites. Beyond that it can be left alone, or alternatively, customised to implement some unusual security policy.

Another piece is the policy language. Permissions are not intrinsic properties of a module in the JVM but rather assigned via an external file.

The final piece is the module system. This isn't relevant to the sandbox directly, but it makes it easier to write secure code by adding another layer of protection around code to stop it being accessed by stuff that shouldn't have access to it. After a careful review of the old JVM sandbox escapes from the applet days, the Java team concluded that the module system would have blocked around half of them.
So as you can see the design is very flexible. There's really nothing else like it out there, except maybe .NET CAS but I believe they got rid of that.
Unfortunately there are some pieces missing, if we want to re-awaken this kraken.
The first is that modules have no way to advertise what permissions they need to operate. That has to be specified in an external, per-JVM file, and there are no conventions for exposing this, therefore build tools can't show you permissions or integrate the granting of them.
The second is that some code isn't sandbox compatible. The most common reason for this is that it wants to reflectively break into JVM internals, for example to get better performance. Of course that's not allowed inside a sandbox.
A third is that some code isn't secure when sandboxed because it will, for example, create a File object for your entire home directory and then put it into a global public static field i.e. it doesn't treat its capabilities with care. The module system can help with this because it can make global variables less global, but it's still not ideal.
The final piece is some sort of community consensus that sandboxing matters. Bug reports about sandboxing will today mostly be ignored or closed, because developers don't understand how to use it and don't see the benefit. It's fixable with some better tutorials, better APIs, better tooling etc. But first people have to decide that supply chain attacks are a new thing that matters and can't be ignored any longer.
> Sometimes you need to wrap a privileged operation to "dilute" it somehow. For example, imagine you have a module that allows arbitrary socket access. You also have an HTTP client. You would like the HTTP client to have network access, but for it to be usable by other modules that should only be able to contact specific hosts. Given what I've described so far that wouldn't work: the highly privileged code that can do native calls would do a stack walk, discover the unprivileged module on the stack and throw an exception.
Not sure if this is something Java enables, but in principle you could do this in a capability-style way as well. Let's say you have an HTTP client module that you want to allow another module to use, but only to make requests to a specific host. You could write a wrapper with a subset of the HTTP client's functionality, only including (for example) a send() method that would send an HTTP request to the specified host. You'd then pass that to the module that you want to be able to make HTTP connections (rather than the raw HTTP client), and provided your wrapper object doesn't expose functionality from the underlying module that would let a client specify arbitrary hosts, you're in a pretty good spot.
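The wrapper described above can be sketched in plain Java (all names hypothetical; no real HTTP library is used): a powerful capability is attenuated into a narrower one before being handed to a less-trusted module.

```java
// Sketch of capability attenuation: wrap a powerful capability in a
// narrower one before handing it to a less-trusted module. Names are
// hypothetical; the "client" here is a stand-in, not a real HTTP library.
public class AttenuationDemo {
    // The powerful capability: can talk to any host.
    interface RawHttpClient {
        String send(String host, String request);
    }

    // The attenuated capability: same operation, one fixed host.
    static final class SingleHostClient {
        private final RawHttpClient raw;
        private final String host;

        SingleHostClient(RawHttpClient raw, String host) {
            this.raw = raw;
            this.host = host;
        }

        String send(String request) {
            return raw.send(host, request); // host is not caller-controlled
        }
    }

    public static void main(String[] args) {
        RawHttpClient raw = (host, req) -> "response from " + host;
        SingleHostClient limited = new SingleHostClient(raw, "api.example.com");
        // An untrusted module given `limited` can only ever reach this host.
        System.out.println(limited.send("GET /"));
    }
}
```

As long as the wrapper doesn't leak the raw client through any of its methods, the untrusted module's reach is bounded by the wrapper's API.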
That's the same thing I was just describing but recursed another level. It doesn't help by itself. Something needs to have permission to use the higher level of privilege - raw network access in my example, 'raw' http client access in yours. And something else needs to check that permission. Yes, you could wrap that privilege in a capability afterwards, but the reason Java has both capabilities and stack walking is because something needs to authenticate code and then authorise the production of a capability to start with.
Maybe I'm missing something about the use case, but I'm not sure I quite follow.
Sure, something needs to have permission to use the higher level of privilege. On your typical POSIX OS, your program is probably born with the ability to create arbitrary TCP/UDP sockets by default; on a capability OS, maybe you've explicitly provided it with access to your network stack. Regardless, at the entry point to your program you presumably have modules providing arbitrary network access in scope somehow.
If I'm understanding correctly, the case you described is that you have an HTTP client module that you'd like to have direct access to the network, but you'd like to restrict the consumers of the HTTP client to only querying certain hosts. From the start of your program, you'd instantiate an HTTP client (passing it a capability to use the network interface) then instantiate one of those HTTP client proxy objects that only allows communication with one host (passing it a capability to use the HTTP client). From there, you pass the capability to that proxy object to the unprivileged consumer of the module.
This seems to work without any kind of stack walking authentication logic, just normal variable scope, provided the language is capability-based. Am I missing something?
Exactly. What usually happens in capability systems is that the main() method gets all the capabilities (or whatever capabilities the user allowed it) and then does dependency injection to distribute those to other components. No need for complex stack-based authentication or policy rule evaluation.
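That distribution step can be sketched in a few lines of Java (all names hypothetical): main() holds the root capabilities and each component receives only what it needs, so confinement falls out of ordinary variable scope.

```java
// Sketch of capability distribution via dependency injection: main() starts
// with the root capabilities and hands each component only what it needs.
// No stack walking or policy files, just lexical scope. Names hypothetical.
public class CapabilityInjectionDemo {
    interface Network { String fetch(String host); }
    interface Filesystem { String read(String path); }

    // The HTTP client needs the network, so main passes it in.
    static final class HttpClient {
        private final Network net;
        HttpClient(Network net) { this.net = net; }
        String get(String host) { return net.fetch(host); }
    }

    // The report module needs HTTP but never sees the filesystem.
    static final class ReportModule {
        private final HttpClient http;
        ReportModule(HttpClient http) { this.http = http; }
        String run() { return http.get("reports.example.com"); }
    }

    public static void main(String[] args) {
        Network net = host -> "data from " + host;     // root capability
        Filesystem fs = path -> "contents of " + path; // never leaves main
        ReportModule report = new ReportModule(new HttpClient(net));
        System.out.println(report.run());
    }
}
```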
Indeed, if you look at the history of Java sandbox escapes they are largely confused deputy attacks: some privileged code source can be tricked into doing something it shouldn’t do.
You can build a sandboxing language without any sort of stack walking. SEL4+C does this. It doesn't have especially good usability at scale, and it's not easy to modularise.
You're imagining a system where there's no specific authentication system for code. Instead in order to use a library, you need to explicitly and manually obtain all the capabilities it needs then pass them in, and in main() you get a kind of god object that can do everything that then needs to be progressively wrapped. If a library needs access to a remote service, you have to open the socket yourself and pass that in, and the library then needs to plumb it through the whole stack manually to the point where it's needed. If the library develops a need for a new permission then the API must change and again, the whole thing has to be manually plumbed through. This is unworkable when you don't control all the code in question and thus can't change your APIs, and as sandboxing is often used for plugins, well, that's a common problem.
There's no obvious way to modularise or abstract away that code. It can't come from the library itself because that's what you're trying to sandbox. So you have to wire up the library to the capabilities yourself. In some cases this would be extremely painful. What if the library in question is actually a networking library like Netty? There could be dozens or hundreds of entry points that eventually want to open a network connection of some sort.
What does this god object look like? It would need to hold basically the entire operating system interface via a single access point. That's not ideal. In particular, loading native code would need to also be a capability, which means any library that optimised by introducing a C version of something would need to change its entire API, potentially in many places. This sort of design pattern would also encourage/force every library to have a similar "demi-god" object approach, to reduce the pain of repeatedly passing in or creating capabilities. Sometimes that would work OK, other times it wouldn't.
The stack walking approach is a bit like SELinux. It allows for a conventional OO class library, without the need for some sort of master or god object, and all the permissions things need can be centralised in one place. Changes to permissions are just one or two extra lines in the security config file rather than a potentially large set of code diffs.
Now all that said, reasonable people can totally disagree about all of this. The JVM has been introducing more capability objects with time. For example the newer MethodHandle reflection object is a capability. FileChannel is a capability (I think!). You could build a pure capability language that runs on the JVM and maybe someone should. Perhaps the usability issues are not as big a deal as they seem. It would require libraries to be wrapped and their APIs changed, including the Java standard library, but the existing functionality could all be reused. The new libraries would just be a thin set of wrappers and forwarders over pre-existing functionality, but there'd be no way for anything except the god object to reach code that'd do a stack walk. Then the security manager can be disabled, and no checks will occur. It'd be a pure object capability approach.
> If a library needs access to a remote service, you have to open the socket yourself and pass that in, and the library then needs to plumb it through the whole stack manually to the point where it's needed.
You don't need to do this. There are a variety of ways to handle this, just as you would any other kind of dependency injection:
1. Design libraries to actually be modular so that dependencies (including capabilities) can be injected just where they are needed.
2. Pass in a factory object that lets the library construct sockets as and when it needs them. You can then enforce any arbitrary checks at the point of creating the socket. (This is much more flexible than a Java policy file).
3. Use a powerbox pattern [1] to allow the user to be directly asked each time the library attempts to open a socket. This is not always good UX, but sometimes it is the right solution.
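Option 2 can be sketched as follows (names hypothetical): instead of handing a library raw socket access, inject a factory that applies an arbitrary check each time the library asks for a connection.

```java
import java.util.Set;

// Sketch of the factory approach: the capability handed to the library is
// the factory itself, and creation is the checkpoint. The "connection" is
// a stand-in object, not a real socket. Names are hypothetical.
public class SocketFactoryDemo {
    interface Connection {
        String host();
    }

    static final class CheckedSocketFactory {
        private final Set<String> allowedHosts;

        CheckedSocketFactory(Set<String> allowedHosts) {
            this.allowedHosts = allowedHosts;
        }

        Connection connect(String host) {
            if (!allowedHosts.contains(host)) {
                throw new SecurityException("host not allowed: " + host);
            }
            return () -> host; // a real impl would open the socket here
        }
    }

    public static void main(String[] args) {
        CheckedSocketFactory factory =
            new CheckedSocketFactory(Set.of("api.example.com"));
        System.out.println(factory.connect("api.example.com").host());
        try {
            factory.connect("evil.example.com");
        } catch (SecurityException e) {
            System.out.println("blocked: " + e.getMessage());
        }
    }
}
```

Unlike a static policy file, the check here is arbitrary code, so it can be as dynamic as the situation demands.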
> If the library develops a need for a new permission then the API must change and again, the whole thing has to be manually plumbed through.
Capturing permission requirements in the API is a good thing! With the stack walking/policy based approach I won't know the library needs a new permission until some library call suddenly fails at runtime.
The policy file isn't required, by the way. That's just a default implementation. My PDF viewer had a hard-coded policy and didn't use the file.
OK, so in a pure capability language how would you implement this: program A depends on dynamically loaded/installed plugin B written by some third party, that in turn depends on library C. One day library C gets a native implementation of some algorithm to speed it up. To load that native library requires a capability, as native code can break the sandbox. However:
1. You can't change the API of C because plugin B depends on it and would break.
2. You can't pass in a "load native library" capability to plugin B because you don't know in advance that B wants to use C, and if you did, B could just grab the capability before it gets passed to C and abuse it. So you need to pass the capability directly from A to C. But now A has to have a direct dependency on C and initialise it even if it's not otherwise being used by A or B.
Stack walking solves both these problems. You can increase the set of permissions required by library C without changing its callers, and you don't have the problem of needing to short-circuit everything and create a screwed up dependency graph.
> With the stack walking/policy based approach I won't know the library needs a new permission until some library call suddenly fails at runtime
You often wouldn't need to. What permissions a module has is dependent on its implementation. It's legitimate for a library to be upgraded such that it needs newer permissions but that fact is encapsulated and abstracted away - just like if it needed a newer Java or a newer transitive dependency.
> OK, so in a pure capability language how would you implement this: program A depends on dynamically loaded/installed plugin B written by some third party, that in turn depends on library C. One day library C gets a native implementation of some algorithm to speed it up. To load that native library requires a capability, as native code can break the sandbox.
Now, I'm a little outside my area of expertise due to not having worked with capability systems very much yet. (There aren't that many of them and they're still often obscure, so even just trying to gain experience with them is difficult at this point.)
But that said... in an ideal capability system, isn't the idea that native code could just break the sandbox also wrong? I would imagine that in such a system, depending on another module that's running native code would be just fine, and the capability system's constraints would still apply. Maybe that could be supported by the OS itself on a capability OS; maybe the closest thing we'll get to native code for that on our existing POSIX systems is something like WASI[0].
> You often wouldn't need to. What permissions a module has is dependent on its implementation. It's legitimate for a library to be upgraded such that it needs newer permissions but that fact is encapsulated and abstracted away - just like if it needed a newer Java or a newer transitive dependency.
If our goal is to know that the dependencies we're using don't have more authority than they need, isn't it a problem if a module's permissions may increase without explicit input from the module's user (transitive or otherwise)?
One of the foundations of object-capability security is memory safety, so loading arbitrary native code does subvert that. You can get around this by, for example, requiring native code to be loaded in a separate process. As you say, a capability OS and/or CPU architecture [1] is able to confine native code.
> isn’t it a problem if a module’s permissions may increase without explicit input from the module’s user (transitive or otherwise)?
The module's permissions can't increase without explicit input, e.g. changes to the policy file. But the person who cares about the sandbox integrity is the user of the overall software or computing system. The plugin developer doesn't really care how the API is implemented or what permissions it needs. They just want it to work. The person who cares is the person who owns the resource or data an attacker may be trying to breach.
The beauty of object-capability security is that it completely aligns with normal object-oriented design. So you can always recast these discussions to not be about security: how would I inject any other new dependency I needed without changing the API of all intermediaries? And there is a whole literature of design patterns for doing this.
All you'd do there is make the injector a semantic equivalent of the AccessController. The injector must have some sort of security policy after all, to decide whether a component is allowed to request injection of a capability. Whether you structure it as a single subsystem responsible for intercepting object construction and applying policy based on the home module of what's being constructed, or whether you determine that module via stack walks, the end result is very similar: some central engine decides what components can do and then applies that policy.
The Java approach is nice because it avoids any need for DI. DI is not a widely accepted pattern, and there are no DI engines that would have any support for this kind of policy-driven injection. Whilst popular in some areas of software like Java web servers, it hardly features in most other languages and areas; no programming language has built-in support for it, and that includes modern languages like Kotlin. DI engines meanwhile have changed how they work pretty radically over time - compare something like the original Spring XML DI to Guice to Dagger 2. Plus, DI is awkward when the dependency you need isn't a singleton. How would I express, for example, "I need a capability injected that gives me access to the $HOME/.local/cache/app-name directory"? Annotation-based DI struggles with this, but with the AccessController it's natural: the component just requests what it needs, and that's checked against a policy, which can be dynamically loaded from a file or built by code.
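To make the capability-object side of this concrete, here's a minimal sketch (class and method names are hypothetical, not any real API) of what an injected "access to one directory" capability could look like as a plain Java object: the holder can open files under its root but has no way to climb back out of it.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.*;

// Hypothetical sketch: a directory capability as a plain object. The holder
// can only open files under the root it was constructed with; there is no
// getParent()-style method to climb out, and path traversal is rejected.
// (A production version would also need toRealPath() to handle symlinks.)
final class DirectoryCapability {
    private final Path root;

    DirectoryCapability(Path root) {
        this.root = root.toAbsolutePath().normalize();
    }

    // Returns an opened-file capability (a FileChannel), not a path.
    FileChannel open(String relative, OpenOption... opts) throws IOException {
        Path target = root.resolve(relative).normalize();
        if (!target.startsWith(root)) {
            throw new SecurityException("escapes capability root: " + relative);
        }
        return FileChannel.open(target, opts);
    }

    // Attenuation: hand a narrower capability to a less-trusted component.
    DirectoryCapability subdir(String name) {
        Path sub = root.resolve(name).normalize();
        if (!sub.startsWith(root)) throw new SecurityException(name);
        return new DirectoryCapability(sub);
    }

    public static void main(String[] args) throws IOException {
        Path cacheDir = Files.createTempDirectory("app-cache");
        DirectoryCapability cache = new DirectoryCapability(cacheDir);
        try (FileChannel ch = cache.open("index.db",
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            System.out.println("opened inside capability root");
        }
    }
}
```

Whoever enforces policy - an injector or hand-written wiring - constructs the root capability and hands components only the attenuated subdirectory capabilities they asked for.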
The File example is a good illustration of why Java is _not_ a capability-secure language. Every File object in Java has a getParentFile() method that allows you to navigate up the hierarchy right to the root and then from there access every file on the filesystem. Java’s standard library is full of these kinds of design flaws. So in practice you can only apply capability-based thinking to small subsets of a codebase and have to fallback on the (much weaker) stack walking checks if you want strong isolation.
The problem with Java’s stack walking is that it is too complex and too easy to find privileged code that can be coaxed into performing unintended operations. There are plenty of old write ups of Java sandbox bypass bugs due to this, eg http://benmmurphy.github.io/blog/2015/10/21/zdi-13-075-2013-...
I shouldn't have used File as an example, that was confusing. I was trying to explain capabilities and stack walking in an abstract sense but was also using Java as a concrete example. Big mistake.
You're right that java.io.File isn't a capability. It just represents a file path with a few utility methods to list files, and therefore does a stack walk when you try to access the filesystem. A FileChannel is a file capability in the sense I meant above, because it represents an opened file, not a path. There's an access check once, when it's opened, and then the rest of the time there aren't any stack walks.
It's a pity that Ben Murphy didn't write up all his bugs. There are only two listed there. A few patterns cropped up repeatedly in the old sandbox escapes:
1. Accessing internal code you weren't meant to have access to. Often, some sorts of privileged pseudo-reflection API. Fixing this is the goal of Jigsaw.
2. Serialization acting as a back door, allowing the internal private state of classes to be tampered with. Serialization security has been improved with time and they're now working on a feature that will make it harder to screw this up, by allowing serialization to use normal constructors instead of this ad-hoc form of reflection.
3. Overly general frameworks that allowed attackers to construct nearly arbitrary programs out of privileged objects by chaining them together (this crops up in gadget attacks too). There's probably no platform-level fix for this; people just have to be aware of the risks when working in a sandboxed context.
I don't think a pure capability language is workable, to be honest. At least not at a large scale. In the purest sense you need a god object passed into the start of your program which vends all possible capabilities, and every library would require the user to construct and pass in all resources it needs externally, including things it might possibly need. And that code isn't easily modularised because you can't just use helpers provided by the library itself: that's the very same code you're trying to sandbox. There's no good way to make that usable or extensible. The combination of stack walking and capabilities lets you find the right balance in terms of API design between simplicity of use and sandbox simplicity.
Are you aware of the history of object-capability programming languages? There are multiple actual demonstrations of real-world ocaps programming languages and projects built with them:
It's actually not at all unworkable to use object-capability for large programs. In fact, one of the main benefits of ocaps is how well it aligns with well-established good software design principles such as dependency injection, avoiding singletons, avoiding global mutable state, and so on.
I know about E and Midori. I haven't looked at the others. As far as I know the only one that could realistically be said to have been used for large programs was Midori but very little about it was ever published, just a few blog posts. And Midori was cancelled. Presumably it wasn't so compelling.
I'd like to see a more modern attempt that wasn't as totally obscure as those other languages. However, nobody is doing that.
That’s just the first example. As the author of that series writes, most of the exploits are not due to memory corruption. Most are confused deputy attacks where privileged code can be tricked into performing dangerous operations.
A security manager can examine the call stack and know which class from which package is asking to perform a privileged action within the app; the class object will tell you which loader loaded it, and you can ask the loader for the physical location the class came from if you want to be really sure no one has overloaded it.
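The caller-identification part of that can be sketched with the StackWalker API (Java 9+). This isn't the SecurityManager itself, just a minimal illustration of recovering the caller's class, its loader, and its code location from the stack; the class and method names here are illustrative:

```java
import java.lang.StackWalker.Option;
import java.security.CodeSource;

public class CallerCheck {
    // Sketch: recover the immediate caller's class, its loader, and where
    // the class was loaded from - the same facts a security manager's stack
    // walk uses to decide whether a privileged action is allowed.
    static String describeCaller() {
        Class<?> caller = StackWalker
                .getInstance(Option.RETAIN_CLASS_REFERENCE)
                .getCallerClass();
        ClassLoader loader = caller.getClassLoader();
        CodeSource src = caller.getProtectionDomain().getCodeSource();
        return caller.getName()
                + " via " + (loader == null ? "bootstrap" : loader.getName())
                + " from " + (src == null ? "unknown" : src.getLocation());
    }

    // Demonstration call site, so the "caller" is this class.
    static String callFromHere() {
        return describeCaller();
    }

    public static void main(String[] args) {
        System.out.println(callFromHere());
    }
}
```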
Yes, it's a powerful and flexible mechanism. The problem is - how many people know (in detail) how to configure all of this power so as to take best advantage of it? My subjective perception is that the answer is something like "a very small percentage of Java developers".
I'm reading through all these responses and it sounds like nobody read the article. Everybody keeps bringing up JVM SecurityManager, or how granular Deno's permission system is, or a syntax for granting runtime permissions to modules (like your Agoric link). That's not what happened here. The actual attack in the article was a post-install script run by the package manager. That means whatever kind of limits you might place on runtime capabilities of the library wouldn't have mattered. You need a system that lets the package installer request granular permissions from the package manager, where the package manager runs the scripts in a sandbox and only explicitly-provided privileges are granted. I don't know of any package managers that support this feature today.
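To sketch what such a request-and-grant scheme might look like (this is entirely hypothetical - no field below exists in any real package manager today), an install-time permission manifest could ship with the package, with the installer sandboxing the post-install script to exactly these grants and nothing more:

```json
{
  "name": "some-package",
  "version": "1.0.0",
  "scripts": { "postinstall": "node build-native.js" },
  "installPermissions": {
    "filesystem": ["./node_modules/some-package/"],
    "network": ["https://downloads.example.com"],
    "subprocess": false
  }
}
```

A script that then tried to read `~/.ssh` or phone home to an undeclared host would simply fail inside the sandbox.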
This is a really nice idea but considering we haven't even solved the relatively simple case of users giving permissions to apps and expecting them to behave responsibly, I'm not optimistic that we can solve the much more challenging case of importing library code.
e.g., If someone gives an app the ability to upload photos, it can silently read all photo metadata, upload all photos to a private server instead of uploading just the single photo that the user picked. This can be solved with OS level standard photo pickers but it hasn't been yet.
Same with package code. Maybe a package needs network access for stuff it genuinely needs to do. However it can (and probably will) at some point go above and beyond in the amount of data it collects. FB Mobile SDK outage is a good example of this. https://www.bugsnag.com/blog/sdks-should-not-crash-apps
It's unfortunate that the proposed Realms is still just a proposal. Even still, I've heard many arguments that since the method of isolation lives inside JS, it cannot be expected to be entirely secure, and you would be much better off relying on OS-level security primitives - a point that the comments I've read so far completely gloss over. I'd love for someone to prove me wrong that this is airtight so we can champion Realms at my work.
POLA (the principle of least authority) is good to live by regardless of whether it can be fully implemented.
Not a complete answer (by any means) but keeping tight control over egress network access helps (I wish it was easier to limit egress access over port 443).
Systemd has some capability to restrict access to system resources. I haven't experimented with those capabilities yet, so I'm not sure what all is there.
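For reference, these are real directives documented in systemd.exec(5) and systemd.resource-control(5); the unit name and paths below are hypothetical:

```ini
# /etc/systemd/system/myapp.service (hypothetical unit)
[Service]
ExecStart=/usr/local/bin/myapp
NoNewPrivileges=yes           # block setuid-based privilege escalation
ProtectSystem=strict          # /usr, /boot, /etc mounted read-only
ProtectHome=yes               # no access to /home
PrivateTmp=yes                # private /tmp namespace
ReadWritePaths=/var/lib/myapp # the only writable path
IPAddressAllow=10.0.0.0/8     # restrict network peers
IPAddressDeny=any
```

`systemd-analyze security myapp.service` will score a unit against these kinds of restrictions.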
I've noticed more dev teams succumbing to the temptation of easiness that many modern package managers provide (NPM, Cargo, Ivy, etc.) - especially as someone who has to work with offline systems on a regular basis.
Because of that ease there are fewer tools and tutorials out there to support offline package management. There are more for using caches, though these are often along the lines of either 'the package manager will do this for you and it just works (but in case it doesn't, delete node_modules or cargo clean and re-try)', or 'stand up a dependency server on your own machine with these proxy settings' (which has its own security issues and is frequently disallowed by IT cybersecurity policies).
As an example, many blog articles I found a while back suggest using yumdownloader from the yum-utils package. This is unfortunately not reliable, as there are some packages that get skipped.
I have found I need to script reading a list of dependencies from a file; then, for each dependency: create a directory for it, and use repotrack to download its RPM and its transitive-dependency RPMs into that directory. The script then aggregates all the RPMs into one directory, removes the OS-installed RPMs, uses createrepo to turn that directory into an RPM repository, and finally makes a UDF ISO image out of the directory for transfer onto the offline system and installation.
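That workflow can be sketched roughly as below. This is a hedged sketch: flag spellings (e.g. repotrack's `-p` download path) and the ISO tooling should be checked against your distro's versions of yum-utils, createrepo, and genisoimage, and the "remove OS-installed RPMs" filtering step is omitted for brevity.

```shell
#!/usr/bin/env bash
# Sketch: build an offline RPM repository plus ISO from a dependency list.
# Assumes yum-utils (repotrack), createrepo, and genisoimage are installed;
# the deps file lists one package name per line.
set -euo pipefail

build_offline_repo() {
    local deps_file=$1 out_dir=$2 work_dir=$3
    mkdir -p "$out_dir" "$work_dir"

    # repotrack pulls each package plus its transitive dependency RPMs
    while IFS= read -r pkg; do
        [ -n "$pkg" ] || continue
        mkdir -p "$work_dir/$pkg"
        repotrack -p "$work_dir/$pkg" "$pkg"
    done < "$deps_file"

    # Aggregate everything into one directory (duplicates collapse by name)
    find "$work_dir" -name '*.rpm' -exec cp -n {} "$out_dir"/ \;

    # Turn the directory into an RPM repository, then an ISO for transfer
    createrepo "$out_dir"
    genisoimage -o offline-repo.iso -R -J "$out_dir"
}
```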
I disagree: the problem is not that package managers make things easy, it's just that several of them are poorly designed.
The fact that pip/npm/gem etc. look for packages in a fallback location if not found in the private repository is a terrible design flaw. One which not all package managers have.
For example, when you add a cargo dependency from a private registry, you have to specify the registry that the dependency comes from, so cargo will never go looking in some other place for that crate. I'm sure many other package managers also have designs that are not vulnerable in this way.
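As a sketch of that (registry name and URL are placeholders), the registry is declared once in `.cargo/config.toml` and then referenced per-dependency, so cargo has no fallback to consult:

```toml
# .cargo/config.toml - declare the private registry (name/URL illustrative)
[registries]
my-company = { index = "https://crates.example.com/git/index" }

# Cargo.toml - the dependency is tied to that registry by name; cargo
# will never go looking on crates.io for it
[dependencies]
internal-utils = { version = "1.2", registry = "my-company" }
```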
Similarly, many package managers do not support pinning of transitive dependencies (with hashes), or pinning does not happen by default, so many people are still using floating dependencies.
Sudden unplanned loss of availability is a catastrophic security problem, the A in the security CIA[1]. Worse is that the dependency that caused that problem was something that should never have been a dependency in the first place.
Proper dependency management requires a degree of trust and integrity validation which are completely counter to automation. Most developers are eager to accept any resulting consequences because they don't own the consequences and because they are fearful of writing original code.
You can call it laziness but lots of developers correctly assume they'd be out of their jobs or at best out of favor at their company if they raised a fuss about dependency management rather than use (flawed) industry standard tools and get to work on features.
No one gets fired for using npm, you might get fired for insisting you build your own dependency management system because npm is insecure rather than working on your team's domain problem.
> Most developers are eager to accept any resulting consequences because they don't own the consequences and because they are fearful of writing original code.
Some developers are fearful of writing original code. Others realize it's not going to be appreciated by their colleagues to write their own package manager to solve a problem most of the industry disregards. Imagine arguing for getting the "write our own package manager to replace npm/yarn/pip" ticket into a sprint.
Currently, managing dependencies correctly by vetting them with each and every version bump is huge amount of overhead and it grows with each dependency pulled in. The way we as an industry have been handling it has largely been to keep going like we don't need to.
It's going to keep getting worse until a) developers and project managers realize doing inherently unsafe things is bad and b) they have the resources to give the additional ongoing levels of scrutiny. I'm not hopeful that this will happen at large in the industry, though I know it _is_ happening within individual companies and projects.
I'm sure we'll mitigate the damage to some extent by making package managers smarter and implementing finer-grained permissions. That will improve the situation over time, but it also takes us in the wrong direction by allowing us to forget that when we're shipping dependencies, we ultimately own their behavior.
It's a practice that's so ingrained and so taken for granted that I suspect it will not change unless a big popular package gets hacked and the vulnerability affects a significant portion of apps written in a popular language like Javascript or Python.
And I'm not really arguing against vetting your dependencies or improving dependency management. I'm just saying that in the real world, if I made this particular imperfection in software development practices my hill to die on at work, there's a 99% chance it's not good for me or my career. So my options are: swim with the tide, knowing we're doing things imperfectly, or fight an uphill battle for a more perfect world, knowing that unless we dodge some major vulnerability every other Javascript developer falls victim to, there will be many eyes in my office staring over at me wondering if my extra caution is really worth the company's investment - if I keep my job at all.
I want to write great software, but to do that, I need to actually have a job writing software. And until I get a job at Google or Facebook or Amazon (none of those being places I've ever actually applied to) I am generally working in conditions without the resources to do the kind of dependency vetting we're talking about in this thread.
It might be ingrained these days, but this StackOverflow question asking for a package manager for C++ and not really getting any "obvious" answers is just under 10 years old: https://stackoverflow.com/q/7266097/1298153. Conan.io's first commit was in 2015.
You could also treat supply chain attacks on software dependencies like another IT security risk your company is exposed to (just like virus infection, ransomware attacks, phishing, etc) and go through the same thinking (and if appropriate other) processes to manage them. The company can then make a conscious decision on whether it's worth investing in mitigating, eliminating or accepting the risk.
(Apologies if this is all obvious, I'm just trying to highlight an alternative approach which might help you deal with the dilemma and not have to "solve" it all by yourself)
It isn't about writing your own package manager. At least start with not using every package offered by NPM or dependencies that do. If you cannot attest to every package in your dependency tree you have failed dependency management.
Ok so you're going to argue to an engineering manager or product manager that you need a day or days to do a full code audit of each external package you use? Or write your own library instead? That's, if anything, more unrealistic than just writing your own package manager.
Do you actually get to do this wherever you work? Honestly it would be great to have the luxury of that kind of patience and time to invest in my work. But it's universally unrealistic in my experience.
This is not at all a question of "what would be the ideal or perfect scenario." This is a question of what's pragmatic and politically accomplishable in most work environments.
Who said anything about requiring a full code audit? Parent post is suggesting being selective about which packages you consume and which third-party developers you trust, including transitive dependencies pulled in by any package you consume.
I just don't think that's realistic for the JavaScript ecosystem, for the majority of projects. E.g. The weight of something "standard" like create-react-app.
Most devs would, however, end up owning the consequences of fully vetting their dependency tree when their manager gives them a terrible performance review for taking 20 times longer to do everything else than their peers.
This can't change bottom-up. Even if you went the professional licensing route, you'd need top-down regulation to force companies to only hire vetted and licensed professionals, and to actually do verification of projects to make sure all your best practices were being followed, following up on penalizing developers who weren't.
How is that proof? You're again pointing to a design flaw in NPM: that authors can easily delete packages without warning.
If you yank a package with cargo, it doesn't break people's builds who already depend on that package, it just makes it harder to add new dependencies on the package.
Blaming NPM for that problem is not a cure. Your users don’t care about the sad tired pleadings of a developer about some distant information system in the cloud. All the users care about is that your software was unavailable. That is a security failure.
Again, developers don’t care because they don’t own the consequences of such monumental failures, which is why they will happily and frequently repeat this deliberate mistake until they are terminated.
It's a little bit of both. Maybe "problem" is the wrong word. It's a risk that you need to understand and account for. If you're running a bank, it's an existential impact that you must avoid. If you're running a message board, it's not.
Look at what happened when the "left-pad" function disappeared from npm a few years ago. IIRC, it broke react. The downside of package managers like this is that many people have no idea what they are using.
Coming from the embedded world, where a lot of projects are safety-critical, it always kind of shocks me to see how cavalier others in the software world are about bringing in third party dependencies. Need a bit of code that would take you a day to write? Naaah, just find a third party library that does it (and does god knows what else). And bam! Like that it's part of the build. No code review of the dependency. No security audit. No investigation of what other dependencies that dependency brings in. No investigation into license compatibility. Just slide it into the list of 200(!) other dependencies.
Maybe I'm a dinosaur, but I was taught a long time ago to use a dependency only if there was no other feasible alternative, and to thoroughly know every dependency you bring in as if it were your own code, and treat it as such. Because at the end of the day, you're shipping that code to the customer and stamping your name or your company's name on it. It's your reputation on the line, not the person behind the StudMaster69 GitHub account.
> Need a bit of code that would take you a day to write?
Even if it is a small library that has only a single maintainer the chances of you replicating it in a day seem slim to me unless it is truly trivial, or the library was also written in a day.
More likely you get a day in and realize that the problem has a whole bunch of gotchas that you didn't anticipate and the library maintainer already found and dealt with.
Again, this is only if the problem isn't truly trivial
The left-pad situation wasn't simply that lots of projects were relying on left-pad directly, it was that they were relying on projects that were relying on projects that were relying on left-pad.
Some dependencies are too large to rewrite yourself - most statistical suites would fall under this definition - and while accepting their direct code might be acceptable, it's not usually feasible to fork their code and rewrite the parts that aren't. Lots of smaller parts you could write yourself quickly add up.
Except a lot of projects aren't shipping anything to anyone, they are providing a service. There, you have to assess the effect if the service goes down or is compromised. There is a wide range of significances. If Roll20 (D&D tabletop site) goes down, it may affect their revenue, but no one is going to get hurt. Etc.
Indeed. They make it far too tempting to just pull in a dependency, even if it is not really needed. The worst case of this is one-function packages on npm. And of course whenever you pull in a dependency, that can in a cascade pull in more dependencies. Sometimes the same package is pulled in several times, even in different versions.
What looks elegant as a concept "we just have a graph of dependencies and automatically pull that in" quickly becomes an unmaintainable nightmare and consequently into a huge attack vector.
In the case of RubyGems for some time now it has been throwing a warning if you do not use the `source` block to scope for gems coming from multiple gemservers.
I haven't used private packages, but it astonishes me you don't just add private packages with some kind of flag so it knows to not try to pull a public package.
Anyone who uses this must have already understood (and just overlooked) this vulnerability, since they would have realized their private package needs a unique name that doesn't collide with a public package.
I have scoped, private packages in `@myscope`. I set up my `npmrc` with `@myscope:registry=url/to/my/private/repo`. I just checked, and if I try to install `@myscope/common-library` when it's not found on our repo, it will fail, because `npm` has associated `@myscope` with one and only one (private) registry.
The only hiccup is that if I'm a new developer, and I haven't made this entry in my user-level `npmrc`, and I'm not using an existing project with a `.npmrc` in the project root, then it will try to hit `@myscope` on the public repo. But if I don't have the registry for `@myscope` configured at all, my actual dependencies won't work, so I should notice that right away.
If nothing else, I suppose the takeaway is to grab the group scope for whatever private scope you choose on the public repo before somebody else squats on it. Still, at least for NPM, this seems like a solved problem, you just have to implement the existing solution correctly.
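For anyone setting this up, the whole mitigation is a couple of lines of `.npmrc` binding the scope to a registry (the scope, URL, and token variable below are placeholders):

```ini
# .npmrc (project root) - all @myscope packages resolve only here
@myscope:registry=https://npm.example.com/
# optionally, per-registry auth so the token never goes to the public registry
//npm.example.com/:_authToken=${NPM_TOKEN}
```

Committing this to the project root means new developers get the binding automatically, closing the hiccup described above.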
Does Cargo resolve transitive dependencies with a hash? So for example, if I have a dependency on tokio (which depends on tokio_core), I don't /think/ the meta-data on tokio forces the exact version of tokio_core on a first download/update?
In which case, would you not get the same issue, if you do the same attack, but with a transitive dependency which you haven't specified?
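For what it's worth: on first resolution Cargo follows semver ranges, but the resulting Cargo.lock records an exact version and a checksum for every crate in the graph - transitive ones included - and verifies the checksum on later downloads. The entry below is illustrative; a real one carries the actual SHA-256 of the published `.crate` file:

```toml
# Cargo.lock (generated) - illustrative entry
[[package]]
name = "tokio-core"
version = "0.1.18"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "<sha256 of the published .crate file>"
```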
I'm surprised the reverse fully-qualified domain name (FQDN) model used by Java isn't more widely adopted. If you want to upload artifacts to the main repository (Maven Central) you first need to show ownership of a particular domain. For example, via a DNS TXT record (example [1]). Would make these kind of attacks a lot more difficult.
Java's FQDN model is actually pretty bad in practice. Domain names change quite often (I've seen many packages with a dead FQDN), and relying on the TXT record is going to be a security nightmare even worse than the username/password required by npm (since domains expire).
> Java's FQDN model is actually pretty bad in practice
Right, that's why we see this kind of attack all the time on Maven Central, but never on npm... oh, wait?! NO! The kind of simple attacks you see routinely on npm (typo squatting, ownership transfers to malicious authors, now this) just doesn't happen on Maven Central at all.
Normally when people squat they squat like 20+ variations of the name. So that would start to add up to hundreds of dollars.
Also, having the domain doesn't make it available on Maven Central. You need to apply to have your domain become a registered groupId on it. This is a manual review process. They validate your domain through TXT verification to make sure the requester creating the group is the domain owner. Then they check that the library is actually packaged under it. And finally there's a check that the groupId isn't too similar in name to any existing ones, especially to popular ones.
This generally takes 3 to 7 days to get approved.
Once you have a groupID you can release many libraries under it, you don't have to go through that process again.
Now from the user side, things are simpler too, because every lib has a groupID ownership and the lib name. Similar to how on GitHub you have owner/repo.
So it's much easier for me as a user not to confuse org.apache/Log4J with org.malware/Log4J
And like I said, even if someone owned the domain apoche.org they most likely wouldn't get approved to register org.apoche on Maven, because the name is too similar.
It still isn't foolproof, admittedly. But it seems much harder to manipulate. And especially if you're a careful user, it's much easier to trust the source. As long as you got the groupId correct, it's signed and validated. And you can be sure that what you found on apache.org is going to be org.apache on Maven.
Finally, even if the domain changes hands, it doesn't matter. You won't be given access to the Maven repo. Access is given to the Maven user account who registered the group. All you need is to own the domain when you create your groupId. Now if someone transfers their Maven user/pass to a malicious user, or becomes malicious themselves, you're still at risk.
Also, I believe there is an appeal process, again manually reviewed - say, in case you believe your account was stolen - where if you can prove that you own the source repo and/or domain, they might reinstate you.
But also, artifacts are signed, so if your account gets stolen, the thief would need to steal your signing key too before they could publish malicious artifacts to the Maven repo.
log5j.online is on sale for $5 / month. What's the expected ongoing cost of a package? If it's $0, then the domain is literally infinitely more expensive; at $0.01 / month it's merely 500x more expensive. The real cost is somewhere in-between.
> Right, that's why we see this kind of attack all the time on Maven Central, but never on npm...
Oh yes, the difference is necessarily explained by Maven's design being better, and absolutely not by the two orders of magnitude difference in usage between these two systems…
There are supply chain attacks in Maven Central too[1], but it's not gonna make the front page of HN…
You may notice that in the linked article, only the artifact id has been spoofed. In maven you need to declare both groupId and artifactId for your dependency (and a fixed version, a range is generally considered a bad practice).
To be noted, it makes this kind of attack more difficult, but not impossibile.
Especially the mix public/private artifacts. I guess it will force a lot of companies to at least lock their groupId on maven central, if they never bothered to do so.
That issue isn't even remotely similar, it's just someone uploading new packages and some people choose to use those instead of the official ones, god knows why. It didn't get pulled in automatically for existing projects.
Also, it's cute how you think maven is used orders of magnitude less.
Sure, the build systems probably won't CONSTANTLY be redownloading all the modules like NPM does, instead they keep a cache, but come on.
Apples and oranges: the name conflict was perfectly disambiguated by the use of the mandatory group identifier.
npm design was so bad that you could at the beginning upload over an existing version of your package name and break dependencies retroactively even to people that pinned versions.
if you want to try some good old whataboutism, at least try to be in the same ballpark.
Nobody in this thread argues that npm is not bad (it is), the current topic is: “is maven's design[1] better” and there is little evidence on this front.
Maven was (yes, I'm using the past tense on purpose) not a panacea that later systems failed to equal: it has the usability of an IRS form and never gained as much popularity in the Java world as npm did in the JavaScript one, for that reason. In 2014, the last time I did Java for work, the main security feature against supply-chain attacks was: "we are getting .jar files individually and not using maven because it's a fucking mess"
[1]: not its implementation, which is what makes npm arguably a pile of shit
> it has the usability of an IRS form and never gained as much popularity in the Java world as npm did in the JavaScript one, for that reason.
I thought IRS forms were hard?
I have more than a decade of experience collected from work with CDs (Delphi), downloaded libraries (Delphi, Java, PHP), Ant (Java), Maven (Java), PEAR (PHP), Composer (PHP), Nuget (.Net), NPM/Yarn (Javascript/TypeScript) and Gradle (Java/Kotlin).
Two of these have been somewhat easy to work with for as long as I used them: Maven and Yarn. I hear NPM is usable now, but it absolutely wasn't good early on.
> “we are getting .jar files individually and not using maven because it's a [...] mess”
It seems obvious from your writing that you either worked in a place that was really serious about security or had no clue. Both could result in this conclusion, but based on your writing my bet is on the latter, i.e. they were clueless.
I have concluded that it was a mix of those two: the people you worked for were trying really hard to be really serious about security and failing to automate it.
Sounds like you had some poor experiences with people who didn't know what they were doing.
The proper way to audit your dependencies is to run an in-house Maven repository server. Just like you would for npm, or any package repository really.
So you just spin up Sonatype Nexus, proxy the repositories you trust, and disallow releases from being overwritten. That way you're certain the jar you're using today is the exact same as the one you'll be using years from now.
Alternatively, if you have a vendor who ships their jars manually, you can just vet them, then upload them to your in-house Maven repository and have your developers pull them in as usual.
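For what it's worth, the proxying described above is mostly just configuration. A minimal, illustrative Maven settings.xml (the hostname is made up) that routes all dependency resolution through an internal Nexus mirror looks roughly like:

```xml
<settings>
  <mirrors>
    <mirror>
      <!-- Send every repository request through the internal proxy -->
      <id>internal-nexus</id>
      <name>Internal Nexus mirror</name>
      <url>https://nexus.example.com/repository/maven-public/</url>
      <mirrorOf>*</mirrorOf>
    </mirror>
  </mirrors>
</settings>
```

Combined with disabling redeployment of releases on the Nexus side, builds can only ever see what the proxy has cached.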
We do this. I had to work on a greenfield project and it used a ton of libs that weren’t in our repo. It was so annoying to have a list of repos to add to the in-house list, then discover things didn’t work, so now we need these. It literally added weeks of man-hours to the project, per day.
> It literally added weeks of man-hours to the project, per day.
Can you explain what exactly you mean here? Because the way I read it, realistic time estimates for the project grew by weeks per day, which probably means
- I misread
- you wrote something you didn't mean
Anyways it sounds like something was way off and I have worked on some projects with Maven and other systems.
Due to the amount of bureaucracy and vetting required, each package needed four people to touch the task, it took about an hour each person (four man-hours). Then you throw in their dependencies and it grew to about 50 man-hours per top-level dependency.
Yes, vetting everything takes a LOT of time. Regardless of the language, package manager, etc.
We (small shop) don't vet the code of all of our dependencies, since we simply don't have the manpower. But we do run nexus to have the guarantee of version pinning. So something that is fine today will be fine in 5 years.
I agree it's not Maven's fault, but I don't find that much bureaucracy insane. For one, changing dependencies on a mature Java project doesn't happen that often. For another, knowing the licensing, checking for possible patent violations, and running a scan against a known-vulnerabilities database are not bad things to do, and it's normal for it to take some time as the task passes between different hands: after all, you don't want devs working on licensing, and you don't want to waste legal's time just running a package through vulnerability-scanning software.
Besides, companies that care usually also have a database of previously cleared packages, so one can reduce one's own work/delays by picking from the approved-deps list.
50 hours per top-level dependency, not once or twice or ten times but every time, does however sound like something "cache-like" is missing at the human level?
True, I was reasoning more about the 4 hours per package, which is in line. But you're right, for sure the unique dependency graph can't be averaging that high.
Maven is/was huge in the Java world. For years it was pretty much the only way to resolve dependencies, until people got fed up with its many idiosyncrasies.
> “we are getting .jar files individually and not using maven because it's a fucking mess”
That seems odd and a bizarre edge case. Nobody worked like that with Java projects, and I bet nobody does today either.
Another example: SBT was supposed to be the "savior" in the Scala world and... I still have nightmares about SBT from a couple of years ago. Maybe it's finally become usable and intelligible in recent years.
What am I confusing, exactly? I'm the one arguing Maven is huge and that the parent post I'm replying to is mistaken. I never mentioned Gradle; that was a sibling comment.
I might have misinterpreted you the first time around. I thought you were saying that people have moved on from Maven, when in fact Maven is still the defacto repository for open source Java projects. Sorry if I did.
Maven is ubiquitous in the Java world and the de-facto package/dependency management system out there. Has been since the mid-2000's and as of 2018 when I last did Java development (Scala really), it is still widely in use. Getting jar files manually would have me running from whatever company that was doing that. Let me guess, they wrote all their code in Notepad because IDE's are a "fucking mess" too right?
You vastly underestimate the level of bureaucracy that can exist in the biggest Java users on this planet (namely banks and public administrations): in these organizations (at least a few years ago; the SolarWinds attack shows it may not be the case anymore), every single dependency you want to use must be justified, and then is audited by a dedicated team, which ends up handing you the validated .jar.
It was a common development practice in these entities (I was working as a contractor, for different customers); most of them have been using computer programs at their core since long before the internet went mainstream.
So you worked with customers using some particularly strict vetting protocols. That's a far cry from claiming Maven never reached the popularity in the Java world that NPM has in the Javascript world -- Maven is the dependency manager in the Java world. The entities you worked for are the exception.
Another thing that strikes me as odd in your comparison: those customers you worked with wouldn't have used javascript+NPM either, since it has all of the problems of Maven and external deps and then some! So what exactly are we comparing then?
That bureaucracy makes sense to me. You're _shipping_ those dependencies. When something misbehaves or allows an exploit it doesn't matter if it happened in a dependency or not -- the dependency isn't the one who's accountable to customers and the regulatory authorities.
It sounds like the processes in use to do this may have been pretty crappy in the organizations you've been with, but it also sounds like it would take less time than implementing a dependency's functionality from scratch in most cases you'd want to pull in a dep.
This "bureaucracy" is very necessary if security is at all a concern. SolarWinds is hot to talk about right now, but it has always been the case that having a build download code willy-nilly is a recipe for getting attacked.
In any security conscious organization the only way to pull dependencies is from a local trusted repository. And the only way they get placed there is through a review process.
You have a username/password to Maven Central and you also have a private key to it.
But in order to be granted a groupID (think of it as an account), you need to prove at the time of account creation that you own the domain that matches the groupID (think account name).
So if you try to register com.foo on Maven Central, at that time you need to own foo.com, otherwise you'll be rejected.
If you do own it at that time, well your account is approved and now you have a username/password to it and a private key you need to use to sign artifacts when you publish them.
If your domain expires and is later bought by someone else, that doesn't make them the new owner of your Maven Central groupID.
You only need to validate the domain once using a TXT record. And then you use another authentication mechanism such as a username/password combination.
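The core of the groupId rule described above is just a naming convention: the groupId is the domain reversed. A simplified sketch (the helper name is mine, and real registrable-domain logic would need a public suffix list to handle TLDs like co.uk):

```python
def group_id_to_domain(group_id: str) -> str:
    """Recover the domain a Maven groupId encodes,
    e.g. 'com.foo' or 'com.foo.tools' -> 'foo.com'.
    Simplified: assumes a single-label TLD."""
    parts = group_id.split(".")
    # Only the first two components (TLD + registrable name) identify
    # the domain; anything after that is an internal namespace.
    return ".".join(reversed(parts[:2]))
```

Registration then amounts to proving, once, that you control that domain (e.g. via a DNS TXT record).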
I believe the TXT record validation is only an additional measure, e.g. to prevent a random developer from registering/uploading a package like org.apache.http2. Surely other authentication methods are used in practice.
I find it hard to believe any high profile organization would allow their domains to expire, or else they would also lose e-mail and websites, right?
"Years of maturity" or, just thinking about the problem for a bit.
How long did it take npm to get scoped packages? Sure, let me create a "paypal" project; they only need one JS project, no?
If Java suffers from excessive bureaucracy, the newer package developers/repos suffer from too much eagerness to ship something without thinking
Not to mention dependency and version craziness. If you want your software to be repeatable you need to be specific with the versions and code you're taking.
It drives me crazy that "official"-sounding package names like yaml are seemingly handed out basically first-come, first-served, with no oversight. Publish anything you want, but call it Mark's awesome yaml library, or companyName-yaml, or something like that, so people are aware it's not an officially supported project.
> What would you imagine that oversight looking like, who decides who gets the name `yaml`, and how do they verify it, and who pays for that time?
Just use namespaces: foo.com/yaml instead of yaml. NPM's way of doing things is/was just insane, with no regard for trust or security. No wonder NPM Corp then went into the business of selling namespaces, AKA selling trust...
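The point of a domain-based namespace isn't the extra name segment itself, it's that the prefix is something whose ownership can be verified, so publishing under it can be gated. A toy sketch (all names hypothetical):

```python
# Registry mapping verified namespace prefixes to their owners' keys.
owners = {"foo.com": "alice-key"}

def can_publish(package: str, publisher_key: str) -> bool:
    """A namespaced registry only accepts uploads under a prefix
    from whoever proved ownership of that prefix."""
    namespace = package.split("/", 1)[0]
    return owners.get(namespace) == publisher_key
```

Under this rule, a stranger can still publish a yaml library, just not one that squats a name implying someone else's endorsement.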
Unless it's part of the standard library included with the language, nobody gets it. There has to be some designation before the name. It's not only Node; Python also does things like that.
I don't understand how a designation in front of the name solves anything. The designation is basically just a name itself, you've just made it a two-part name, and a requirement that all names have two parts. ok, so?
As someone who worked with Java for more than a decade before touching the JS world, the degree to which npm has been hacked together without any study of prior art is extremely irritating. If you must build something from scratch, at least invent some new problems instead of just re-discovering solved ones.
The very existence of package-lock grinds my gears, and that's before it starts flip-flopping because someone mistook URLs for URIs. Of course, it only exists because ranged dependencies are a terrible idea, and that's before anybody even mentions things like namespaces or classifiers.
No maven wasn't perfection, and it could be (and has been) improved on - but npm doesn't even get into spitting distance.
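To make the ranged-dependency complaint concrete: a caret range like "^1.2.3" doesn't name one artifact, it names a moving set of them, which is exactly why a lockfile becomes necessary in the first place. A simplified sketch of npm's caret rule (for major versions >= 1; the real matcher handles pre-releases and 0.x specially):

```python
def caret_matches(spec: str, version: str) -> bool:
    """True if `version` satisfies a caret range like '^1.2.3':
    same major version, and not older than the base version."""
    base = tuple(int(x) for x in spec.lstrip("^").split("."))
    v = tuple(int(x) for x in version.split("."))
    return v[0] == base[0] and v >= base
```

Two installs of the same manifest on different days can therefore resolve to different code, because a new 1.x release published in between also matches.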
At this point I really wish we'd just go with a proper cryptography model, with a discovery overlay to provide names.
What I want as a developer is to establish my trust relationship to developers of libraries I depend on.
`npm install <somepackage>` should first check a record of signing keys in my source code repo, then check a user-level record of signing keys I've trusted before, and then - and only then - add a tentative trust relationship if this is brand new.
`npm release` or whatever (npm is just an example - every system could benefit from this) - would then actually give me the list of new trust relationships needed, so I can go and do some validation that these are the packages I think they are.
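The install-time check being proposed above could be sketched like this (everything here is hypothetical, it's a design sketch, not any real npm behaviour):

```python
def resolve_trust(package, signer_key, repo_keys, user_keys, tentative):
    """Proposed flow: trust keys pinned in the source repo first,
    then keys this user has trusted before; otherwise record a
    tentative relationship for the developer to review at release time."""
    if repo_keys.get(package) == signer_key:
        return "trusted (repo)"
    if signer_key in user_keys:
        return "trusted (user)"
    tentative.append((package, signer_key))
    return "tentative"
```

At release time the tool would then print the `tentative` list so each new key can be validated once and promoted into the repo-level record.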
> If only people creating new package managers would bother to spend an hour or two learning prior art.
No, no... We should move fast and break things. We can implement this in a week because the old dinosaurs are too close minded to implement these things.
Today, the hardware is cheap and network is reliable. No need for any safeguards or security features.
Despite being very interested in Rust, this is the first I've heard of crev. It's a very cool project.
That said, it's interesting to me that several people are trying to get the project to drop the Web of Trust, and focus on code reviews. I'm the exact opposite - the code reviews are an interesting, experimental approach, but I'm interested in the project because of the cryptographic Web of Trust. Any use of dev-originated code signing in a package ecosystem is great. For this reason, I'd love for this to get major pickup from Rust, and beyond.
Finally, I am a bit wary because the project is starting to look moribund. It's important for projects like these to know that the maintainer is in it for the long haul, even if there's initially very little adoption. When the project founder writes that they're in a "fight for survival", it makes me think they may abandon the project if it doesn't get significant adoption.
Using a URL isn't what makes Go's dependency management that good. It's just convenience that the import is a URL.
The key thing with Go is that all dependencies have a checksum (in the go.sum file), and that file should be committed to the repo.
So even if the domain gets hijacked and a malicious package is served up, then the checksum will fail and it will refuse to build.
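The idea is simple enough to sketch in a few lines (go.sum's real format is more involved, using a base64-encoded "h1:" dirhash over the module tree rather than a plain file hash, but the principle is the same):

```python
import hashlib

def verify_module(data: bytes, pinned_sha256: str) -> None:
    """Refuse to build if the downloaded archive doesn't hash
    to the sum committed in the repo."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != pinned_sha256:
        raise RuntimeError("checksum mismatch: got " + actual)

payload = b"module contents"
good_sum = hashlib.sha256(payload).hexdigest()
```

A hijacked domain can serve whatever it likes; unless the attacker can also rewrite the go.sum already committed to your repo, the build fails instead of silently pulling their code.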
People should be using internal module proxies anyway for Go. You can just store the module files in a directory, a git repo or a web service and serve up an internal cache.
Packages are typically considered immutable once published. If I have a particular package e.g. "FooLib.Bar v 1.2.3" then this zip file should _always_ contain the same bits. If I need to change those bits, e.g. to fix a bug then I need to ship e.g. "FooLib.Bar v 1.2.4"
Also packages aren't always small. So it makes sense to cache a copy locally. On dev machine "package cache" and in an org's "internal feed" and only check upstream if it's not there.
So I shouldn't need to go to the source url to get it. Ideally, I just ask "who has "FooLib.Bar v 1.2.3" for me?"
It also means that tampering can be detected with a hash.
But the "check upstream" model is now vulnerable to fake new versions.
Using a FQDN is less likely to be unknowingly hijacked when it's a domain they control and use daily.
URL references also contain the version number, typically an immutable Git tag reference. They also benefit from just needing to download the source code that's referenced and not the entire package. With Deno you can also optionally host versioned 3rd Party packages on their deno.land/x CDN.
URL references are also cached locally on first use and can be used offline thereafter.
For better or for worse, many projects auto-update their dependencies these days.
They do this to address the shortfalls of modern conventions like small packages, continuous release cycles and dependencies nested several layers deep.
So if you were using the internal package FooLib.Bar v 1.2.3 and an attacker posts FooLib.Bar v 1.2.4 to a global repository, anyone using auto-updating will update to it.
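The vulnerable behaviour described above boils down to "pick the highest version, wherever it lives". A toy sketch of such a resolver (feed layout and names are illustrative):

```python
def resolve(name, internal, public):
    """Naive resolution: consider every feed and take the newest
    version. An attacker publishing a higher public version wins."""
    candidates = internal.get(name, []) + public.get(name, [])
    return max(candidates, key=lambda v: tuple(int(x) for x in v.split(".")))

internal_feed = {"FooLib.Bar": ["1.2.3"]}
public_feed = {"FooLib.Bar": ["1.2.4"]}  # attacker-published
```

Here `resolve("FooLib.Bar", ...)` picks the attacker's 1.2.4, which is why per-package feed pinning (or refusing to merge feeds at all) matters.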
I don't disagree, but both of these (fully or partly automated updates, and attackers) are fairly recent developments to the model.
Of course, mitigation is needed. Supply chain attacks are a hot topic after SolarWinds.
But identifying a package version solely by a url doesn't seem like the right abstraction to me. IMHO, the metadata is more structured: Name (text), version (SemVer) and also maybe now fields to verify and mitigate these attacks: content hash, source feed, etc.
Even if I run an internal feed that transparently proxies and caches the public one, as well as hosting my company's internal artefacts, the rules now might need to be different between packages?
for e.g. between Newtonsoft.Json (new versions always originate on the public feed, never locally) and "SDCo.GeneralUtils" (new versions always originate on the local feed, never upstream)
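That richer identity, plus a per-package origin rule, could be sketched like so (field and policy names are illustrative, not any real package manager's schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PackageRef:
    """Package identity as structured metadata rather than a bare URL."""
    name: str
    version: str       # SemVer string
    content_hash: str  # e.g. sha256 of the artifact
    source_feed: str   # feed this artifact came from

# Per-package origin policy, as in the examples above.
ORIGIN = {"Newtonsoft.Json": "public", "SDCo.GeneralUtils": "internal"}

def allowed(ref: PackageRef) -> bool:
    # Unknown packages are rejected by default; known ones must
    # originate from their declared feed.
    return ORIGIN.get(ref.name) == ref.source_feed
```

With a rule like this, an attacker's "SDCo.GeneralUtils" on the public feed is rejected outright, regardless of its version number.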
I’m cackling at how great this is. This is what happens when you trust the internet forever and just scarf down any old thing at build time. Of course it’ll get exploited! That’s what evil people do.
There are a lot of expensive things you can outsource. Responsibility isn't among those.
Free software / open source propels engineering as you can share and leverage the results of collective efforts. However, at no point did the concept come with inherent guarantees about concerns such as security.
esr defined 19 points for "good" open source software development in his seminal essay "The Cathedral and the Bazaar". I feel some of those are sometimes easily thrown out of the window for the sake of "efficiency" or "cost-effectiveness".
This issue resonates with bullet point 17 in particular:
> A security system is only as secure as its secret. Beware of pseudo-secrets.
I think this issue has less to do with package managers, and a lot with companies rushing into the convenience of public code platforms such as Github without properly vetting whether or not they might be inadvertently leaking internal information through packaging manifests.
Offtopic, but I found nowhere to actually ask this question. Does anybody know if ESR is still alive? His blog [1] has not been updated in months--and looking at his post dates, this seems really out of character--, he hasn't posted anything on twitter, or his usual channels.
>>There are a lot of expensive things you can outsource. Responsibility isn't among those.
That is not true at all; the industry, both in Development and even more so in Operations, has been outsourcing responsibility for a long time. That is why we have support contracts, SLAs and other very expensive services we pay many, many times the cost of hardware for...
To outsource responsibility... Network down? Call Cisco... Storage down? Call EMC or Nimble... etc.
A support contract allows you to hold a sub-contractor accountable. But that's the extent of what an SLA does. What it doesn't do is diminish your responsibility towards anyone who relies on the services you provide yourself. These are distinct things.
Put more succinctly, if the network's down: that's still very much your problem. Especially if you over-promised 100% availability to the end users of your services. Your end users do not care about Cisco, EMC or Nimble. They don't have contracts with any of those. They have a contract with you and they can and will hold you accountable if you don't deliver on what you sold them.
I guess this is where we need to define our analogy.
For a sysadmin, the customer is "the employer", and they do not really have a contract with the sysadmin; rather, the employer has contracts with Cisco, or Nimble, etc. The sysadmin has "outsourced" his/her responsibility in that context.
For example, instead of rolling your own storage device using Linux, FreeNAS or something else, you buy an expensive 3rd-party solution with expensive support contracts to outsource responsibility to the vendor. If it goes down, it's "I have a support ticket open with the vendor" instead of "I am attempting to merge the latest kernel patch I downloaded from GitHub".
That is the source of the phrase "Nobody ever got fired for buying Cisco" (or insert the name of another large vendor). They do not get fired for it because they have outsourced their responsibility.
That's a fair point. And it's a good point. There's a difference in types of contracts and the relationships they represent. An employee/employer relationship is distinct from a customer/vendor relationship.
An employee/employer relationship is defined by a few key properties. As an employee, you sell your time and your expertise to your employer, and you agree to submit to the authority of your employer in exchange for a salary. The extent of your responsibility - and this is absolutely key - is captured in your contract.
It also means that many things simply aren't your responsibility to begin with, even though you deal with them on a day-to-day basis.
As a systems administrator you, quite likely, won't get fired for failing Cisco gear or services because you're not the one who ultimately signs off on the contract with Cisco on behalf of your employer. Responsibility has always resided with the executives who cover and sanction your actions and decisions.
An executive, though, usually won't get fired over failing Cisco gear/services itself, but they will get fired over any ensuing loss of revenue, damage to brand/image, litigation over exposed liabilities,...
A great example of this is President Harry S. Truman who famously had a sign on his desk stating "The buck stops here".
As for the systems administrator, your role is to actively engage in day-to-day operations. You're basically hired "to keep the lights on". Whether the proverbial "light" was procured from Cisco or handcrafted in-house is inconsequential to your employer as far as your individual role as an employee is concerned.
There's more coming.... tons of GitHub integrations ask for blanket access to your account instead of narrowly scoped OAuth (https://github.com/marketplace). Tons of GitHub users grant that access, and the access_tokens are only a password-type breach away. If you have these access_tokens, you can edit the repos they cover all you want.
I wish GitHub would create a proper auth design. I won’t grant blanket permissions to tokens because there’s too much risk of something going wrong.
It seems dumb that they don’t have per repo tokens. I think the issue is with their licensing as if they made proper tokens users could abuse it by giving tokens to their friends. But this should be detectable in a friendly (please don’t do that) way.
I want to be able to give read-only access to private repos.
I want to be able to give fine grained function level and repo level access.
If I’m an admin on multiple repos, I want to be able to issue a token for just a single repo so I can give that to a CI job without worrying if every single repo I admin is at risk.
They allow ssh keys with some similar functionality, but ssh keys can’t be used as much as tokens.
I’ve been waiting for a story about how some third party app granted access to my whole org gets taken over and wreaks havoc. Eventually this will probably be the attack that alters real packages instead of these name overloading packages.
> It seems dumb that they don’t have per repo tokens.
Technically you can create one new GitHub account per repo and generate a token for that... But that is highly annoying :)
They need to support IAM / RBAC style policies and tie every authn+z method to those policies, but my guess is they have different auth methods strung all throughout their codebase so implementing it will take a few years. Then of course they have to make it "user friendly" as we all know how painful IAM can be...
Comically, that's what GitHub recommended to us. Of course, that's a nightmare for a user to manage, violates our SSO requirement, and GitHub charges per user.
At the moment, there's a story about github1s.com on the front page of HN and people are asking how to give it access to their company private repos [1][2]. Scary.
Apparently the OAuth scopes are much worse than GitHub Apps'. Only GitHub Apps allow read-only access to the "metadata" by default, whereas OAuth apps get access to the code, deploy keys, etc., with no way to limit that access per repo.
Yeah, but with PATs at least (not sure about other token types), you can't scope them to a particular repo. So whenever you need to allow something to even see a private repo, or write to a public repo, the token you supply can do that for all repos, and that alone is potentially really destructive. I'm not sure there's a good reason why PATs can't be scoped to a repository; if they could be, it would do a lot for security, I think.
We also have multiple orgs, but we hit the requirement that you only have one bot account. It would be super nice if GitHub allowed much tighter scoping for PATs.
I don't know how many GitHub orgs the Linux Foundation has, but... hundreds? Having one bot account with wide permissions is a non-starter.
Most integrations just ask for blanket all permissions. They do this because it means they can give you a list of repos and let you choose which ones to integrate their service into with no work on your part except "click yes to give us permission to do everything for you and ... we'll do everything for you"
Honestly, it’s one of the things that makes me nervous about running Linux on all of my computers. At least with Windows (and probably OSX), my updates come from a single vendor who has strict internal code audits and security requirements. With Linux (I’m using Pop), my updates come from a package manager with a crapload of packages, each maintained by a different team / group with no central policy. There’s no way the small team at Pop can review and audit all of the things in the apt package system, and there have to be plenty of maintainers of popular packages who get sweet offers to sell out.
Anyway. I’m sticking with Pop / Linux. But it does make me nervous!
At least with windows, the drivers aren't checked that much and accordingly, have had some serious issues.
I'd guess distros are generally better off in that respect, but kernel space and user space aren't that different nowadays when it comes to caring about your own security.
I'm very happy to finally have a real world example to motivate all the folks that eye-rolled me every time I've raised it in the past. It just resonates better, especially with less technical leadership folks.
Not sure if serious, but I will point out this is significantly different. If I'm installing an application like homebrew or the Rust toolchain then I am explicitly giving them the right to code execution. It doesn't much matter whether they get it through the script on their website or the binaries downloaded from that website.
Random libraries, possibly pulled in by a dependency of a dependency of a dependency... not so much.
I'm more amazed by the fact that they got bounties, because the attack wouldn't be (easily) possible without insider knowledge of which dependencies their internal build system used.
> To test this hypothesis, Birsan began hunting for names of private internal packages that he could find in manifest files on GitHub repositories or in CDNs of prominent companies but did not exist in a public open-source repository.
If I'm not mistaken insider knowledge wasn't necessary.
This post seems like a good time to note that by default, there's no direct way to verify that what you are downloading from dockerhub is the exact same thing that exists on dockerhub [1].
Discovered after seeing a comment on HN about a bill of materials for software, i.e., a list of "approved hashes" to ensure one can audit exactly what software is being installed, which in turn led me to this issue.
I remember when we used to sign binaries and packages and nobody checked the pgp files anyways. We could have something similar better today, just need to be automated enough.
I think image signing support is not (or at least was not) as good as it could be. It would be nice if more images were signed by publishers and verification were performed by default.
Even then, that only gives you a stronger indication that the image hasn't been altered since it was signed by the image author at any point after it being signed. However it is not a guarantee that the source produced the binary content. It's also not a guarantee that the image author knew what they were signing - though this is a different issue.
Debian has a reproducible builds initiative[1] so people can compile packages themselves and then match, byte for byte, what Debian built. Not sure how far they've got with that.
Approximately 25,000 of just over 30,000 source packages are now reproducible builds - generating over 80,000 binary packages. See the graphic on the page you linked to:
You can enable client enforcement of Docker Content Trust [1] so that all images pulled via tag must be signed. Whether people are actually signing their images is a different question that I don't know the answer to.
Imagine we navigated the web using a command line tool called “goto” which works exactly like a package manager. If I want to open my bank’s site, I type “goto mybank” .
I could easily find myself in trouble, because:
- There’s no autocomplete or bookmarks, so typos are easy.
- If “mybank” is a name provided by my company’s name server, I could find myself redirected to the public “mybank” entry because Mr. Not-A-Hacker says his name entry is more up to date (or because I forgot to tell ‘goto’ to check the company name server.)
- There’s no “green padlock” to check while I’m actively using the destination site. (Though at this point it’s too late because a few moments after I hit enter the destination site had the same access to my machine & network that I do from my current terminal.)
- A trusted site may later become malicious, which is bad due to the level of unrestricted and unmonitored access to my PC the site can have.
- Using scripting tricks, regular sandboxed browser websites can manipulate my clipboard so I paste something into ‘goto’ that I didn’t realize would be in my clipboard, making me navigate to some malicious site and giving it full access to my machine (if ‘sudo’ was added to the front).
This is just a few cases off the top of my head. If ‘goto’ was a real thing, we’d laugh it into being replaced by something more trustable.
How have current package managers not had these vulnerabilities fixed yet? I don’t understand.
At Google, we have those resources and go to extraordinary lengths to manage the open source packages we use—including keeping a private repo of all open source packages we use internally
But Google is more or less an exception in this regard, from hiring their own offensive penetration testing teams to having a lot of paranoia in general about anything from outside. They had adopted a lot of good practices early on. Even most big companies are not as thorough as them.
I wonder how they built this culture and if it is even realistic for smaller companies to aim for it.
I work on developer infrastructure at Google. Opinions my own.
I think it typically comes down to a few key leaders having the political capital/will to enforce policies like this. Google's `third_party` policies[0] were created relatively early on and were, as far as I understand, supported by high level technical leaders.
The ROI of policies like these is not always immediately evident, so you need the faith of key leaders in order to make room for them. Those leaders don't necessarily need to be high in the org chart — they just need to be respected by folks high in the org chart.
As a counterfactual, establishing Google's strong testing culture seems to have been a mostly bottom-up affair. Good article on the history of that at Mike Bland's blog[1].
At a previous job I pushed hard for this in a project I was responsible for. Despite initial buy-in, as time went on there was a consistent level of pushback about relaxing this requirement and allowing just importing anything (the architecture was basically a separate repo storing ALL the dependencies, where only a couple of people had commit access and where new dependencies were allowed only after vetting).
Fortunately there was a hard legal requirement to vet every dependency license, otherwise I am not sure I would have been able to keep this workflow. As other posts say, you do need a very strong commitment at the management level for this to work. Besides security (where it often feels like it matters only until it costs money or is even slightly inconvenient), it might be helpful to make a legal case (what if we ship something with a nested dependency on AGPL code?) to get some help establishing these procedures.
I have been writing and architecting security related software for pretty much all my career and I find it quite scary how these days so much software delegates so much control to unvetted external dependencies.
We could pay for Google (or somebody else) to do it for us.
We would pay to access their ”distribution”, a limited set of packages vetted by them. Distribution vendor would screen changes from upstream and incorporate into their versions.
Of course this is more limited world. It’s like using a paid Linux distribution with certain amount of software covered by the vendors support policies.
That's more for availability than security. Assuming you keep the crypto checksums / author signatures of all the source code and packages, you don't need to keep a copy of the source / packages. Just verify them at download time. Many Linux distros don't even have a copy of all those binaries, they rely on HTTP mirrors of random organizations.
It's also useful for your organization to rebuild all of the source code from scratch (for reproducible packages anyway) and compare the new ones to the old ones, looking for things like compiler or hardware injection attacks. Secure build systems are definitely non-trivial.
It's not just that. Because it's all in the same repo and built with the same build tool it's also easier to run the same security checks you would use for your own code all automatically as part of your build process. All the tooling you use to secure your own code can be used to secure third party code as well with the same low level of friction.
One more advantage of keeping it all together can be an easier development cycle. IDE features like autocompletion and building would be faster if artifacts can be cached.
My Vim is indeed magic. I start typing a name and it autocompletes, then adds includes for whichever package the thing I just used is in. I also can't imagine going back to not having code search, with its turning every identifier into a link.
> a unique design flaw of the open-source ecosystems
This is a big generalization.
Inside Amazon, as well as in various Linux distributions, you cannot make network requests at build time, and you can only use dependencies from OS packages.
Each library has its own package, and the code and licensing are reviewed. The only open source distribution that I know to have similarly strict requirements is Debian.
[I'm referring to the internal build system, not Amazon Linux]
[Disclaimer: things might have changed after I left the company]
I think you’re getting downvoted because your point is obscured by the confrontational tone. Argument by authority is especially unconvincing when you aren’t using common terms correctly. In normal usage, “typosquatting” refers to someone registering common misspellings in a shared namespace. As clearly described in the post this is not that but rather exploiting non-obvious differences in the order in which different namespaces are checked.
Using terms correctly is especially important in security: someone who read your comment might incorrectly believe that this did not affect them because they are using the correct names for all of their dependencies.
Forgive my naivety, but my understanding of the NPM and rubygems ecosystems is that open source packages host their source code on github/gitlab. The source code is super easy to view. Often, the author will use tags or branches dedicated to specific versions of the code.
For distribution, js and ruby use rubygems and npm to host packages. If a developer wants to verify that the package hosted on npm is the same code being displayed and worked on by contributors on github, they need to pull down both sets of code and then either run a checksum or compare line by line to verify the code matches up. Malware or a nefarious package owner could slip in unexpected code into the package before shipping it to the package host, leaving the github version without the changes. No typo-squatting needed.
Just because some form of the source code is published to Github doesn't mean it's the same code that is hosted on npm or rubygems.
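A crude way to check this yourself, sketched in Python (the function names are made up; a real comparison would also have to account for build steps that legitimately transform the source before publishing): hash every file in the unpacked registry artifact and in the git checkout, then look at what differs.

```python
import hashlib
import pathlib

def tree_digest(root):
    """Map each file's path (relative to root) to its sha256 digest."""
    root = pathlib.Path(root)
    digests = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            digests[str(p.relative_to(root))] = hashlib.sha256(
                p.read_bytes()
            ).hexdigest()
    return digests

def suspicious_files(registry_dir, repo_dir):
    """Files in the published artifact that are missing from, or differ
    from, the repo checkout; these are the ones to read by hand."""
    reg, repo = tree_digest(registry_dir), tree_digest(repo_dir)
    return sorted(path for path, h in reg.items() if repo.get(path) != h)
```

Anything this flags, like an extra file only present in the published package, is exactly the "slipped in before shipping" case the comment describes.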
Yet, reviewing hundreds of thousands of SLOC (across different languages) while also checking legal compliance requires significant skill, time and effort.
As an individual, you cannot justify reviewing the entire dependency tree across all your projects.
Thankfully you can rely on the packages reviewed and built internally by your colleagues - or use a Linux distribution that does thorough vetting.
This was inevitable from the moment we let build systems and runtime systems fetch things automatically and unsupervised from public repos. This is the simplest and most blatant approach yet, but taking ownership of existing projects and adding malicious code is an ongoing problem. Even deleting a public project can have the effect of a DOS attack.
When I first used maven, I was appalled by how hard it was to prevent it from accessing maven central. And horrified to see karaf trying to resolve jars from maven central at run time. What a horrible set of defaults. This behaviour should be opt-in, disabled by default, not opt-out through hard to discover and harder to verify configuration settings.
Funny that you mention Maven, because Maven is not really vulnerable to this kind of attack, simply because it requires a groupId in all dependencies, and to publish under a certain groupId you must prove control of the domain it refers to. That makes this attack nearly impossible: it's only possible if you use an internal groupId which is not controlled by you on Maven Central, AND an attacker could claim that groupId successfully with Sonatype, AND you configure Maven to look first at Maven Central and only then at your internal repos (which would be stupid to do, as you normally do the exact opposite, and most enterprise setups won't even proxy to Maven Central at all).
Also, Maven uses pinned versions, normally, and won't just download whatever newer minor version happens to be published when it builds, which again makes this attack quite hard to pull off.
Back then it would have been maven 2 which supported version ranges in a similar way to OSGi manifests. But I really only mentioned maven as the first build tool I used which reached out to public repos uninvited and could break my builds as a consequence of that.
I'm flabbergasted by how silly this is. Bump the version and the package manager chooses yours online over the private one. Amazing. How silly, and how expensive is this going to be as this blatant security issue ripples on for months to come?
This is why explicit pins are a good idea. Whenever you finish a project you should set the explicit versions in the lock and then tag it. The problem is with dependencies of your dependencies, but if they are public, then by their nature they won’t be using private packages that can be hijacked.
Even public packages have been hijacked. Pin all your dependencies (transitive included) and then use automation (e.g. dependabot) to update the pinned versions as needed.
I don't think it actually works this way for NPM specifically, if you're using scoped packages correctly. I believe you can associate a scope with one (private) repo and it will not fall back on the public repo, or choose newer / higher-numbered versions on the public repo over a version from the private one.
Pulling packages down at build time seems ludicrous to me. I can understand it in a development environment, but I don't understand how "pull packages from the public internet and put them into our production codebase" ever made it past any kind of robustness scrutiny.
I guess it's a case of the ease of use proving too great, so convenient in fact that we just kind of swept the implications under the rug.
> I can understand it in a development environment
I can't. It's incredibly wasteful time and resource-wise, and ties your development process to third-party providers (and your ISP), which fall over often enough in practice.
It's a good practice to have a local cache of all the third-party dependencies you use, available to both developers and CI infrastructure.
For a distributed company with developers all over the globe, "local" here doesn't really make much sense. But from my experience with NPM, you download packages on your developer machine once you set up a project, and then only when something really messes up node_modules, which happens once every three months, on average.
You do re-download packages for every build in the CI pipeline, though, as you build a docker image from scratch, and that's when an NPM mirror is usually set up.
Well I agree, I don't personally do it either. My stack is comprised of tools that are pretty comprehensive on their own, so they get committed to the repository. A backend framework, a SASS compiler binary and a frontend framework if needed. It all gets put in the repo and any tasks are run by a makefile.
Some things are like that, but there is a decent number of package managers nowadays that at least pin package hashes, so they'll fail if the package has been tampered with. I'm not aware of many places that audit dependencies to a greater extent than "the license is compatible and it has reasonable maintenance".
Pulling packages from the internet is fine and that's how all Linux distros work but the more important thing is signature verification, imo
This really doesn't help when devs just upgrade everything. Or if they simply install the latest (p0wned) version of something. Pinning hashes really isn't the answer here.
Migrating from public NPM to a privately-hosted mirror of NPM is not a very complicated process, and if you already have a CI pipeline in place, it can be implemented completely transparently to developers. But like many other things that an organisation has to change as it grows from a single-founder startup into a real company, it's something many people just forget to do until they face the consequences.
Mirrors are great for speed and protecting you from dependencies getting randomly deleted off the public repo, but I don't think they can protect from malicious packages. They'll just get pulled into the mirror.
At my last gig (Java), developers reviewed all third-party libraries + dependencies and manually uploaded them to a private Ivy server. I don't think that could work in the Node ecosystem, where every module seems to have 100+ dependencies.
EDIT:
There's a real security vs accessibility trade-off here. You can't be a productive web developer, according to modern standards, and review every single transitive dependency that gets pulled into your application. And it's very inefficient to have individual developers at different orgs separately reviewing the same libraries over and over again.
One would naturally turn to repository administrators to enforce stricter security standards. Maybe RubyGems could review all source code for every new version of a package and build it themselves instead of accepting uploads of pre-built artifacts. But these repositories are run by smallish groups of volunteers, and they don't have the resources to conduct those kinds of reviews. And no open-source developer wants to have to go through an App Store-like review process to upload their silly McWidget library.
There's even a security-versus-security tradeoff. If you manually review every dependency, are you also going to manually review every update to each of those dependencies? If you add friction to your update process, you're also slowing down your ability to incorporate security fixes. Dependencies find vulnerabilities all the time. If you capture a snapshot of "trusted" dependencies, when are you going to update that snapshot, and how long will your project be vulnerable in the meantime?
Right — the only way to really identify malicious packages is a line by line audit... All the way down the dependency chain. On top of that, does a typical org using these packages have developers who can conduct these audits and identify obvious back doors? What about less obvious bug doors?
> You can't be a productive web developer, according to modern standards, and review every single transitive dependency that gets pulled into your application.
Applications I encounter use a ridiculous amount of outside tooling to do relatively simple work, and that's where I see dependencies explode most often.
When I work on package managed, env managed projects, every other day is a new environment issue, configuration issue or version mismatch. It's all a colossal waste of time. Had the project just chosen a handful of comprehensive tools and committed them to the repo, there would be no dark rituals of configuration and package management to perform. The code is in the repo, the code works with itself, life is good.
99% of everyone does exactly this, though. Partly because nobody has any fucking idea what they are doing, and because this is what all the documentation everywhere tells you to do, so that's what the guy who gets tasked with setting up the CI build does...
Well it's mostly held together by trust and (in the commercial case) warranty. That said there's so many potential entry points for malicious actors it's not even funny anymore (esp. in desktop computing)...
I try not to think about it too much and have faith in the powers that be
Every software package and SaaS provider always includes an explicit lack of warranty somewhere in their TOS - and if that means we’re just working off trust, we probably need to move to more ‘trust but verify’ instead of the current ‘fool me once, well just fool me as many times as you want because I won’t use something else anyway and you know it’ model we seem to be using now.
Can you point me to a documented instance where a warranty claim on a software product yielded a useful outcome for the claimant?
The only one I can remember was against Microsoft for forcing an upgrade to Windows 10, which wasn't (IIRC) a warranty claim but a bait-and-switch issue.
That is insane that any company allowed this to happen.
"That said, we consider the root cause of this issue to be a design flaw (rather than a bug) in package managers that can be addressed only through reconfiguration," a Microsoft spokesperson said in the email.
No, npm has scopes for a reason, why would that not fix this issue?
Probably it's more fun to play with syscall filtering in containers or with fuzzers than to review side channels or educate coworkers. Hence, security theater.
Isn't it considered best practice to be secure by default? Wasn't that the big fiasco with MongoDB? Why should PyPI, RubyGems, or npm be any different? I'm sure there is some reason, but I'd expect them all to pull from private repos before public ones.
Maybe the bug wasn't explained correctly but if it prefers public over private that seems like a bug.
OTOH, it certainly is an issue that if you forget and happen to test some code without being configured to have the private package server as your default then you'd get public repos.
Maybe instead of named packages companies should be using private URLs for packages. That way you always get what you ask for?
npm does not have any 'private package' functionality at all, instead you point it at a different registry server (using eg. Verdaccio or Artifactory) which then serves local packages and proxies public packages if they don't exist locally - or at least that's what they're supposed to do.
Artifactory apparently didn't, and served up whichever was the highest version of public vs. private. Which is stupid.
But the bottom line is that when using npm, the exact package selection policy is determined by whatever registry implementation you're talking to, and so it's the registry implementation which should prioritize private packages by default.
Just "prioritizing" doesn't fix it, you have to limit scoped packages to be provided by a single (trusted, internal) repo. Otherwise, what do you do when internal offers v1.2.3 but external says it has v1.99.99?
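The difference between the two policies fits in a toy resolver. The registry contents and the package name here are invented for illustration:

```python
# Toy registries: package name -> newest available version tuple.
PUBLIC = {"acme-internal-utils": (1, 99, 99), "left-pad": (1, 3, 0)}
INTERNAL = {"acme-internal-utils": (1, 2, 3)}

def resolve_naive(name):
    """Flawed policy: whichever source offers the highest version wins."""
    candidates = []
    if name in INTERNAL:
        candidates.append(("internal", INTERNAL[name]))
    if name in PUBLIC:
        candidates.append(("public", PUBLIC[name]))
    return max(candidates, key=lambda c: c[1])

def resolve_pinned_source(name, internal_names):
    """Safer policy: a name claimed internally is never taken from the
    public registry, no matter what version it advertises there."""
    if name in internal_names:
        return ("internal", INTERNAL[name])
    return ("public", PUBLIC[name])
```

The naive policy hands the attacker a win just by publishing a higher version number publicly; the pinned-source policy never consults the public registry for a name that is bound to the internal one.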
This is exactly what Verdaccio does, and has been doing since forever. It frankly kind of boggles my mind that other private registry implementations don't.
It won't be just companies. It'll be developers, sysops, etc. who npm install a bazillion packages, because the core language and libraries are not enough. Those people have keys, credentials and access to internal networks.
Yep, a big part of npm's problems are actually just flaws with JS. Compared with Java, it's insane how many dependencies you have to manage, and that batteries are not included (especially weird when you consider front end apps involve downloading the code!).
The article mentions that RubyGems is vulnerable to this, and that Shopify in particular downloaded and ran a gem named "shopify-cloud", but I'm curious as to how this is possible given a "normal" bundler pure-lockfile setup, or more generally the source-block directives I've seen in most Gemfiles.
How would Bundler ever try and download the `appraisal` gem from RubyGems?
The Gemfile section is more explicable. While newer Gemfiles look like this:
source "http://our.own.gem.repo.com/the/path/to/it" do
gem 'gemfromourrepo'
end
# or
gem 'gemfromourrepo', source: "http://our.own.gem.repo.com/the/path/to/it"
Older Gemfiles apparently just listed multiple top-level sources with no per-gem scoping, something like:
source "https://rubygems.org"
source "http://our.own.gem.repo.com/the/path/to/it"
gem 'gemfromourrepo'
Which seems obviously vulnerable to the dependency confusion issue mentioned.
So is the understanding that Shopify's CI systems were running `bundle update` or another non-lockfile operation? (possibly as a greenkeeper-like cron job?) Or is `--pure-lockfile` itself more subtly vulnerable?
Someone will eventually update deps, not necessarily CI. But now that dev's machine is compromised. The attacker probably only has a small window of time after getting in, but it should be long enough to exfiltrate dot-files and the source code of whatever it gets included in. Now they have ssh keys (mine are on a yubikey) and the GitHub url. They can further push malicious code into the repo.
I would hope most SSH keys are password-encrypted if not protected by a hardware token like yours, but I agree that the "unscoped-source" Gemfile syntax is a huge hole, and a bad one. I'm just confused about how what seems like a pretty uncommon operation led to such an immediate response and code execution from Shopify.
(I also don't think it's true that the attacker has a "small window of time"—as soon as they get a single RCE, it's over, if they're running on a normal dev machine then they can daemonize into the background, add persistence, and snoop events over time. CI systems are obviously less vulnerable to this by nature.)
It’s a small window because it’s going to take about 20-30 mins for the dev to figure out why tests failed, locate the bogus dependency and shut down their computer, notify secops, revoke keys (if they even think of that), etc.
If you know your computer was compromised. Shut down and reinstall from a backup, you don’t try and clean it.
Edit: I’m assuming the attacker would be replacing a dependency with an empty repo since they don’t know the actual source code. If they know the interface the dependency is supposed to provide, it could spread across the entire organization before anyone noticed.
> I would hope most SSH keys are password-encrypted
TBH, these are probably weak passwords for convenience.
Note that at least on Linux, deploying a keylogger to exfiltrate the password for a SSH key is not hard either. Even if a keylogger is somehow not possible, you can probably still replace key binaries with patched versions or change desktop shortcuts to launch modified programs.
I don't think this is correct—I read the earlier section of the docs ("Block Form Of Source, Git, Path, Group And Platforms") as saying that the block form is equivalent to the "source explicitly attached to the gem", the first priority item in your link
However, this section is concerning:
> The presence of a source block in a Gemfile also makes that source available as a possible global source for any other gems which do not specify explicit sources. Thus, when defining source blocks, it is recommended that you also ensure all other gems in the Gemfile are using explicit sources, either via source blocks or :source directives on individual gems.
Yikes! This is yet another easy footgun for people to reintroduce this issue
This attack demonstrates one of the problems outlined in the Nix thesis[0]: the problem of nominal dependencies. That is, the dependencies of your dependencies, build flags and so on are not taken into account, and in particular, neither is the source of a package.
Nix makes it possible to query the entire build time and runtime dependency graph of a package, and because network access during build time is disabled, such a substitution attack would be harder to pull off.
How the source is downloaded is specified declaratively and can be pinned to a specific commit of a specific Git repository, for instance.
> The packages had preinstall scripts that automatically launched a script to exfiltrate identifying information from the machine as soon as the build process pulled the packages in.
Pre and post install scripts in NPM packages are such a terrible idea. Even when it's not malware, it's usually just a nagging donation request with a deliberate "sleep 5" to slow down your build and keep the text displayed.
I'm pretty sure all package managers that produce packages which might bind to C have "some form" of pre-, post-, or build scripts.
The reason is simple: without them you can't properly bind to system libraries.
And even without them, the supply chain attack still works against at least developers, as packages are not just built but also run, often without any additional sandbox. (E.g. you run the tests of the library you're building, which pulled in a corrupted package.)
The main problem here is not build scripts (they are still a problem, just not the main one) but that some build tools like npm were built with convenience rather than security as the priority, and security was just an afterthought. For example, npm did not (still doesn't?, idk) validate whether the package lock file and the project dependencies match, so you could try to sneak in bad dependency sources.
Also, for classical system package managers (i.e. not build tools) like apt/rpm/pacman, build scripts really don't matter at all. The reason is that what they produce will be placed and run on your system without sandboxing anyway, so it's a bit different from a build tool, which is often used to build binaries (installers, etc.) in one place and then distribute them to many other places.
Edit: Another attack vector is to bring in a corrupted package which then "accesses" the code and data of another package. This could use speculative pointer accesses or similar, but in languages like Java, Python and JavaScript you can often use reflection or override standard functions to achieve this much more reliably.
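One concrete cross-check of the kind described above, sketched with an invented lockfile shape (real lockfiles differ in structure): flag any locked dependency whose resolved URL points outside the registry you trust.

```python
def foreign_sources(lock_deps, trusted_prefix):
    """Return locked packages whose 'resolved' URL is outside the trusted
    registry; a cheap guard against sneaked-in dependency sources."""
    return sorted(
        name
        for name, entry in lock_deps.items()
        if not entry.get("resolved", "").startswith(trusted_prefix)
    )
```

Run against the lockfile in CI, a non-empty result fails the build before anything is installed.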
I don't understand why people keep endlessly complaining about postinstall scripts.
Such 'nagging donation requests' were banned by npm pretty much days after they first appeared, IIRC, and npm itself is literally a tool for installing code to execute later, so there's no security issue here. If someone wanted to embed malware into a package, they wouldn't need postinstall scripts for it.
> Such 'nagging donation requests' were banned by npm pretty much days after they first appeared, IIRC,
What does "banned by npm" mean? Here's an example from the source of the latest version of nodemailer (with 1.4M weekly downloads) sleeping for 4,100 ms on every install so that it can show a "Sponsor us to remove this lag" message: https://github.com/nodemailer/nodemailer/blob/a455716a22d22f...
> and npm itself is literally a tool for installing code to execute later, so there's no security issue here. If someone wanted to embed malware into a package, they wouldn't need postinstall scripts for it.
It's fine to have a standard mechanism for postinstall steps. It should be opt-in by the end user rather than opt-out. That way people know that they're running additional code and ideally selectively pick which packages are allowed to do so. The vast majority of packages do not need it anyway as they do not have C++ bindings or need to generate data.
The defaults for NPM are such that you have to know quite a bit about how NPM works to download a package and inspect its contents without executing random code.
> This is really a complete nothingburger.
It's defense in depth. With the default being to execute remote code, a single typo could install a package that immediately runs malware.
“install” and “execute later” don’t always involve the same permissions. If you apply restrictive sandboxes to your code, package managers that aren’t designed around downloading untrusted code are annoying. Of course, this problem isn’t unique to npm. (It’s actually the opposite – all you need to do with npm is --ignore-scripts, whereas pretty much every other popular package manager I use just makes it impossible.)
And yes, you want to sandbox the install too anyway, but it at least needs permissions enough to do its job, i.e. interact with the network somehow. (Although I’m working on a tool to make that fully deterministic so it can never exfiltrate anything.)
There’s also the possibility that there’s no “execute” step at all, like installing a dependency tree just to inspect source, or in theory being able to skip auditing unused code paths.
The real solution is to design and build software components that can be finished, so they can be ruthlessly vetted - rather than the endless churn of updates.
Gamers often complain how they become free QA testers if they buy a game in the first few months after release, as most games are full of bugs (hi Bethesda!), but it is way worse in things like JavaScript libraries etc. It's as if "finished" has become a foreign word to most developers. Look at the recent story about Linux stable kernels that have had more than 255 minor releases and think how much of a shit show it would have been if they had added features too, like most developers do. The excellent small stable tools of Unix should have taught us something.
> Look at the recent story about Linux stable kernels that have had more than 255 minor releases and think how much of a shit show it would have been if they had added features too, like most developers do.
At least some distribution kernels do new feature backports - mostly to support new hardware on LTS versions (like e.g. Ubuntu did for the Raspberry Pi, see https://github.com/raspberrypi/linux/issues/3464).
Not sure why parent is being down-voted, as I believe this is an important point. In my opinion this would be applying the unix philosophy of having small tools that do one thing and do it well to code libraries.
Because as long as the underlying hardware and technology overall keeps progressing there isn't much practicality in "finishing" software.
Sure you could just "finish" Linux at 5.0 and then introduce e.g. io_uring via Linux-with-io_uring 1.0 instead of adding it to Linux 5.1. Same goes for all the libraries that add support for io_uring.
Yes, you could "finish" some software on the feature level, but you would still need to maintain it if you want to add support for new platforms, etc., or it will become obsolete sooner or later. In the case of still maintaining libraries, this would solve nothing in the context of this attack vector.
But this is exactly the philosophy in the NPM ecosystem where things like left-pad are rife. And NPM is generally considered a dumpster fire precisely because you need 8,000 deps for relatively "simple" projects like a basic create-react-app project.
I don't understand why there is this issue. We publish our internal npm packages in the @company namespace and we own this namespace on the public npm registry. Problem solved, isn't it?
Except this wasn't a problem with npm but rather with private registry implementations, and a setup with npm + Verdaccio is apparently actually one of the few configurations that isn't vulnerable to this problem.
Not that I didn't expect someone to immediately take the opportunity to complain about npm, of course, despite it having nothing to do with the problem at hand... as has become tradition in tech circles.
> I have been fascinated by the level of trust we put in a simple command like this one
sigh... am I the only one that likes environments where you can run simple commands to install stuff and you can generally trust your package managers? All the security folks love to act dumbfounded when people trust things, but post-trust environments have terrible UX in my experience. I hate 2FA, for example, because now I have to tote my phone around at all times in order to be able to access any of my accounts. If I lose my phone or my phone is stolen while travelling, I'm hosed until I can figure out how to get back in.
> So can this blind trust be exploited by malicious actors?
Yes, it can. Trust can always be exploited by malicious actors, and no amount of software can change that. And it creates a world that sucks over time. Show me a post-trust, highly secure environment that isn't a major PITA to use. And not just for computers. I'm sure you could use social engineering to abuse trust of customer service reps (or just people in general) and do bad things, and the end result will be a world where people are afraid do any favors for other people because of the risk of getting burned by a "malicious actor".
Does this work with AOT compiled languages? Surely the fake packages that get uploaded don't know the structure of the internal libraries well enough, so for something like Cargo this would just cause your build to suddenly fail mysteriously, which is easy to spot. A build.rs could probably do some damage to your build systems temporarily for the 1 or 2 days (if not hours) it takes for engineers to track down what's happening.
What I don’t get from the article is the reasoning behind the design that the central repository “wins” over the local/override repository.
How was that design chosen, not just once but in all 3 of those large package ecosystems. Did pypi/gems/node borrow their design from each other given their similarity in other aspects?
Are there any situations where this behavior is desired?
Does any of the other ecosystems have flaws like this (nuget, cargo..)?
I never understood why these package repositories don't include some (opt-in?) integrity checking option using digital signatures. If I download code that executes on my machine there should be at least the option to establish some level of trust. We have been doing that with linux distro package managers for decades. Seems like common sense to me.
They largely do in various forms. Both npm and yarn, by default, record hashes of the dependencies you're using and check them when redownloading.
I think the issue tends to be more that there's just so many packages (often nested 10+ deep) and it's best practice to keep them as up to date as possible.
When it's fairly typical for a JS project to have thousands of dependencies, there isn't really any practical way to both stay up to date and carefully review everything you pull in.
I think the only viable solution for companies taking this issue really seriously is to keep their numbers of dependencies down and avoid having significant deep/indirect dependencies.
Edit: as an example, in my company's Node stack (for 10 services) - there's >900 dependencies. In our React stack (for 2 sites), more than 1600.
Contrary to what you might think, these are actually pretty small, lightweight systems. So really whatever you might have thought was the worst-case scenario on numbers of deps, the reality is more like 10x that in the modern JS ecosystem.
In many ways, the vast number of tiny dependencies are one of the strongest points of the JS ecosystem. But it doesn't come without caveats.
The package integrity would be fine in this case. The packages downloaded from PyPI would be legitimately signed by PyPI, and the internal packages would be signed by the local package server. The issue is not knowing which source to use for each package, and you'd have the same issue with not knowing which certificate to use to check them.
I was thinking the same thing. Surely PayPal's packages should be signed by a certificate only PayPal has, and they would want to verify that before using their packages?
If the signature requirement is attached to the package metadata, the new package just removes it. If it's part of their custom build system, what signs third-party packages? Would it just sign the new one anyway, because how does it know which packages should have PayPal-internal signing? Or are you proposing manual controls? shudder
Channels and priorities embedded in the package tools are a better approach, combined with something like Artifactory. Some channels might require packages are signed, and possibly monotonically versioned.
I've found that it is easy for discipline to slide on manual controls. It starts off rigorous and tails off into being done infrequently (which makes it a big job) and perfunctorily. This will save you from things that hit the bleeding edge, and from the idiots who pull from latest on a prod instance, but leaves you exposed to the patched bugs, with increasingly good exploits.
Diff inspection will catch some obviously bad things, but it will rarely catch anything clever. So it would be down to luck whether you had merged in this patch before it was spotted/announced. Unless you have something to separate the namespaces? Check for conflicts? I guess CI might work, hoping your CI machines are sandboxed.
I teach and one of my students, with little IT experience, asked me last week about the security of package management. I found myself using the many eyeballs argument. It only takes one set of bad eyeballs.
It seems to me that down through the years ease of deployment trumps security. npm, mongodb, redis, k8s.
Or maybe the sysadmin role has just become outdated? Maybe front of house still needs a grumpy caretaker rather than your friendly devops person with a foot in both camps.
We can now even outsource our security to some impersonal third-party so they can 'not' monitor our logs.
Here's the application called deptrust I submitted to the Mozilla Builders program (didn't get in :P) to address this problem space before I had to focus more on my current job. Please let me know if there are any collaborators who would like to work on this together someday!
I know that node has `package-lock.json` and `yarn.lock`, which include integrity checks. Are these checks decorative only? How could npm have been affected by this issue?
IIRC you need to use npm ci to ensure that package-lock.json is used. That said, when developing locally you are going to use npm install or npm update and update the package.json and package-lock.json files accordingly. I could be entirely off target here since I'm writing purely from memory, but there seem to be a few different ways one could trigger a pull from the malicious repo and end up with it inside the package-lock.json file.
No, that's not correct; in fact this comment and the two sibling comments are all wrong.
Quoting from npm's documentation[0] for npm install:
> This command installs a package, and any packages that it depends on. If the package has a package-lock or shrinkwrap file, the installation of dependencies will be driven by that, with an npm-shrinkwrap.json taking precedence if both files exist. See package-lock.json and npm
Consider an example where package.json has `"react": "^16.11"` and this was resolved and fixed in package-lock.json as 16.12 at some previous point in time. Suppose 16.14 is now the latest release. Running npm install will not install 16.14, even though it matches the pattern specified in package.json; instead 16.12 will be installed, because that's what package-lock.json says.
What npm install does do is detect whether you've made changes in package.json, and only then does it re-resolve the dependency. In the above example, if you changed `"react": "^16.11"` to `"react": "^16.14"` in package.json and then ran npm install, the package-lock.json would be changed and version 16.14 would be installed.
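To make that concrete, here are abbreviated and hypothetical excerpts of the two files in that example; the `integrity` field is the hash check asked about elsewhere in the thread:

```
# package.json (excerpt)
"dependencies": {
  "react": "^16.11"
}

# package-lock.json (excerpt) -- npm install keeps honoring this
# resolution until package.json itself changes
"react": {
  "version": "16.12.0",
  "resolved": "https://registry.npmjs.org/react/-/react-16.12.0.tgz",
  "integrity": "sha512-<hash recorded when 16.12.0 was first installed>"
}
```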
I feel stupid for using npm for so long without knowing about this command. In hindsight it's obvious that install was updating the lock file each time, so why should it have been any different on the ci server?
You can easily misconfigure things by using the `npm i`/`npm install` command on CI/CD instead of `npm ci`. `npm install` does not take any lockfiles into account, only uses package.json, and upgrades any package/dependency that is not pinned.
The final build would probably have failed (on a build server using CI). But when developing locally I think package.json wins over the lock file (at least the lockfile is often updated after doing an npm install here).
So this probably wouldn't show up on the final build distributed and deployed somewhere. But it did manage to run arbitrary code on developers' machines of those companies.
These install hooks... Why are they needed at all, and why can't package (de)installation be free of side effects?
I'm sure the hooks are needed for things NPM can't do by itself, but they shouldn't run by default. That puts pressure on developers to avoid them, and puts pressure on NPM to add whatever functionality is missing from package.json in a safe way.
(and have npmjs.com search rank packages without scripts above those that do)
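Worth noting that you can already opt out of lifecycle scripts per project or per machine today; the open question is whether that could ever become the default. A sketch of the relevant `.npmrc` setting:

```ini
; never run preinstall/install/postinstall scripts from dependencies
ignore-scripts=true
```

The same effect is available one-off via `npm install --ignore-scripts`, at the cost of breaking packages that genuinely need a build step (e.g. native addons).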
What would happen if the install hooks weren't there? You'd still have client code calling into the compromised package. Would it be possible to handle those calls without knowing the symbol names used by the internal package?
I have to build some CSS libraries that sadly use npm for building. The way I approach this is through rubber gloves: I create custom docker containers with npm and a specific set of dependencies, frozen in time. This way I can at least get reproducible and reliable builds.
This doesn't mean I'm not vulnerable to dependency attacks, but it at least limits the window, because I update these dependencies very, very rarely.
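A minimal sketch of that rubber-glove container, assuming an npm project; the digest placeholder and commands are illustrative, not a hardened recipe:

```dockerfile
# Pin the base image by digest so the tag can't silently change underneath you
FROM node@sha256:<digest of a known-good image>
WORKDIR /build
# Copy the manifests first so the dependency layer is cached and frozen
COPY package.json package-lock.json ./
# Install exactly what the lockfile says, skipping dependency install scripts
RUN npm ci --ignore-scripts
COPY . .
RUN npm run build
```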
Apt supports TLS via apt-transport-https (as you are probably already aware), but I don't think it's the default in either Debian or (X)Ubuntu derivatives. I'd like to know why, TBH.
The packages themselves are signed though, so I guess the risk is now on server authenticity as opposed to package integrity.
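That split matters: transport security (TLS) only authenticates the server, while a signature or pinned hash authenticates the artifact itself. A toy sketch in Python of why a pinned hash catches tampering regardless of how the bytes arrived (the package contents here are made up):

```python
import hashlib

def verify(artifact: bytes, pinned_sha256: str) -> bool:
    """Return True only if the artifact matches the hash we pinned earlier."""
    return hashlib.sha256(artifact).hexdigest() == pinned_sha256

original = b"legitimate package contents"
pinned = hashlib.sha256(original).hexdigest()  # recorded at vetting time

assert verify(original, pinned)                  # untampered bytes: passes
assert not verify(b"malicious payload", pinned)  # swapped on the mirror: fails
```

Even a fully compromised mirror serving over perfectly valid TLS cannot make the second check pass.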
To mitigate this kind of supply chain attack for Python, we have created the following tool [1], which checks the Python packages on an Artifactory instance you specify and creates packages with the same names on PyPI.
Yes, if you have packages on Artifactory, the `index-url` is always the way to go. However, if you forget to specify `no-index`, you might not get what you wanted; see [1] for how packages are found. And it's easy to make such a mistake when using local resources (you forget to set the proxy or internal DNS, a new developer is not familiar with the setup and does a plain `pip install`, the internal server is temporarily unreachable).
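One way to take that class of mistake off the table is to put the index choice into configuration instead of on each command line. A sketch of a `pip.conf`/`pip.ini`, with a made-up internal hostname:

```ini
[global]
; resolve everything through the internal index; pip never consults pypi.org
index-url = https://pypi.internal.example/simple/
```

Note that `index-url` replaces PyPI rather than supplementing it; the dangerous setting is `extra-index-url`, which searches both indexes and is exactly where dependency confusion creeps in.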
>It just pollutes PyPi and a nuisance to others.
I agree, but so are the packages that are no longer maintained. You also reserve the package name in case you decide to open-source it. Furthermore, by creating the package you are leaking metadata about your organization, i.e. some functionality can be inferred from package names.
And sure, you can train and try to enforce security awareness, but your people need to be right 100% of the time, while attackers only need them to make one mistake. The same goes for namesquatting of popular packages.
The thing that just happened is like a catastrophic chain-reaction collision in space. Now we will have to use GUIDs for everything. Nothing has meaning.
Like some other commenters, I too initially balked at the apparent misuse of "supply chain attack" but the linked paper provides a good definition,
A software supply chain attack is characterized by the injection of malicious code into a software package in order to compromise dependent systems further down the chain.
Backstabber’s Knife Collection: A Review of Open Source Software Supply Chain Attacks
To be clear, just calling this a "supply chain attack" and omitting "software" is going to cause confusion with traditional supply chains.
The analogy is not quite apt: in a software build system you have complete visibility into the dependency tree, so this attack is less useful, whereas with hardware suppliers you are relying on the security of your vendor.
> The analogy is not quite apt: in a software build system you have complete visibility into the dependency tree, so this attack is less useful, whereas with hardware suppliers you are relying on the security of your vendor.
Not necessarily: plenty of software still ships with the third-party supply chain bits incorporated as binaries, including commercial software. The user is relying on the security of one or more in a chain of upstream vendors.
This seems to be tending towards the generic problem of permissions that we have seen previously elsewhere.
For example in the case of Facebook, it used to be that users would accept permissions without considering them, and in-turn, various apps would access their data in bad faith.
Likewise for mobile apps.
Eventually Facebook removed many of the overtly powerful permissions entirely, likewise with the mobile operating systems.
In the case of mobile, the concept of "runtime permissions" was also introduced that required explicit approval to be granted at the time of authorization.
On Android, location access now prompts the user in the notification area informing the user of an app that accessed their location.
Can some of these ideas be borrowed by the package/dependency management world? "The package you are about to install requires access to your hard drive, including the following folders: x/y/z/all?"
This is both a security bug and a reproducibility bug. If anyone outside your network can break your build, your build is broken! It's mission critical to have a working build.
The way Nix handles this is that every external resource is cached and hashed, and every reference to an external resource must have a hash integrity check. If someone swaps out a package on a web server somewhere, rebuilds keep working because they don't need to re-fetch (because the hash wasn't changed by an operator), and fresh builds fail with an error indicating the hash is invalid, which should trigger an investigation (in practice, this is exceedingly rare, and IMO always deserves attention).
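For illustration, a pinned fetch in Nix looks roughly like this (URL and hash placeholder invented); a pure build cannot even express an unhashed download:

```nix
fetchurl {
  url = "https://example.org/releases/libfoo-1.2.3.tar.gz";
  # If the server ever returns different bytes, the build fails right here
  sha256 = "<base32 hash recorded when the tarball was first vetted>";
}
```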
I dream for when build reproducibility is considered table stakes like version control.
I think JFrog and Azure won the prize for product placement on this one. When the article listed “Azure Artifactory” I wondered if Azure was “sherlocking” JFrog, but no, they have a partnership. Given the SolarWinds vector I expect more investment in tooling security.
PGP signing of packages should be table stakes for publishing to a public repository. If unsigned packages are accepted by a public repository to reduce friction for newbies, such packages should be hidden by default.
Then, build tools should be configurable such that they only pull in dependencies signed by PGP keys drawn from a whitelist.
Finally, companies need to maintain private repositories of vetted dependencies and avoid pulling from public repositories by default — and this requirement needs to be configurable from the project's build spec and captured in version control.
If you've seen the PGP/GPG code you'll know what a trash fire it is, and if you follow its development you'll see how unfriendly the maintainers are when bugs are pointed out.
Adding dependencies on PGP just makes everything worse.
X.509 PKI for code signing is also terrible and very very complicated and error prone.
Also consider the community nature of development. You need to handle all sorts of painful crypto issues now.
For npm enterprise, it looks like setting the scope (e.g. @acmecorp/internal-pkg) would mitigate the public/private confusion. For Verdaccio, an open source lightweight npm registry, it first checks whether a private package is available before searching the public npm registry (however, their best practices say to use a prefix for private packages: https://verdaccio.org/docs/en/best )
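The scope approach can also be enforced client-side, by routing the whole scope to the internal registry so those names are never looked up publicly at all. A hypothetical `.npmrc` (hostnames and scope invented):

```ini
; everything under @acmecorp resolves only against the internal registry
@acmecorp:registry=https://npm.internal.example/
; all other packages fall through to the public registry
registry=https://registry.npmjs.org/
```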
I don't use npm much, but once I'm out of the initial development phase with any package manager and am "feature complete" we generally lock versions down so at least we're always pulling a specific version in.
And, of course, on production build machines, all packages are local.
This isn't just for "security" -- it's to ensure we can always build the same bits we shipped, and to avoid any surprises when something has a legitimate update that breaks something else.
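For anyone unfamiliar, locking a version in package.json is just a matter of dropping the range operator (versions here are illustrative):

```
# exact version: always installs 4.17.1, no silent upgrades
"express": "4.17.1"

# caret range: any 4.x >= 4.17.20 may be pulled in on a fresh resolve
"lodash": "^4.17.20"
```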
My favorite supply chain attack is still the chip vendors. Even if you come up with a hardware security module in your chip to verify the code that's running on it, that can be (and has been) hacked too. Sleeping dragons could be lying in wait in billions of devices and nobody would know unless they went out of their way to do a low-level analysis.
I've been wishing npm/pypi/apt etc would improve for ages, but it seems like infrastructure improves one disaster at a time, software one hack at a time. I'm only annoyed I didn't do it myself.
The pypi maintainer is being ridiculous, it is much better to have this guy poke MSFT than have the Russians do it, he's doing them a favour.
The only really shocking part of this is that Artifactory is vulnerable to this. I expect developers to be lazy about build security because I've seen it over and over again at multiple companies, but Artifactory's whole purpose is to provide secure build dependency management.
I'll be rethinking using Artifactory in my infrastructure.
Diffend allows you to manage the risks that come with using open-source third party dependencies by providing malware detecting security scanning and a risk management platform for your Ruby dependencies.
This brings a whole new level of awareness to package files where a simple typo can mean your machine can be rooted. From now on I'll always be terrified whenever changing any of my package.json, Gemfile or requirements.txt files.
Why didn't npmjs/rubygems just check failed lookup requests for "shopify-cloud" etc., block those for a while to prevent damage, and notify the companies (doing their best)? Seems like a low-hanging solution.
It surprises me a bit the way they refer to in-house dependencies purely by version number. When we have internal dependencies in e.g. package.json, it's always referred to by an explicit url and git ref.
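For comparison, referring to an internal dependency by URL and git ref in package.json looks something like this (host, name, and commit are made up):

```
"dependencies": {
  "internal-auth": "git+ssh://git@git.internal.example/team/internal-auth.git#9a3f2c1d"
}
```

A reference like that can't be shadowed by a same-named public package, since resolution never consults the registry at all.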
> After spending an hour on taking down these packages, Ingram stressed that uploading illicit packages on PyPI puts an undue burden on the volunteers who maintain PyPI.
I'll try to answer this from a JS-specific perspective. As someone previously mentioned - you do get hash checks if you're using `npm ci` in your CI/CD setup. You get the resolution path as well. Which is all you need to reproducibly resolve dependencies, *if* you have set up npm correctly in your pipeline. It would be unlikely to be exposed to this particular attack, at least not automatically in your deployment pipelines.
However this is still very, very dangerous because of day-to-day engineering, really. Any engineer doing a simple `npm install` can inadvertently bring in and execute malicious code on their machine. From there on out it would be somewhat trivial to gain further access to the network the code was run from.
This shouldn't be a problem with Go, right? Because it uses an id when go mod is used. I'm rusty on Go since I haven't used it in over 2 years, but I believe this shouldn't affect it?
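Roughly right: a Go module is identified by its full import path, often under a domain only you control, and pinned by hashes in go.sum. Illustrative lines (paths and hash placeholder invented):

```
// go.mod
module git.internal.example/team/service

require git.internal.example/team/authlib v1.4.2

// go.sum
git.internal.example/team/authlib v1.4.2 h1:<hash recorded on first download>
```

The remaining gotcha is that the default module proxy will still be asked about private paths unless GOPRIVATE is set to match them.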
Luckily I reserved my company's namespace on Packagist a few months ago. Each package manager works differently, and it is hard to know the inner workings of all of them.
Helps in this specific case, but will not eliminate the broader issue. The broader issue is how can you trust 3rd party code to not do anything harmful, and it’s not like we can even perfectly trust our own fingers in that regard.
But the broad issue will never be resolved. The entire foundation of society is built around some trust. What has proven to work is that we trust actors who are identifiable and have something to lose. This goes with everything from your bank, to government, to name brands, to your local restaurant.
Nix package managers/repositories have a level of scrutiny to get into, and highly dedicated people in charge of them. Random github repos (or npm packages) are extremely low effort/risk to set up.
Of course the former can be abused, but the incentives are at least in its favor to likely be more trustworthy. And we have to make assumptions of trust every time we sit down in our chair, turn on our computer, or plug in a space heater. We will never get around trust, but there are differences in levels of trust and trustworthiness.
>Nix package managers/repositories have a level of scrutiny to get into, and highly dedicated people in charge of them. Random github repos (or npm packages) are extremely low effort/risk to set up.
That's not really true though. Nix doesn't support signed sources, there are no signatures in the package repository and in theory "John Doe" with no information can add packages and send pull-requests.
In practice nixpkgs is just a well-moderated user repository, and the level of scrutiny is less than the enterprise distros can offer.
How does it work in practice, though? For example, create-react-app in NPM has a bajillion deps. Do I trust 8,000 keys? Which ones are OK?
I get that you could in principle namespace things (at least for package managers that support this) and insist on a small set of company-internal signing keys for those namespaces. But managing all that isn't easy and what about for package ecosystems that don't really have namespaces (e.g. PyPI, NuGet)?
As you see with chrome extensions and android barcode apps, this is not a solution. Developers change or for whatever reasons can change their mind and ship bad things.
We need a blockchain for source. It is obvious and we just haven't come to terms with it yet. Then anyone can run anything provided they have the right key.
This article from Agoric is extremely relevant here, from a previous such incident (re: the event-stream package on npm): https://medium.com/agoric/pola-would-have-prevented-the-even...