The important thing that probably isn't obvious is how few samples you actually need. Jitter is essentially random, which means it is uncorrelated with the repeated pattern you are looking for (1 compare takes longer, the other 255 all take the same time), so it averages out as you collect more samples.

With the added noise (jitter), all you need to do is distinguish two normal distributions. This takes fewer samples than you might guess at first. Think of it as the same problem as distinguishing a biased coin (51% heads) from an identical-looking unbiased one, merely by flipping each. Given the known bias and a desired confidence level (say 95%), you can calculate how many flips you need to make a decision.
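
To make the coin calculation concrete, here is a rough sketch in Python. It is one way to run the numbers, not the only one. It assumes the normal approximation to the binomial and a one-sided test in which "95% confidence" means both error rates (calling a fair coin biased, and missing a biased coin) are held to 5%. The function name `flips_needed` and its parameters are made up for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def flips_needed(p_biased=0.51, p_fair=0.5, confidence=0.95):
    """Approximate number of flips needed to tell p_biased from p_fair.

    Assumptions (mine, not from the discussion above): normal approximation
    to the binomial, one-sided test, and both the false-positive and
    false-negative rates set to (1 - confidence).
    """
    # z-score for the chosen confidence level (about 1.645 for 95%)
    z = NormalDist().inv_cdf(confidence)

    # Per-flip standard deviation under each hypothesis
    sd_fair = sqrt(p_fair * (1 - p_fair))
    sd_biased = sqrt(p_biased * (1 - p_biased))

    # Standard sample-size formula for separating two proportions
    n = ((z * sd_fair + z * sd_biased) / abs(p_biased - p_fair)) ** 2
    return ceil(n)


if __name__ == "__main__":
    print(flips_needed())      # 51% vs 50% at 95%
    print(flips_needed(0.6))   # a much larger bias needs far fewer flips
```

Under these assumptions the 51% coin needs on the order of 27,000 flips. The count grows with the inverse square of the bias, so a timing difference that is large relative to the jitter needs far fewer samples than a small one.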