I'm not familiar with how the video is encoded, but you may have to be careful that the encoding doesn't introduce correlations between adjacent frames (you can imagine how P or B frames might have correlated noise, for example).
Also, I don't think you'll ever see cosmic-ray interactions last more than one frame... the time scales for cosmic-ray interactions are nanoseconds, not milliseconds. I suppose electrons could get trapped somewhere and diffuse slowly (that does happen in some CCDs; I don't know enough about CMOS sensors to know whether it can happen there), but that would happen with bright images in addition to cosmic rays.
In such a small sensor, cosmic rays should be relatively rare. To first order, you can model the number of cosmic rays you see as a Poisson process, which should be a pretty good source of entropy. You can even estimate the rate based on altitude / sensor orientation, etc. However, nearby cosmic ray detectors will detect correlated cosmic rays (from extensive air showers from the same primaries), which might hurt.
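The per-frame hit count under that Poisson model is easy to simulate; here is a minimal sketch, where the rate of 0.02 hits per frame is a made-up illustrative number (the real rate depends on sensor area, altitude, and orientation, as noted above):

```python
import math
import random

def poisson_sample(rate):
    """Draw one sample from a Poisson distribution (Knuth's method)."""
    limit = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

# Hypothetical rate: ~0.02 cosmic-ray hits per frame (illustrative only).
random.seed(1)
hits = [poisson_sample(0.02) for _ in range(100_000)]
frames_with_hits = sum(1 for h in hits if h > 0)
print(frames_with_hits / len(hits))  # close to 1 - e^-0.02 ≈ 0.0198
```

At such a low rate, most frames carry zero hits, which is why these events contribute only a few bits of entropy overall even though each individual event is highly unpredictable.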
The key property for crypto randomness here is that these high-energy particle events (cosmic rays, background radiation, etc.) are not just random but independent from thermal noise. They are few and far between, but they affect each sample somewhere. One way or another all that entropy will get hashed, and having even a few bits contributed by an independent phenomenon makes the final hash extremely hard to attack.
Considering all the sources that contribute noise to sensors (thermal noise, photon count, high-energy particles, shot/RTS noise, and I'm probably missing a few), each with its own distribution and characteristics, makes each sample readout very hard to predict.
Tangentially, I wonder whether most modern smartphone chips include a hardware random number generator, and whether it is exposed to userspace?
The iPhone has a hardware random number generator, at least: "The Secure Enclave is a coprocessor fabricated in the Apple S2, Apple A7, and later A-series processors. It uses encrypted memory and includes a hardware random number generator."
I checked on that a while back as well. As far as I can find out, the SE HRNG is not exposed to the user at all. It's used internally in the quite complicated process of securely booting and unlocking an iOS device (there is an interesting presentation floating around with all the details of reverse engineering that process, and the amount of security Apple designed into their own hardware-to-hardware protocols is at a very respectable level of insane). I think it's likely the SE HRNG is included in seeding /dev/urandom on iOS, making it one of the most secure CSPRNGs around.
Worth mentioning (it's sort of the main premise of the article, and it gets a bit lost in all the methodology discussion): all existing HWRNGs are relatively low bandwidth, because they are bound by a physical process rather than the endless spinning of /dev/urandom. They all have to wait for physics to produce each bit, and existing chips don't have that much "physics" in them.
The main novelty of a "camera noise HRNG" is that we are effectively leveraging 12M micro-HRNGs in parallel - that's where the firehose of entropy comes from.
The Intel on-die DRNG has a cited bandwidth of about 3 Gbps [1]. My Raspberry Pi 3 has an on-die generator via the BCM-2835. Single-threaded testing with dd(1) shows about 1.5 Mbps of bandwidth.
Off-chip, using $25 RTL-SDR dongles, you can get about 2.8 Mbps of bandwidth. In fact, if you look at the Wikipedia article comparing HWRNGs [2], you can see that there are a lot of implementations with surprisingly high bandwidth. Some will empty your wallet, others not so much.
The point is, there are plenty of HWRNGs out there with high bandwidth. Whether or not you trust them, though, is a completely different matter.
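A quick way to benchmark any of these sources yourself is to time a bulk read. A minimal sketch (pointed at `os.urandom` here so it runs anywhere; to test a hardware device on Linux you would read from its device file, e.g. `/dev/hwrng`):

```python
import os
import time

def measure_throughput(read_chunk, total_bytes=1 << 20, chunk=4096):
    """Time repeated reads from an entropy source; return bits/second."""
    start = time.perf_counter()
    got = 0
    while got < total_bytes:
        got += len(read_chunk(chunk))
    elapsed = time.perf_counter() - start
    return got * 8 / elapsed

# os.urandom stands in for the source here. For a hardware device on
# Linux, substitute something like:
#   f = open("/dev/hwrng", "rb"); read_chunk = f.read
bps = measure_throughput(os.urandom)
print(f"{bps / 1e6:.1f} Mbps")
```

Note that a CSPRNG like urandom will typically be far faster than a physics-bound device, which is exactly why a raw throughput number alone says nothing about the underlying entropy rate.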
Here is the problem: most of these stats are not what they pretend to be (unless the exact circuit/spec is published). Look at the low-level details of building, say, an avalanche noise source: http://holdenc.altervista.org/avalanche/ - bandwidth is mostly bound by voltage/frequency/sampling resolution: how often you can trigger an entropy event, and how many of them run in parallel. The true result for one AN circuit: 2000 bits/sec.
What a lot of these vendors do is have some physical phenomenon on the chip feed a hardware "whitener" (endless hashing) that responds to all requests without blocking. That's practically a hardware version of /dev/urandom, bound only by chip I/O - but it's completely disconnected from the bandwidth of the actual "true" entropy phenomenon underneath. Of course it is still a good CSPRNG, but it's not a "true" source. BTW, a nice exception: the TrueRNG team is pretty honest and publishes the schematic directly - hence the real entropy speed of 40 kb/sec.
In short, every single entry on that list should be independently inspected, down to the specs and schematic of the whitener. If they are not publishing the chip spec with exact details, I highly doubt the bandwidth of "true" entropy events really approaches Gbps - that is the speed of the whitener, not of the actual generator.
BTW, I have an RPi 3 too, thanks for mentioning it. It would be a fun project to figure out whether we can reach the true source on that chip. But 1.5 Mbps? Highly suspect - they don't have the space or the clock speed to sample that many true events on that tiny chip. Need to dig into the Broadcom spec to find out more!
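The whitener pattern described above can be sketched in a few lines: a slow "true" source occasionally stirs a pool, while a hash-based output stage answers requests at I/O speed regardless (names and structure are illustrative, not any vendor's actual design):

```python
import hashlib

class Whitener:
    """Hash-based output stage fed by a slow 'true' entropy source.

    Output bandwidth is bounded only by hashing speed, not by how often
    stir() is called -- which is why a vendor-quoted Gbps figure says
    nothing about the underlying physical entropy rate.
    """

    def __init__(self, seed: bytes):
        self.pool = hashlib.sha256(seed).digest()
        self.counter = 0

    def stir(self, true_entropy: bytes):
        # Called rarely, e.g. at ~2 kbit/s from an avalanche circuit.
        self.pool = hashlib.sha256(self.pool + true_entropy).digest()

    def read(self, n: int) -> bytes:
        # Called as fast as the bus allows; never blocks.
        out = b""
        while len(out) < n:
            self.counter += 1
            out += hashlib.sha256(
                self.pool + self.counter.to_bytes(8, "big")).digest()
        return out[:n]

w = Whitener(b"slow physical seed bits")
block = w.read(64)              # fast, effectively unlimited
w.stir(b"a few new true bits")  # slow, physics-bound
```

The `read()` path is a perfectly good CSPRNG, but all of its "true" entropy is whatever trickled in through `stir()` - the distinction the parent comment is making.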
A question for physicists - how do we know that some process in nature is truly random and conforms to some ideal probability distribution? The link between the math and the real world is not clear to me. Thanks!
RLY? How on earth can a read from /dev/random be non blocking if it does not have sufficient entropy?
My understanding is that /dev/random "promises" to return decent random stuff, while /dev/urandom returns decent algorithmically generated random stuff. urandom is good; random is better but finicky and can sulk for a while.
> RLY? How on earth can a read from /dev/random be non blocking if it does not have sufficient entropy?
In short, to answer your question: I am not aware of any /dev/random that will not block when insufficiently seeded. Linux, however, is the only kernel where /dev/random keeps blocking whenever the input pool entropy estimate runs low, even after boot. This is highly criticized in most security communities.
Mac OS X and iOS have implemented the FreeBSD CSPRNG, where /dev/random is a symlink to /dev/urandom. The CSPRNG blocks on boot until it's sufficiently seeded with hardware timing events. Once seeded, it never blocks again.
On NetBSD, /dev/random sometimes blocks, although not as notoriously as Linux. However, it will fully block on boot until sufficiently seeded, after which it no longer blocks.
OpenBSD also provides no functional difference between /dev/random and /dev/urandom. Both will block until sufficiently seeded.
As far as I know, on every Unix-like operating system, data is saved from the generator to disk on shutdown, and on boot, that data is read into the CSPRNG as an unpredictable seed. With every GNU/Linux operating system I can think of, this happens as part of the install, so on first boot, the kernel is already sufficiently seeded with random data.
On FreeBSD, this seed is saved to "/var/db/entropy-file". On GNU/Linux, it goes to either "/var/lib/systemd/random-seed" or "/var/lib/urandom/random-seed", depending on whether or not you're using systemd. I'm not sure whether NetBSD saves a seed to disk, as it's been many years since I last ran NetBSD seriously.
> Cryptographers are certainly not responsible for this superstitious nonsense. Think about this for a moment: whoever wrote the /dev/random manual page seems to simultaneously believe that
> (1) we can't figure out how to deterministically expand one 256-bit /dev/random output into an endless stream of unpredictable keys (this is what we need from urandom), but
> (2) we _can_ figure out how to use a single key to safely encrypt many messages (this is what we need from SSL, PGP, etc.).
> For a cryptographer this doesn't even pass the laugh test.
There can be a few pitfalls here, like assuming that the dark-field image of the Apple camera is actually random noise and not some property of the sensor that can vary between images.
Getting the distribution of the randomness right can also be hard.
You are quite correct about this - "lens closed" mode has the least entropy by our measure. However, entropy is clearly proportional to the intensity of light hitting the sensor (besides the natural photon variance of the light itself, shot noise in the sensor increases with more light). So in what we call "optimal" conditions - enough light to generate noise, not enough to oversaturate - there is plenty of natural entropy coming from the sensor.
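The "more light, more entropy" effect can be illustrated by comparing the empirical byte-entropy of simulated sensor reads at two noise amplitudes. This is only a sketch: Gaussian read noise is a crude stand-in for real sensor statistics, and the sigma values are invented for illustration.

```python
import math
import random
from collections import Counter

def shannon_entropy_bits(samples):
    """Empirical Shannon entropy (bits per sample) of a sequence."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def simulated_readout(n, noise_sigma, rng):
    # 8-bit pixel values around a mid-grey level, with Gaussian noise
    # whose sigma grows with illumination (standing in for shot noise).
    return [max(0, min(255, round(128 + rng.gauss(0, noise_sigma))))
            for _ in range(n)]

rng = random.Random(7)
dark = shannon_entropy_bits(simulated_readout(50_000, 0.5, rng))  # lens covered
lit = shannon_entropy_bits(simulated_readout(50_000, 8.0, rng))   # well lit
print(f"dark: {dark:.2f} bits/sample, lit: {lit:.2f} bits/sample")
```

With the wider noise distribution, the pixel values spread over many more levels, so each readout carries several more bits of empirical entropy - the same qualitative behavior described above.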
This reminds me of SID, the custom sound chip on the Commodore 64. It was a hybrid digital/analog chip, so the sound which it produced wasn't consistent between chips due to thermal entropy effects. El33t h4x0rz took advantage of this by making the SID output a bit of white noise into a CPU register rather than the speakers in order to get true random numbers.
Alexandre Anzala-Yamajako posted interesting comments on this to [Cryptography] (@metzdowd.com):
> IMO a statistical approach based on taking a bunch of data and essentially saying "I don't see any signs that it's not random" is not a good approach for entropy seeding. The example is old, but I could give you the output of AES in counter mode with a null key and a null IV, and no standard statistical test would ever show you any defects while you have absolutely no entropy.
> Your case is particularly worrisome for several reasons:
> 1) you use a Von Neumann-like extractor, but you have also shown that your data is not only biased but also correlated
> 2) you don't seem to have a model of your hardware source from which you could derive the output distribution
> 3) you do some wizardry to remove some correlation, but nowhere show or prove that there isn't more correlation to be taken care of, or how
> 4) I didn't see in your document a justification of the fact that the manufacturer of the camera (software and hardware) doesn't have more information than you and could therefore target defects in your entropy management procedure.
> You should have a look at the work of Viktor Fischer, David Lubicz, Florent Bernard and Patrick Haddad. They invested quite a bit of effort to give entropy guarantees when using very specific hardware devices.
Skibinsky subsequently responded:
> Alexandre, thanks for reading and suggestions! I will certainly check out your references.
> As is probably obvious from the essay-style narrative, this is not intended to be a tight scientific paper, just our research log of first-order ideas we coded up for a minimal working prototype. You are correct on #1 and #3 - the current codebase doesn't address these issues. #2 is interesting, because besides the wide variety of camera hardware that model should reflect, iOS camera parameters present us with an opportunity to create an optimal hardware source. This is far from our area of expertise, so I hope somebody in the open-source community will pick it up from here and figure out both a formal model and what physical settings will optimize the source.
> Thanks again for the great suggestions; I will further emphasize the impact of correlations and VN sensitivity to non-IID data in the final section.
> The most likely practical direction, of course, is simply to use a universal hash extractor instead of VN, since it relaxes a lot of requirements.
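Anzala-Yamajako's point that statistical tests cannot detect zero entropy is easy to demonstrate. The stdlib has no AES, so SHA-256 in counter mode with a fixed all-zero key stands in for AES-CTR here: the stream sails through a simple monobit test despite being fully deterministic.

```python
import hashlib

def deterministic_stream(n_bytes, key=b"\x00" * 32):
    """SHA-256 in counter mode with a fixed key: zero entropy, yet
    statistically indistinguishable from random by simple tests."""
    out = b""
    counter = 0
    while len(out) < n_bytes:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n_bytes]

data = deterministic_stream(1 << 16)
ones = sum(bin(b).count("1") for b in data)
total = len(data) * 8
bias = abs(ones / total - 0.5)
print(f"monobit bias: {bias:.5f}")  # tiny; the stream 'looks' random
# ...and yet anyone who knows the construction can reproduce every bit:
assert data == deterministic_stream(1 << 16)
```

This is why passing a test battery establishes at most the absence of obvious defects, never the presence of entropy.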
This is really cool. The fact that I can, basically, carry around a TRNG in my pocket is, like, the ultimate nerd fantasy. Other than reseeding /dev/urandom, I don't really have any personal need for it, but the discussions it could generate could be very interesting.
My motivation from a while back: it would be so useful to have something that lets you dump a few megs of independent entropy into your /dev/urandom right before generating BTC/SSH keys. The TrueRNG stick is also good, but just 40 kb/sec (I have a TrueRNG on a cron job reseeding every hour anyway).
Do you have any references to existing code or research that deals with correlation issues? We considered a few home-grown ideas (like measuring the correlation level in each sample and then compacting the sample by that percentage before quantizing), but all of them were pretty computationally heavy...
> Take two video frames as big arrays of RGB values.
> Subtract one frame from another, that leaves us with samples of raw thermal noise values.
> Calculate the mean of a noise frame. If it is outside of ±0.1 range, we assume the camera has moved between frames, and reject this noise frame.
> Delete improbably long sequences of zeroes produced by oversaturated areas. For our 1920x1080=2Mb samples and a natural zero probability of 8.69%, any sequence longer than 7 zeros will be removed from the raw data.
> Quantize raw values from ±40 range into 1,2 or 3 bits: raw_value % 2^bits.
> Group quantized values into batches sampled from different R,G,B channels, at big pixel distances from each other and in different frames to minimize the impact of space and time correlations in that batch.
> Process a batch of 6–8 values with the Von Neumann algorithm to generate a few uniform bits of output.
> Collect the uniform bits into a new entropy block.
> Check the new block with a chi-square test. Reject blocks that score too high and therefore are too improbable to come from a uniform entropy source.
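The quoted steps can be sketched end-to-end on synthetic frames. This is only a sketch of the core pipeline (frame difference, mean check, 1-bit quantization, Von Neumann extraction); the frame data is simulated Gaussian noise rather than real camera output, and the zero-run and chi-square steps are omitted for brevity.

```python
import random

def von_neumann(bits):
    """Von Neumann debiasing: pair 01 -> 0, pair 10 -> 1, discard 00/11."""
    out = []
    for a, b in zip(bits[::2], bits[1::2]):
        if a != b:
            out.append(a)
    return out

rng = random.Random(42)
W, H = 640, 480

def frame():
    # Simulated sensor frame: mid-grey plus Gaussian read noise.
    return [max(0, min(255, round(128 + rng.gauss(0, 5))))
            for _ in range(W * H)]

f1, f2 = frame(), frame()
noise = [a - b for a, b in zip(f1, f2)]   # subtract frames -> raw noise

mean = sum(noise) / len(noise)            # reject frame if camera moved
assert abs(mean) < 0.1

raw_bits = [v % 2 for v in noise]         # quantize to 1 bit: raw % 2^1
uniform = von_neumann(raw_bits)           # debias with Von Neumann

ones = sum(uniform)
print(f"{len(uniform)} bits, ones fraction {ones / len(uniform):.3f}")
```

Note that Von Neumann only removes bias, not correlation - which is exactly the weakness the replies below point out.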
This reads like a highly ad-hoc process with nothing resembling a formal justification for any of its steps, its general outline, or any of the magic numbers used in it. It's unclear what properties are being achieved and how exactly the steps guarantee them. There is no analysis of the predictability of the data by an adversary, either.
What does this get you that SHA512'ing the entire raw image bitmap doesn't? Using statistical tests makes sense to verify that the camera data isn't pathologically anomalous (say, all zeroes or all 255), but I don't understand why this sort of procedure is preferable to using a strong hash function to extract randomness from an image sensor's output.
You are completely correct - as I mentioned at the end, this would be the easier extractor: "Want to relax IID assumptions and avoid careful setup of a non-correlated quantizer? Easy — use any good universal hash function. The only new requirement is that the universal hash will require a one-time seed. We can make an easy assumption that the iPhone local state of /dev/urandom is totally independent from thermal noise camera is seeing, and simply pick that seed from everybody's favorite CSPRNG."
The main reason we went with VN instead of SHA or a universal hash is that it was just the more fun thing to build and experiment with. SHA is like a flamethrower that will deal with anything you throw at it. VN is far more brittle, so you see all your mistakes in the scene or in generation. Of course you then hash the output anyway before use!
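For comparison, the "flamethrower" approach is nearly a one-liner: hash the whole raw frame, keyed with a seed from the local CSPRNG (assumed independent of the camera's thermal noise, as the quoted passage argues). A minimal sketch, with random bytes standing in for a real capture:

```python
import hashlib
import os

def extract_from_frame(frame_bytes: bytes, out_len: int = 64) -> bytes:
    """Extract randomness from a raw sensor frame by hashing it whole,
    keyed with a fresh seed from the local CSPRNG."""
    seed = os.urandom(32)
    return hashlib.sha512(seed + frame_bytes).digest()[:out_len]

# Any raw frame buffer works; random bytes stand in for a capture here.
fake_frame = os.urandom(1920 * 1080)
block = extract_from_frame(fake_frame)
print(len(block), "bytes of extracted output")
```

Unlike Von Neumann, this makes no IID assumption about individual pixels; it only needs the frame as a whole to contain enough unpredictable bits.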