Well... Not really... They split the MNIST data set and trained on disjoint halves. Which is to say, I wouldn't generalize from two networks, each trained on far fewer examples than 10x its parameter count, all the way to all neural networks in existence, but of course, your opinions may vary...