Well... Not really... They split the MNIST data set and trained on disparate halves. Which is to say, I wouldn't generalize from two networks, each trained on far fewer than 10x as many examples as it has parameters, all the way to all neural networks in existence. But of course, your opinions may vary...