
> Agreed that "simply" scaling up with more compute will result in progress and useful systems, and work in that direction is interesting and valuable. But, while we may not need new architectures or training objectives to make progress, we do need them to approach human level sample complexity.

Yes, agreed. Nothing I said above contradicts that! :-)

> Humans don't need to read through 40 GB of text multiple times to learn to write.

Yes, that's true... but to keep the comparison fair, note that we need many years of schooling to learn to read at, say, a high-school or college level. And before learning to read, we first must learn to speak, which surely helps. We also get to inhabit bodies that see, smell, touch, and interact with the physical objects we read and speak about during our formative years, which helps too. The more one thinks about it, the more 40 GB looks like a tiny figure next to the amount of training data that flows continuously into our brains from all our senses. I think I read once that our brains take in on the order of 10 to 100 GB of sensory data per second.
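
To put those figures in perspective, a quick back-of-envelope comparison (taking the 10 GB/s number, which again I'm not sure about, at face value):

    # Rough arithmetic, assuming the (unverified) 10 GB/s sensory figure.
    SENSORY_GB_PER_S = 10    # claimed sensory input rate, GB/s
    CORPUS_GB = 40           # GPT-2's training corpus

    print(CORPUS_GB / SENSORY_GB_PER_S)  # 4.0 -> four seconds of sensory input

    SECONDS_IN_18_YEARS = 18 * 365 * 24 * 3600
    print(SENSORY_GB_PER_S * SECONDS_IN_18_YEARS)  # ~5.7e9 GB by adulthood

By that (admittedly loose) estimate, the whole corpus amounts to a few seconds of sensory experience.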



Conscious processes are estimated to work on the order of 10^2 bits/sec. Vision, at the retina, is estimated at 10^7 bits/sec, and it drops another order of magnitude by V1. Also note that, as long as they're not isolated, deaf and blind people have no trouble reaching full human reasoning ability despite having vastly less data available than the average person.
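
To make that concrete, here's how long it would take to "stream" GPT-2's 40 GB corpus at each of those rough bandwidths (a sketch using the order-of-magnitude figures above, nothing more):

    CORPUS_BITS = 40 * 8 * 10**9  # 40 GB expressed in bits

    RETINA_BPS = 10**7      # estimated retinal output, bits/sec
    CONSCIOUS_BPS = 10**2   # estimated conscious bandwidth, bits/sec

    print(CORPUS_BITS / RETINA_BPS / 3600)                  # ~8.9 hours
    print(CORPUS_BITS / CONSCIOUS_BPS / (3600 * 24 * 365))  # ~101 years

At retinal bandwidth the corpus is a long day's input; at conscious bandwidth it's a lifetime.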

A human is also learning vision, hearing, walking, physics, causal reasoning, and much more, so this comparison just isn't well grounded. The task-specific question is: how much training does a young brain require to learn to produce language? And if the brain comes with innate advantages, then rather than resigning ourselves to inefficiency and making excuses for our models, we should try to see whether they can be bettered.


I'm not well versed at all in signal theory, so I'm genuinely curious how these bitrate estimates are made, and would love to see the source of these specific numbers.

How do you estimate the effective (digital) bitrate of an inherently analogue system?
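
My naive guess is that you'd treat it as a noisy channel and apply Shannon capacity, something like:

    import math

    # Shannon-Hartley: capacity of a band-limited analogue channel,
    # C = B * log2(1 + S/N), in bits per second.
    def channel_capacity(bandwidth_hz, snr):
        return bandwidth_hz * math.log2(1 + snr)

    # e.g. a channel with 1 kHz bandwidth and 30 dB SNR (power ratio 1000):
    print(channel_capacity(1000, 1000))  # ~9967 bits/sec

...but I have no idea how you'd pin down the effective bandwidth and SNR of actual neurons, hence the question.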


After 40 GB of text the model doesn't know anything about how the world works, and it shows many times in the examples. Nobody would make some of those mistakes, not even young kids. Other mistakes are more subtle but still show a total lack of understanding.

That said, yes, it's enough to write the kind of text that nobody really cares about, and that could cover a lot of what we read.


It's because nobody dumps 40 GB on a kid. Kids go through a long process of feedback and correction. I imagine that if there were some crowd-funded project to provide feedback on this model's mistakes, it would learn and produce better results fast.
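
Mechanically, the feedback loop could be as simple as periodically fine-tuning on the corrected outputs. A hypothetical sketch, assuming the HuggingFace transformers library (corrected_texts is a placeholder for crowd-sourced corrections):

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    # Placeholder: human-corrected versions of the model's bad outputs.
    corrected_texts = ["..."]

    model.train()
    for text in corrected_texts:
        inputs = tokenizer(text, return_tensors="pt")
        # Standard language-modeling objective on the corrected text.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

Whether crowd-sourced corrections would be clean enough to help rather than hurt is, of course, the open question.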



