
> Agreed that "simply" scaling up with more compute will result in progress and useful systems, and work in that direction is interesting and valuable. But, while we may not need new architectures or training objectives to make progress, we do need them to approach human level sample complexity.

Yes, agreed. Nothing I said above contradicts that! :-)

> Humans don't need to read through 40 GB of text multiple times to learn to write.

Yes, that's true... but to keep the comparison fair, note that we need many years of schooling to learn to read at, say, a high-school or college level. And before learning to read, we first must learn to speak, which surely helps. We also get to inhabit bodies that see, smell, touch, and interact with the physical objects we read and speak about during our formative years, which helps too. The more one thinks about it, the more 40 GB looks like a tiny figure next to the amount of training data that flows continuously into our brains from all our senses. I think I read once that our brains take in on the order of 10 to 100 GB of sensory data per second.
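
To put those figures in perspective, a quick back-of-envelope comparison (taking the 10 GB/s number, which again I'm not sure about, at face value):

    # Rough arithmetic, assuming the (unverified) 10 GB/s sensory figure.
    SENSORY_GB_PER_S = 10    # claimed sensory input rate, GB/s
    CORPUS_GB = 40           # GPT-2's training corpus

    print(CORPUS_GB / SENSORY_GB_PER_S)  # 4.0 -> four seconds of sensory input

    SECONDS_IN_18_YEARS = 18 * 365 * 24 * 3600
    print(SENSORY_GB_PER_S * SECONDS_IN_18_YEARS)  # ~5.7e9 GB by adulthood

By that (admittedly loose) estimate, the whole corpus amounts to a few seconds of sensory experience.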



Conscious processes are estimated to work on the order of 10^2 bits/sec. Vision, at the retina, is estimated at 10^7 bits/sec, and it drops another order of magnitude by V1. Also note that, as long as they're not isolated, deaf and blind people have no trouble reaching full human reasoning ability despite having vastly less data available than the average person.
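
To make that concrete, here's how long it would take to "stream" GPT-2's 40 GB corpus at each of those rough bandwidths (a sketch using the order-of-magnitude figures above, nothing more):

    CORPUS_BITS = 40 * 8 * 10**9  # 40 GB expressed in bits

    RETINA_BPS = 10**7      # estimated retinal output, bits/sec
    CONSCIOUS_BPS = 10**2   # estimated conscious bandwidth, bits/sec

    print(CORPUS_BITS / RETINA_BPS / 3600)                  # ~8.9 hours
    print(CORPUS_BITS / CONSCIOUS_BPS / (3600 * 24 * 365))  # ~101 years

At retinal bandwidth the corpus is a long day's input; at conscious bandwidth it's a lifetime.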

A human is also learning vision, hearing, walking, physics, causal reasoning, and much more, so this comparison just isn't well grounded. The task-specific question is: how much training does a young brain require to learn to produce language? And if the brain comes with innate advantages, then rather than resigning ourselves to inefficiency and making excuses for our models, we should try to see whether they can be bettered.


I'm not well versed at all in signal theory, so I'm genuinely curious how these bitrate estimates are made, and would love to see the source of these specific numbers.

How do you estimate the effective (digital) bitrate of an inherently analogue system?
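
My naive guess is that you'd treat it as a noisy channel and apply Shannon capacity, something like:

    import math

    # Shannon-Hartley: capacity of a band-limited analogue channel,
    # C = B * log2(1 + S/N), in bits per second.
    def channel_capacity(bandwidth_hz, snr):
        return bandwidth_hz * math.log2(1 + snr)

    # e.g. a channel with 1 kHz bandwidth and 30 dB SNR (power ratio 1000):
    print(channel_capacity(1000, 1000))  # ~9967 bits/sec

...but I have no idea how you'd pin down the effective bandwidth and SNR of actual neurons, hence the question.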


After 40 GB of text the model doesn't know anything about how the world works, and it shows many times in the examples. Nobody would make some of those mistakes, not even young kids. Other mistakes are more subtle but still show a total lack of understanding.

That said, yes, it's enough to write the kind of text that nobody really cares about, and that could cover a lot of what we read.


It's because nobody dumps 40 GB on a kid. Kids go through a long process of feedback and correction. I imagine that if there were some crowd-funded project to provide feedback on this model's mistakes, it would learn and produce better results fast.
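
Mechanically, the feedback loop could be as simple as periodically fine-tuning on the corrected outputs. A hypothetical sketch, assuming the HuggingFace transformers library (corrected_texts is a placeholder for crowd-sourced corrections):

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    # Placeholder: human-corrected versions of the model's bad outputs.
    corrected_texts = ["..."]

    model.train()
    for text in corrected_texts:
        inputs = tokenizer(text, return_tensors="pt")
        # Standard language-modeling objective on the corrected text.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

Whether crowd-sourced corrections would be clean enough to help rather than hurt is, of course, the open question.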



