Weird, yeah. It reads like a roundabout attempt to preemptively answer the "why not deep learning?" question omnipresent among ML newcomers. The points themselves aren't really wrong: you could argue that gradient boosting's comparative strength is that it works well on structured (tabular) data sets, often out of the box with little tuning, including relatively small data sets. That explains its strong performance on Kaggle-style problems, while deep learning leads on audio, text, image, and video data, and it also explains why you don't see gradient boosting applied to ImageNet-type problems.
But these points all belong in some section entitled "why use gradient boosting instead of another ML method?", not in a definition of gradient boosting.
(As for the layman's description: I thought boosting performed better out of the box on dense data than on sparse data, because with sparse inputs most of the feature subsets sampled for bagging consist of zeroed features.)
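To make the "works out of the box on small tabular data" point concrete, here's a minimal from-scratch sketch of gradient boosting for squared loss with depth-1 stump learners; the helper names and the toy data are my own illustration, not anything from the article:

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # degenerate split, skip
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm

def gradient_boost(X, y, n_rounds=50, lr=0.1):
    """Repeatedly fit stumps to the residuals and add them, scaled by lr."""
    base = sum(y) / len(y)          # start from the mean prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(row) for p, row in zip(pred, X)]
    return lambda row: base + lr * sum(s(row) for s in stumps)

# Tiny demo on a hypothetical 4-row tabular dataset:
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
predict = gradient_boost(X, y)
```

Note that the defaults (50 rounds, shrinkage 0.1) are untuned, which is exactly the out-of-the-box behavior being claimed for tabular data.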