I wonder how much more accurate this solution is compared to a simple IR beam across the entry. Is the extra hardware and software complexity of a computer vision solution worth the increased accuracy? I suppose the answer depends on the use case. Mom-and-pop shops might only need an IR beam, if that, while megacorps that optimize costs down to the penny might have a use for exact customer trends so they can decide how much human workforce and/or automation they need.
Basic single-beam IR solutions cannot tell the difference between someone going into the store and someone going out, so keeping a real-time count of occupancy is difficult (or impossible). They also struggle with two people going through a doorway at the same time (counted as one entry instead of two).
A two-beam setup would distinguish entries from exits for single customers. Then the question becomes how often 2+ people cross the beams at once, and whether that changes the aggregate statistics enough to matter once you correct for it (e.g. count each crossing as 1.1 people if 10% of beam crossings are 2 people).
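To make the two-beam idea and the 1.1x correction concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the event format, the beam names "outer" and "inner", the `max_gap` timeout, and the function names are hypothetical and not taken from any real sensor's API.

```python
from typing import Iterable, List, Tuple

# Hypothetical event format: (timestamp_seconds, beam), where beam is
# "outer" (street side) or "inner" (store side). Both names are assumptions.
Event = Tuple[float, str]


def classify_crossings(events: Iterable[Event], max_gap: float = 1.0) -> List[str]:
    """Pair consecutive breaks of different beams into 'in' or 'out' crossings.

    The beam broken first gives the direction: outer then inner is an entry,
    inner then outer is an exit. Breaks with no partner within max_gap
    seconds are dropped.
    """
    crossings: List[str] = []
    pending = None  # first beam break of a possible crossing
    for ts, beam in sorted(events):
        if pending is not None and beam != pending[1] and ts - pending[0] <= max_gap:
            crossings.append("in" if pending[1] == "outer" else "out")
            pending = None
        else:
            # Either nothing is pending, the same beam fired again, or the
            # previous break timed out: start over from this break.
            pending = (ts, beam)
    return crossings


def estimate_people(crossings: int, multi_fraction: float, avg_group: float = 2.0) -> float:
    """Scale a crossing count by the expected number of people per crossing.

    multi_fraction is the share of crossings assumed to be groups, and
    avg_group is the assumed size of such a group.
    """
    return crossings * (1 + multi_fraction * (avg_group - 1))


events = [(0.0, "outer"), (0.3, "inner"), (5.0, "inner"), (5.2, "outer")]
directions = classify_crossings(events)
occupancy = directions.count("in") - directions.count("out")
```

With 10% of crossings assumed to be pairs, `estimate_people(100, 0.10)` scales 100 crossings to roughly 110 people, which is the 1.1 factor from the example above. Note that this only fixes the aggregate: it still cannot tell which individual crossings were two people, so the live occupancy count would drift over the day.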