
Have you guys thought about scaling this? I'm in the process of doing preprocessing with OpenCV and then OCR with Tesseract, both in Python. Since the OCR step involves I/O, I'm not sure how well it will scale. Just wondering if you guys have put any thought into that.


Nope, I haven't thought about it yet. It will actually end up running on my Raspberry Pi. But if I had to scale it, I would probably put Tesseract into a Docker container and start N instances with Docker Swarm. The output would go to a shared local volume or even S3. That should be more than enough to scale to a very large cluster.
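For what it's worth, here is a rough single-machine sketch of the same fan-out idea: N parallel workers, each running OCR on one image and writing a .txt file into a shared output directory. It is only a stand-in for the N-containers setup, not a Swarm config. The names `run_tesseract`, `ocr_batch`, `out_dir` and `workers` are made up for illustration, and it assumes the `tesseract` binary is on PATH.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_tesseract(image_path):
    """OCR one image by shelling out to the tesseract CLI (assumed on PATH)."""
    result = subprocess.run(
        ["tesseract", str(image_path), "stdout"],  # "stdout" prints the text instead of writing a file
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def ocr_batch(image_paths, out_dir, ocr=run_tesseract, workers=4):
    """Fan images out to `workers` parallel jobs and write one .txt per image.

    `ocr` is injectable so the OCR backend can be swapped out (hypothetical design choice).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    def job(image_path):
        image_path = Path(image_path)
        target = out_dir / (image_path.stem + ".txt")
        target.write_text(ocr(image_path), encoding="utf-8")
        return target

    # Threads are enough here: each job mostly waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(job, image_paths))
```

Usage would be something like `ocr_batch(Path("scans").glob("*.png"), "ocr_out", workers=4)`. Swapping the local directory for S3 or a mounted shared volume would only change where `job` writes its result.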


BTW, you might want to have a look at https://github.com/tleyden/open-ocr



