Have you guys thought about scaling this? I'm in the process of doing preprocess...

mre · on Oct 6, 2015

Nope, I haven't thought about it yet. It will actually end up running on my Raspberry Pi. But if I had to scale it, I would probably put Tesseract into a Docker container and start N instances with Docker Swarm. The output would go into a shared local volume or even S3. That should be more than enough to scale to a very large cluster.
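
To make the fan-out idea a bit more concrete, here is a rough single-machine sketch in Python. It is not what the project does: it assumes the `tesseract` CLI is on the PATH, uses local worker threads as a stand-in for the N containers, and uses a plain directory as a stand-in for the shared volume. The names `build_command`, `ocr_all` and `runner` are made up for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(image_path, out_dir):
    """Return the tesseract CLI call for one image.

    `tesseract <image> <outputbase>` writes its text to <outputbase>.txt.
    """
    image = Path(image_path)
    output_base = Path(out_dir) / image.stem
    return ["tesseract", str(image), str(output_base)]


def ocr_all(image_paths, out_dir, workers=4, runner=subprocess.run):
    """Run OCR over all images with at most `workers` jobs in flight.

    `runner` is a hypothetical hook so the fan-out can be exercised
    without tesseract installed; by default it shells out for real.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    commands = [build_command(p, out_dir) for p in image_paths]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Threads are enough here: each one only waits on a child process.
        list(pool.map(lambda cmd: runner(cmd, check=True), commands))
    # One .txt file per input image, all in the shared output directory.
    return [Path(cmd[2] + ".txt") for cmd in commands]
```

Something like `ocr_all(["scan1.png", "scan2.png"], "/mnt/ocr-out", workers=8)` would then write `scan1.txt` and `scan2.txt` into the output directory. In a swarm setup, each worker would be a container instead of a thread, and the output directory would be the mounted shared volume (or an upload step to S3).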

mre · on Oct 10, 2015

BTW, you might want to have a look at https://github.com/tleyden/open-ocr