laito's comments

Hey, this is pretty cool. I actually tried something similar (keeping a list of shop names and matching it against tesseract's results). I was trying a Hough transform to correct slight image rotations. I wasn't aware of ImageMagick's textcleaner script; that could have saved me a lot of trouble :) I got roadblocked by the problem of having various kinds of receipts with absolutely no layout in common. I figured the system would need a lot of training to reach decent accuracy and left it for another day.
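For anyone curious what the shop-name matching could look like, here is a minimal sketch using only Python's standard `difflib`. It is not the code from the project mentioned above; the shop list, the `guess_shop` helper, and the 0.6 cutoff are all assumptions for illustration. It compares each OCR token against a list of known names so that small recognition errors (like "L1dl" for "Lidl") still match.

```python
import difflib

# Hypothetical list of known shop names; a real list would be user-maintained.
KNOWN_SHOPS = ["Aldi", "Lidl", "Rewe", "Edeka"]


def guess_shop(ocr_text, known=KNOWN_SHOPS, cutoff=0.6):
    """Return the known shop name closest to any OCR token, or None.

    The cutoff of 0.6 is an arbitrary assumption and would need tuning
    against real receipts.
    """
    lowered = {name.lower(): name for name in known}
    best_name, best_score = None, cutoff
    for token in ocr_text.lower().split():
        for key, name in lowered.items():
            # ratio() is 1.0 for identical strings and 0.0 for no overlap.
            score = difflib.SequenceMatcher(None, token, key).ratio()
            if score >= best_score:
                best_name, best_score = name, score
    return best_name
```

Matching token by token keeps it simple, but multi-word shop names would need the comparison to run over sliding windows of tokens instead.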


Yeah, relying on the layout never worked for me either. For instance, even the supermarket address is not standardized. Some shops use:

> Market name
> Examplestreet 12
> 19393 Examplecity

while others use:

> Market name
> 19393 Examplecity
> Examplestreet 12

We're not even talking about the invoice/receipt layout. It's different all the time.
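One way around the ordering problem in the two address examples above is to classify each line by what it contains rather than where it sits. The sketch below is only that, a sketch: the two regular expressions assume a five-digit postal code followed by a city, and a street name followed by a house number, which is all the examples show. `parse_address` and both patterns are hypothetical names, not part of any library.

```python
import re

# Assumed formats, taken from the two examples above:
# postal line = "<5 digits> <city>", street line = "<street name> <house number>".
POSTAL_RE = re.compile(r"^(\d{5})\s+(.+)$")
STREET_RE = re.compile(r"^(.+?)\s+(\d+\s?[a-zA-Z]?)$")


def parse_address(lines):
    """Classify address lines by their content rather than their position."""
    result = {"street": None, "postal_code": None, "city": None}
    for line in lines:
        line = line.strip()
        postal = POSTAL_RE.match(line)
        if postal:
            result["postal_code"], result["city"] = postal.groups()
            continue
        if STREET_RE.match(line):
            result["street"] = line
    return result
```

Both orderings then produce the same result, although real receipts will have plenty of lines (phone numbers, tax IDs) that these patterns would misread.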


Cool. We did a similar thing at a hack, integrated with Dropbox and automatic monthly receipt generation. I think most if not all of the code should still be in pieces on our GitHub accounts.


Argh. I should not comment from the phone. We had similar problems using the layout. In the end we mainly focused on getting the date and the total correct by checking parsable pieces of text.
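A rough idea of what "getting the date and the total" from parsable text might look like, ignoring layout entirely. This is not the code from that hack, just a sketch under assumptions: dates are written `dd.mm.yyyy`, the total sits on a line containing the word "total" or "sum", and amounts have exactly two decimal places. The function name and both patterns are made up for illustration.

```python
import re
from datetime import datetime

# Assumed formats: dates like 12.03.2014, amounts like 3,48 or 3.48.
DATE_RE = re.compile(r"\b(\d{2})\.(\d{2})\.(\d{4})\b")
AMOUNT_RE = re.compile(r"(\d+)[.,](\d{2})\b")
# Hypothetical keywords marking the total line; real receipts vary.
TOTAL_KEYWORDS = ("total", "sum")


def extract_date_and_total(ocr_text):
    """Return (date, total_in_cents); either may be None if not found."""
    date = None
    match = DATE_RE.search(ocr_text)
    if match:
        day, month, year = map(int, match.groups())
        try:
            date = datetime(year, month, day).date()
        except ValueError:
            date = None  # OCR produced something like 45.13.2014

    total = None
    for line in ocr_text.splitlines():
        if any(word in line.lower() for word in TOTAL_KEYWORDS):
            amounts = AMOUNT_RE.findall(line)
            if amounts:
                units, cents = amounts[-1]  # last amount on the line
                total = int(units) * 100 + int(cents)
    return date, total
```

Keeping the total as integer cents avoids floating-point rounding when the values are summed up later.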

