avibhu's comments

Ironic name.


National bird of Ukraine


Have you tried few-shot prompting? Something along the lines of:

User: Extract x from the given scanned document. <sample_img_1>

Assistant: <sample_img_1_output>

User: Extract x from the given scanned document. <sample_img_2>

Assistant: <sample_img_2_output>

User: Extract x from the given scanned document. <query_image>

In my experience, this seems to make the model significantly more consistent.
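As a rough illustration, the alternating user/assistant turns above can be assembled programmatically. This is a minimal sketch that assumes the common role/content chat-message layout and represents images as placeholder strings; it does not call any particular vendor's API, and a real multimodal request would attach the image data in that API's own format. `build_few_shot_messages` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def build_few_shot_messages(
    instruction: str,
    examples: List[Tuple[str, str]],
    query_image: str,
) -> List[Dict[str, str]]:
    """Build alternating user/assistant turns followed by the real query.

    `examples` holds (image_placeholder, expected_output) pairs; image
    placeholders are plain strings here, not real image payloads.
    """
    messages: List[Dict[str, str]] = []
    for image_ref, expected_output in examples:
        # Each worked example: the same instruction, then the ideal answer.
        messages.append({"role": "user", "content": f"{instruction} {image_ref}"})
        messages.append({"role": "assistant", "content": expected_output})
    # The actual document comes last, phrased exactly like the examples.
    messages.append({"role": "user", "content": f"{instruction} {query_image}"})
    return messages
```

Keeping the instruction text identical across every turn is the point: the model sees the same request repeatedly paired with the desired output shape.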


For highly consistent responses, manually transcribing the most challenging page of the document (or engaging in multiple rounds of dialogue with Claude) and incorporating it as a few-shot example can dramatically improve overall consistency.


For what it's worth, very high-quality OCR from Google's Vision offering costs $0.0015 per page, with 1,000 free pages per month. In my experience, it has been significantly better than any open-source solution.
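To put those numbers in perspective, here is a small sketch of the monthly bill. It assumes the figures quoted above ($0.0015 per page after 1,000 free pages) and purely linear per-page billing; actual pricing tiers may differ and may have changed.

```python
# Figures as quoted in the comment above; check current pricing before relying on them.
FREE_PAGES_PER_MONTH = 1000
PRICE_PER_PAGE_USD = 0.0015


def monthly_ocr_cost(pages: int) -> float:
    """Estimated monthly cost in USD, assuming linear billing past the free tier."""
    billable_pages = max(0, pages - FREE_PAGES_PER_MONTH)
    return billable_pages * PRICE_PER_PAGE_USD
```

Under these assumptions, 100,000 pages in a month comes to about $148.50.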


Why this over Document AI?


Thanks!


Can you share that string please?


Tangential: you can fine-tune something like Flan-UL2 to do quote extraction using examples generated with ChatGPT. If you have a good enough GPU, it should help cut down costs significantly.
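The data-preparation half of that idea might look roughly like this. It is a sketch under stated assumptions: each ChatGPT-labelled example is a dict with hypothetical `text` and `quotes` fields, and the output is a JSONL file of hypothetical `input`/`target` pairs suitable for a text-to-text model. The task prefix is made up, and the fine-tuning step itself is not shown.

```python
import json
from typing import Any, Dict, Iterable, List


def to_seq2seq_records(
    examples: Iterable[Dict[str, Any]],
    prefix: str = "Extract quotes: ",  # hypothetical task prefix
) -> List[Dict[str, str]]:
    """Turn teacher-labelled examples into input/target pairs."""
    records = []
    for ex in examples:
        quotes = ex["quotes"]
        records.append({
            "input": prefix + str(ex["text"]),
            # One quote per line; an empty target teaches "no quotes here".
            "target": "\n".join(quotes) if quotes else "",
        })
    return records


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialise records as JSON Lines, one object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"
```

Keeping negative examples (documents with no quotes) in the training set is a deliberate choice; without them a small model tends to hallucinate quotes.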


Nice, that sounds like it's worth exploring. Much appreciated.

Again though, it's the zero-effort part that's appealing. I'm on a very small team and getting that to close to the same standard will take time for a ham-fisted clod like myself. Worth giving a shot all the same though, thanks again.


The zero shot ability is convenient. But for tasks that you need to get done millions of times, I’d much rather spend $10 on GPU compute and maybe a day of training data generation to train a T5 which I then “own”.

Also, running your own specialized model locally can be much faster than using someone’s API.
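The trade-off is a simple break-even calculation: a one-off training cost pays for itself once the per-call savings add up to it. The sketch below is illustrative only; apart from the $10 mentioned above, every price passed to it is a hypothetical input, and the cost of generating training data would need to be folded into `fixed_cost`.

```python
import math


def break_even_calls(fixed_cost: float, api_cost_per_call: float, local_cost_per_call: float) -> int:
    """Smallest number of calls at which owning the model is no more expensive than the API."""
    saving_per_call = api_cost_per_call - local_cost_per_call
    if saving_per_call <= 0:
        raise ValueError("local inference must be cheaper per call than the API")
    return math.ceil(fixed_cost / saving_per_call)
```

With $10 of training compute and made-up prices of $0.002 per API call versus $0.0001 per local call, the owned model pays off after roughly 5,300 calls, which is trivial next to millions of calls.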


Sure, purely a time issue for me. I'm not the most skilled in this area, and I've got a load of core stuff I need to keep on top of.

I think we're not far off having something equivalent that can be pulled from Huggingface and run on a near-consumer-grade GPU.

For now, I'll hang tight and see how things progress. Don't disagree.


Maybe one day you’ll be able to tell ChatGPT what kind of model you need and it’ll automatically select the right architecture, gather the training data, and commission the training using the cheapest and/or fastest provider. :)


It's interesting what you can do with ChatGPT using few-shot learning. It generalizes at the drop of a hat, often correctly.


Doesn't their ToS say you aren't allowed to use outputs to train downstream models? Which is a little ridiculous, considering it's a ToS.

But yeah, the low cost and lack of training are making me take a long, hard look at how I'm implementing more traditional NLP solutions.


> Don't they have in the ToS you aren't allowed to use outputs for training downstream?

You mean this? "Data submitted through the API is no longer used for service improvements (including model training) unless the organization opts in" https://openai.com/blog/introducing-chatgpt-and-whisper-apis


I was referring to "(iii) use output from the Services to develop models that compete with OpenAI; (iv) except as permitted through the API, use any automated or programmatic method to extract data or output from the Services, including scraping, web harvesting, or web data extraction;" ~ https://openai.com/policies/terms-of-use

I think I missed the exception for the API. I'm not sure where those exceptions are, but it seems to be fine based on Alpaca. Also interesting that they are so hard on web scraping and extraction, lol. But wow, that is a poorly worded paragraph.


I do this. It works.


Can you elaborate? Did some brief Google searching but had issues putting it together. We have thousands of documents and data stores we'd like to parse using GPT-3.5 (or the new ChatGPT API) and have been thinking of pretraining to cut things down. Thank you!


contact me at the email in my profile


Not sure why there are no direct replies to this, but 108 is a dedicated line for all emergencies. Interestingly, 112 and 911 have also worked for me in the past when butt-dialed.


Cell networks have a special "call emergency" code, which doesn't actually call a number at all; phones are required by the (ETSI) standard to translate dialling 112 into the call emergency code, and also to translate the national emergency number in the country they are in. Mostly, they solve this by treating 911 and 999 as the emergency number everywhere.
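The handset-side translation described above can be modelled as a simple lookup. This is a deliberately simplified sketch based only on this comment: it treats 112, 911 and 999 as emergency numbers everywhere, plus whatever national numbers apply where the phone is, and it is not the full list the ETSI/3GPP specifications mandate. The return labels are hypothetical stand-ins for the real signalling.

```python
from typing import AbstractSet

# Assumption: the "treat these everywhere" set described in the comment above.
ALWAYS_EMERGENCY = frozenset({"112", "911", "999"})


def classify_dial(dialled: str, national_numbers: AbstractSet[str]) -> str:
    """Decide whether the phone requests an emergency call or a normal one."""
    number = dialled.replace(" ", "")
    if number in ALWAYS_EMERGENCY or number in national_numbers:
        # Hypothetical label: the handset asks the network for an emergency
        # call instead of routing a normal call to `number`.
        return "EMERGENCY_CALL"
    return f"NORMAL_CALL:{number}"
```

For example, with India's 108 from the comment above as the national number, dialling 108, 112 or 911 all become emergency requests, while any other number is set up as an ordinary call.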

This is so the cell network can prioritise emergency calls and also will allow them to be made even when other calls are blocked (e.g. if you have no credit, or you only have signal on another network, or even if your phone is blocked by IMEI because it was reported stolen).

Landlines are different: they do treat emergency calls as a call to the emergency number, and whether they will reroute another country's emergency number to the one where you are depends on the specific exchange you are connected to (e.g. in the UK, 999 will work on all landlines; 112 on almost all except the oldest analogue exchanges; and 911 on most, but not on some older exchanges).


Do you support custom domains?


I had the same problem with a few old videos in my favourites. Do a Google search for the alphanumeric ID after "watch?v=" in the video's URL. In most cases, you will find some information about the video from pages where it might have been embedded.
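If there are many such videos, the ID extraction and search link can be scripted with the standard library. A minimal sketch, assuming standard `watch?v=` URLs (short `youtu.be` links are not handled); wrapping the ID in quotes to ask for an exact-match search is a choice made here, not part of the tip above.

```python
from urllib.parse import parse_qs, quote_plus, urlparse


def video_id_from_url(url: str) -> str:
    """Return the value of the `v` query parameter of a watch URL."""
    ids = parse_qs(urlparse(url).query).get("v")
    if not ids:
        raise ValueError(f"no 'v' parameter in {url!r}")
    return ids[0]


def google_search_url(video_id: str) -> str:
    """Build a Google search URL for the quoted video ID."""
    return "https://www.google.com/search?q=" + quote_plus(f'"{video_id}"')
```

Usage: `google_search_url(video_id_from_url(url))` for each URL in the favourites list.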


In an image with sufficient contrast between the foreground and the background, thresholding and using the fast radial symmetry transform [1] should do the trick. I have some really old code that I wrote a few years back that does something similar. I was able to use the same algorithm for counting objects in images captured from a Neubauer chamber [2] and saved countless man-hours at my university.

Disclaimer: the project is really old, and from a time when I barely knew how to code. Lots of bad coding practices, etc.

Github: https://github.com/vibhuagrawal14/segmentation-of-overlappin...

[1] https://link.springer.com/content/pdf/10.1007%2F3-540-47969-...

[2] https://www.researchgate.net/figure/Images-of-Canis-familiar...
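The thresholding half of that pipeline is easy to sketch. The code below is not the fast radial symmetry transform and not the linked project's code; it is a simpler swapped-in technique (global threshold plus 4-connected component counting) that works on a grayscale image given as a list of rows and assumes bright objects on a dark background. Touching or overlapping objects are counted as one, which is exactly the case the radial symmetry step is meant to separate.

```python
from collections import deque
from typing import List


def count_blobs(image: List[List[int]], threshold: int, min_area: int = 1) -> int:
    """Count 4-connected regions with pixel value >= threshold and area >= min_area."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    mask = [[px >= threshold for px in row] for row in image]
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill to measure this region's area.
            area = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                area += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Ignore specks smaller than min_area (noise, debris).
            if area >= min_area:
                count += 1
    return count
```

For dark cells on a bright background, invert the comparison or the image first; `min_area` is a tunable guess, not a value from the original project.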


Would a solution like that work for a video feed where you need to make sure you're not double-counting as objects move along?


> NOTE: the discoverer states "this vulnerability has no real-world implications."

Not sure if declaring this is standard practice, but I had a good laugh.

