
This is a clever and simple way to dredge up some great posts.

I wonder if you could ensure fewer false negatives (i.e. find even more great stuff) by doing the opposite: attempting to filter out every post whose link is to a page that came into existence within a month of the post's submission date.

This would likely require scraping the source links (unless you can get that from the https://cloud.google.com/bigquery/public-data/hacker-news dataset or somesuch), but it might be worth it anyway. It'd literally be "Hacker News, minus anything that looks like News."



I wonder how you would determine when a page came into existence.


Someone mentioned the archive.org/wayback API. You could check whether the oldest archive.org/wayback snapshot is over a certain age.
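
A rough sketch of that check in Python, assuming the Wayback Machine's CDX endpoint at web.archive.org/cdx/search/cdx returns captures oldest first. The function names and the 30-day window are made up for illustration, and nothing here is tied to how the original post filters stories. One caveat: the first snapshot only tells you the latest date by which the page existed, so a page that was crawled late can still look "new".

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed endpoint: the Wayback Machine CDX server.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def parse_wayback_timestamp(ts: str) -> datetime:
    """Turn a 14-digit Wayback timestamp (YYYYMMDDhhmmss) into a UTC datetime."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def oldest_snapshot_time(url: str, timeout: float = 10.0) -> Optional[datetime]:
    """Ask the CDX server for the first capture of `url`; None if never archived."""
    query = urllib.parse.urlencode(
        {"url": url, "output": "json", "limit": 1, "fl": "timestamp"}
    )
    with urllib.request.urlopen(f"{CDX_ENDPOINT}?{query}", timeout=timeout) as resp:
        rows = json.load(resp)
    # With output=json the first row is a header naming the fields.
    if len(rows) < 2:
        return None
    return parse_wayback_timestamp(rows[1][0])


def looks_like_news(
    first_seen: Optional[datetime],
    submitted: datetime,
    window: timedelta = timedelta(days=30),  # hypothetical "within a month" cutoff
) -> bool:
    """True if the page was first archived less than `window` before submission.

    Pages with no snapshot at all are treated as news, since we cannot
    show that they predate the submission.
    """
    if first_seen is None:
        return True
    return first_seen > submitted - window
```

To use it, you would call `looks_like_news(oldest_snapshot_time(link), submission_time)` for each story and keep the ones where it returns False. The lookup is one HTTP request per link, so for a large dataset you would want to cache results and throttle requests.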


Google certainly has some approximation of that, for example



