This is not the core problem of search. The problem with search is that nowadays Google rarely finds what I'm looking for. When I search for the same terms I used to search for, Google always shows me the 'likely' results. For example, a few years ago a search for 'window handle' would almost certainly have shown a bunch of programming results. Nowadays the results are very different, and a lot less relevant to what I'm looking for.
This is where search has to be fixed. When I type window handle, I want a HWND. When a builder types 'window handle', his first result should be a local store where he can buy window handles or descriptions of window handles.
That's where search has to be fixed, not in crawling.
Not only that, but Google tries to be too smart. Years ago, when I searched for "x y z", Google would only return results for "x y z". Now it returns results with things resembling x, y, and z, most of which are NOT what I was searching for.
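To make the complaint concrete, here is a rough sketch of the difference between a strict phrase match and a loose "any of these words" match. It says nothing about how Google actually ranks anything; the function names and the two sample documents are made up for illustration.

```python
def tokens(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def matches_phrase(query, document):
    """True only if the query words appear consecutively and in order."""
    q, d = tokens(query), tokens(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

def matches_any_term(query, document):
    """True if the document shares at least one word with the query."""
    return bool(set(tokens(query)) & set(tokens(document)))

# Hypothetical documents, one per meaning of the phrase.
docs = [
    "CreateWindow returns a window handle of type HWND",
    "brass handle for a sash window",
]

strict = [d for d in docs if matches_phrase("window handle", d)]
loose = [d for d in docs if matches_any_term("window handle", d)]
```

With these two documents, `strict` keeps only the programming one, while `loose` lets the hardware one back in. That second list is the fuzz I mean.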
I used to be able to craft a search query that would be able to cut out the fuzz from the search and give me the specific results I wanted, but now Google is working very hard to bring that same fuzz back in.
Sorry, until computers can read my mind, trying to make your program guess at what I meant to say means you've already lost.
I think that Guy L. Steele said it best, although he was talking about DWIM (Do What I Mean) in Lisp implementations:
So to this DWIM
Let's say farewell;
The crocks therein
Prove it can't win
And ring its knell:
(Google 'A Time for DWIM' to find the rest of it, although it's not quite as relevant to my point as this stanza. And look at the rest of GLS' poetry/songs too -- very amusing)
Pinging/Pushing is a feature and if it can generate better results Google is best positioned to implement it by combining that data with their existing index and infrastructure.
How is Gnip (as described here anyway) different from the various ping services that appeared in the 2001-2004 timeframe, all of which failed to displace Google?
But Google is already a hybrid model. The article even mentions AdSense, but completely ignores the Toolbar, Analytics, Ajax APIs, etc. etc. etc.
Push or pull, Google is absorbing URLs to crawl and index at a rate which no startup could match. Eventually, someone may dethrone Google, but it certainly won't be a tiny startup.
Also, there is a significant portion of the web that could never be educated enough to ping some server. This approach is doomed to failure without a crawler. That's not to say that there might not be interesting applications of their technology in news analysis or aggregation.
I really think the idea of pushing/pinging search engines to come get content should be linked to the sitemaps protocol if anything. To me that seems the most logical solution.
Basically, your main sitemap file would consist of a sitemapindex which then links to at least 2 more separate files. The first being your recently updated list that gets flushed when a search engine hits your site, and the rest containing a full index of the content on your site.
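As a rough sketch of that layout, here is how the index file could be generated. The `sitemapindex`, `sitemap`, `loc` and `lastmod` elements and the namespace come from the sitemaps protocol; the two file names, the dates and the `build_sitemap_index` helper are placeholders I made up, and the "flush the recent list after a crawl" part is not shown.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap_index(base_url, children):
    """Return a sitemapindex document as a string.

    children is a list of (filename, lastmod) pairs, where lastmod is a
    W3C date string such as "2008-07-01".
    """
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for filename, lastmod in children:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url}/{filename}"
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(root, encoding="unicode")

# Hypothetical file names: one short "recently updated" list, one full index.
index_xml = build_sitemap_index(
    "https://example.com",
    [("sitemap-recent.xml", "2008-07-01"), ("sitemap-full.xml", "2008-06-01")],
)
print(index_xml)
```

A crawler that only wants what changed would fetch the first child file and skip the full one unless its `lastmod` has moved.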
Pull or push, a good search engine still needs a local copy of the web, and this is expensive. With a spider, search engines control in detail how they get content. With push-via-notification, websites get much of this power. Is this something that Google wants to give up?
Finally, if you're going to turn off your spider, /everyone/ better be pushing to you. So this seems more likely to be next-next-next generation.
I don't quite understand Gnip, but Brewster Kahle's WAIS has/had each machine offering its own results from its own personal database.
I believe Gene Kan (RIP) also had this in mind with his Gnutella work.
I also wanted to build a system based around Gnutella, searching text documents across a whole P2P network, but nobody in the Gnutella community wanted to adopt my ideas.
Interesting article that first states a problem (bad search ranking on heavily SEO'ed terms) and then proposes a "solution" (push data to Google rather than Google pulling data from the site) that doesn't have anything to do with the problem.
Push-to-Google might be useful for providing more up-to-date search results, though.
Contrastingly, if you subscribe to a blog, you get pushed a notification whenever that blog is updated
Err. No. You have an RSS client which regularly pulls. It's exactly the same thing. Either the guy has no idea what he is talking about, or he is really bad at making examples.
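To spell out why a feed subscription is still pull, here is a toy model with no network involved. `Feed` and `PollingClient` are invented stand-ins, and the version counter only mimics what a conditional GET does for a real reader.

```python
class Feed:
    """Stand-in for a blog's RSS endpoint (hypothetical, nothing is fetched)."""

    def __init__(self):
        self.version = 0   # bumped on every new post, like an ETag changing
        self.items = []

    def publish(self, title):
        self.items.append(title)
        self.version += 1

    def fetch(self, known_version):
        """Mimic a conditional GET: return no items if the client is current."""
        if known_version == self.version:
            return self.version, []   # comparable to HTTP 304 Not Modified
        return self.version, self.items[known_version:]


class PollingClient:
    """An RSS reader: it only learns about a post when it asks."""

    def __init__(self, feed):
        self.feed = feed
        self.seen_version = 0
        self.requests_made = 0

    def poll(self):
        self.requests_made += 1
        self.seen_version, new_items = self.feed.fetch(self.seen_version)
        return new_items


feed = Feed()
client = PollingClient(feed)

first = client.poll()    # nothing published yet
feed.publish("Hello")    # the blog updates; the client is NOT told
second = client.poll()   # the client finds out only because it asked again
third = client.poll()    # asking again costs a request and returns nothing
```

Note that `publish` never touches the client. Every bit of freshness comes from the reader deciding to call `poll` again, which is what a spider does too.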
In a Gnip world, every website would have a feed – whenever content changes – the index gets pinged.
Right. So it was probably just a bad example, even though I have my doubts. Still, doesn't Google more or less have this feature through its webmaster tools?