
That works when you know what it is you're looking for.

Google still wins when you don't know the word, term, or name for something and are describing it instead. That's mostly because every search on Google helps make Google smarter. Effectively, the results you get are largely based on what people searched for historically and what they clicked (or even their subsequent searches and clicks).

e.g. "movie about kidnapped daughter" (I'm looking for the 2008 movie Taken, but have "forgotten" the name).

Google: 3rd result. Bing: 8th result. DDG: 7th result.

Why is Google 3rd while DDG and Bing are 7th and 8th respectively? They have likely all indexed the Wikipedia and IMDb pages for the movie Taken. However, Google likely ranks those links closer to the top because historically that's what users wanted when they searched for that phrase.



Serious question though, why can't DDG do the same thing without compromising privacy? It would seem like it'd be possible to keep a log of things like "A user searched for 'movie about kidnapped daughter' and clicked on 'Taken'. Maybe weight that one higher next time" -- without keeping user details at all.
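To make the idea concrete, here is a minimal sketch of such a log in Python. It is not how DDG or Google actually rank anything; `AnonymousClickLog`, `record_click`, `rerank`, and the scoring formula are all made up for illustration. The only thing stored is a (query, clicked URL) count, with no user or session identifier.

```python
from collections import defaultdict


class AnonymousClickLog:
    """Hypothetical log: counts clicks per (query, url), nothing about who clicked."""

    def __init__(self):
        # normalized query -> url -> number of clicks
        self._clicks = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def _normalize(query):
        # Lowercase and collapse whitespace so trivially different queries match.
        return " ".join(query.lower().split())

    def record_click(self, query, url):
        self._clicks[self._normalize(query)][url] += 1

    def rerank(self, query, results, weight=1.0):
        """Reorder `results` (best first) using past clicks for this query.

        The score is an assumed toy formula: 1 / (original rank) plus
        `weight` times the share of past clicks that went to the URL.
        """
        counts = self._clicks.get(self._normalize(query), {})
        total = sum(counts.values())

        def score(item):
            position, url = item
            base = 1.0 / (position + 1)
            boost = weight * counts.get(url, 0) / total if total else 0.0
            return base + boost

        ranked = sorted(enumerate(results), key=score, reverse=True)
        return [url for _, url in ranked]


log = AnonymousClickLog()
for _ in range(3):
    log.record_click("movie about kidnapped daughter", "taken")

original = ["some-list-page", "other-movie", "taken"]
reranked = log.rerank("Movie about  kidnapped daughter", original)
```

With three recorded clicks on "taken", it moves from third place to first for that query, while queries with no click history keep their original order.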


As an example: One really useful data source that I'm sure Google uses heavily is query reformulations. If a user does a search for Q, and then later does different queries Q' and Q'', and finally clicks on a search result, that's evidence that the result was actually relevant to the original query Q -- the search engine just wasn't smart enough to return it the first time around.

By itself, one data point like that is extremely weak evidence; the later queries might actually have been completely unrelated. But that's a source of error that tends to disappear when averaged across a large number of different users. In the aggregate, the data can be extremely valuable. But doing that kind of analysis requires correlating multiple searches from the same user, and storing the resulting profile for a long enough period of time to do useful aggregation.
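A rough sketch of that aggregation, assuming each session is just an ordered list of ("query", text) and ("click", url) events. The function name `reformulation_credit`, the event format, and the geometric `decay` weighting are my own assumptions, not anything Google has described. Note that the input is already grouped per user session, which is exactly the correlation step that raises the privacy problem.

```python
from collections import defaultdict


def reformulation_credit(sessions, decay=0.5):
    """Credit each clicked result to the queries that preceded it.

    The query right before the click gets weight 1, the one before that
    gets `decay`, then `decay ** 2`, and so on. Summing over many
    sessions is what is supposed to wash out unrelated query chains.
    """
    credit = defaultdict(lambda: defaultdict(float))
    for events in sessions:
        pending = []  # queries issued since the last click in this session
        for kind, value in events:
            if kind == "query":
                pending.append(value)
            elif kind == "click":
                for steps_back, query in enumerate(reversed(pending)):
                    credit[query][value] += decay ** steps_back
                pending = []
    return credit


sessions = [
    [("query", "Q"), ("query", "Q'"), ("query", "Q''"), ("click", "result")],
    [("query", "Q"), ("click", "result")],
]
credit = reformulation_credit(sessions)
```

Here the first session gives the original query Q only a weak 0.25 credit for the result, and the second session adds a full 1.0, so the evidence only becomes meaningful once many sessions are combined.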


Can't DDG use session-only cookies to link Q, Q' and Q''? Or would that be considered a breach of their privacy rules?


You can't count on a browser session to be particularly short-lived. Even if you don't tie query logs directly to a username, the mere act of correlating different queries from a user inherently compromises privacy, as demonstrated by the AOL leak.


I think this kind of issue comes down to DDG's AI. Google's is far superior in this respect. Heck, Google has led me to things I only faintly remembered, things I know without a shadow of a doubt I'd otherwise still be struggling to recall.

This is a search problem. I honestly don't see why DDG can't develop an AI that understands cultural and language contexts. Currently it doesn't, but innovating here will take a lot of brain power, something DDG is lacking when you compare the sheer number of people focused on search.



