To me, the main measure of search engine quality is whether it can find pages I know exist, because I've read them recently (they're still in my history), given the terms that make sense to me for that page.
By that metric there are no good search engines at the moment, and the older the pages, the worse this effect gets. It's really nice to see Google do lots of 'moonshots' and interesting tech demos, but I'd be far happier if they fixed search and kept their focus on that.
If a page doesn't show up in either Google or Bing for sensible queries, then that page effectively ceases to exist. These companies have a perverse incentive to keep you away from the page with the relevant results as long as you spend more time on pages carrying their advertising, and that ensures more and more content will end up missing in action.
Yup. Google has been pretty awful for the past, oh, 5-10 years or so. They used to be a high quality search engine, a way to index the depth of material on the internet. Now they are just a semantic front-end for the most popular content on the internet. Do you want to find something from wikipedia, youtube, medium, the new york times, amazon, etc? Google does great. Do you want to search for something that thousands of other people also search for routinely? Google does great. This role is also the easiest to monetize for google (through promoted links). But if you want to search for something highly technical or very specific, google is now terrible, in fact it's worse than it used to be 10 years ago.
I attribute this to the mass of mobile/social users who changed the search market. Most people in these groups search for naturally popular things like Saylor Twift [legs] or whatever is in the trends right now. I wish we had unpopular search engines that suck at pop. I also miss directories; while never complete, they gave a good overview of technology sections and many other areas that you can't just google, because you don't know they exist. Before the internet my family had a big encyclopedia collection that I, as a kid, occasionally opened, skimmed, and read about something new. That isn't possible with Wikipedia and the internet, which are now overwhelming and have no good place to start anymore. Our average attention volume is so narrow (relative to the amount of information) that it became a product. I also miss the days when you had to investigate a topic, make yourself fluent in it, and enter 'the club' of highly interested people. Now anyone can google shallow pop-info on anything and pretend to be educated in it within minutes. That has degraded many good groups as a result.
Experience, and being able to see directly the different behavior of the search engine. I watched as it happened. Google used to be optimized for producing the minimum number of results for your query, biasing towards specificity. This worked great for tech savvy folks who knew how to craft searches with a high degree of specificity. Then they switched to biasing towards a higher number of search results, biasing towards correcting your search to match some other popular search. It's so bad now that google will just completely drop words from your search as terms in order to present the results it "thinks" you want, and then you have to go out of your way to force it to actually care about those search terms (and this isn't because the more specific / restrictive search has no results, it just has results that google doesn't "like" as much).
You can craft a google search with greater specificity but it's very difficult to obtain the sort of search behavior that google used to have. Now google treats your search terms as sort of a grab bag, it mutates them into a cloud of synonyms and related words, then it picks a subset of the grab bag that it decides is valuable and gives you results that are tuned by about a zillion arcane heuristics. This works great for giving you "magically" accurate answers for the most common search queries. It works terribly for giving you highly specific answers to highly specific queries. The way google used to work was by providing results that matched all of your search terms, and being smart enough to include different variations of each word but not vaguely related words. That sometimes made it hard to find the right thing if you didn't get the right words but now we're in a state where you can't find the right thing even if you do have all the right words.
Funny, Google has some perverse incentives here: it might be nice to have a good "history search" built into a browser, but as a search engine provider they won't build it into Chrome.
Now that we live in the future I guess you need never "clear your cache" except maybe for privacy reasons. You could keep full page text for just about any site you visit (so long as the authors don't consider their site a "web app" I guess.)
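For concreteness, here's a minimal sketch of what keeping full page text could look like, assuming you can pull the text of each visited page out of the browser somehow and that your SQLite build has FTS5 (most modern ones do). All the names here are made up for illustration:

```python
# A minimal sketch of a local "history search": store the text of every page
# you visit in a SQLite full-text index and query it later. Assumes the
# sqlite3 build has FTS5 compiled in; the schema and function names are
# purely illustrative.
import sqlite3

db = sqlite3.connect("history.db")
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS pages USING fts5(url, title, body)")

def remember(url: str, title: str, body: str) -> None:
    """Save the full text of a visited page."""
    db.execute("INSERT INTO pages VALUES (?, ?, ?)", (url, title, body))
    db.commit()

def search_history(query: str, limit: int = 10):
    """Return the best-matching pages you've already read, ranked by FTS5."""
    return db.execute(
        "SELECT url, title FROM pages WHERE pages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

A browser extension or logging proxy would call remember() on every page load, and search_history() becomes the personal "I know I read this somewhere" engine described above.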
One component I would include is weighing a site down proportionally to the amount of ads it loads, since in my experience that anticorrelates with the quality and trustworthiness of the site.
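Purely as a sketch of the kind of weighting I mean; the penalty factor and functional form are arbitrary assumptions, not anything a real engine is known to use:

```python
# Hypothetical sketch: demote a page's relevance score in proportion to how
# many ad resources it loads. The 0.1 penalty weight is an arbitrary choice.
def adjusted_score(relevance: float, ad_requests: int, penalty: float = 0.1) -> float:
    """Combine a base relevance score with a per-ad-request penalty."""
    return relevance / (1.0 + penalty * ad_requests)

# A clean page keeps most of its score; an ad-heavy one drops sharply.
print(adjusted_score(0.9, ad_requests=2))   # 0.75
print(adjusted_score(0.9, ad_requests=30))  # ~0.22
```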
Yahoo hasn't had its own search engine for years. In 2010, they became essentially a frontend for Bing. In a later 2015 deal they switched the backend to using Google.
Duckduckgo is a metasearch engine, technically, but mostly it delegates to Bing.
As far as I can tell, there are only two and a half real search engines that still exist: Bing, Google, and Wolfram Alpha. (I count Alpha as a half because it's not really what most people are looking for.) I'm curious if anyone else knows of other real search engines still in existence.
Bing would be unable to associate series of queries with users.
As long as DDG are doing it properly (and I believe they are), Bing would only learn that the contents of each individual query are associated together, they would learn nothing about which other queries were performed by the same user.
I think the concern isn't necessarily that Bing would associate query X with person Y. The concern is that Bing would even know that query X exists. For example, if Bing saw a spike in searches for "Aramco IPO July 4, 2018" and were to reveal it to a human or store it, that might be a serious leak of non-public information. Many searches reveal private information, even when they aren't associated with a user.
> if Bing saw a spike in searches for "Aramco IPO July 4, 2018" and were to reveal it to a human or store it, that might be a serious leak of non-public information
Maybe I'm missing something obvious here, but how is that any different from Google or DuckDuckGo seeing the same spike?
Well, you might trust DDG as a good actor but not a third party. Learning that this information is exposed to a third party (even if unattributable) would breach their trust in DDG. Whether that's reasonable, or whether DDG are misleading people in that regard, is another matter. Personally I still use them a lot, and will continue.
I just think there is a point to be made here. Even in general it's often opaque which third parties hold which data, and I don't really think GDPR has fixed that. It's surprising to people that Bing might have the contents of their DDG search history, somewhere in the huge dataset of DDG searches that pass through.
Also they might not want to help improve Bing search but I'm guessing they do inadvertently?
Intel SGX is the only answer at the moment. The Signal messenger uses it, so address book matching is private. It requires the user to trust the server hardware vendor (Intel) instead of also the cloud provider.
That would not stop the Bing query matcher (or indeed the Signal address book matcher) from being able to look at the contents of its own secure enclave.
The trick is that every user uploads his own matcher. The server only sees encrypted matchers, feeds them data and returns the encrypted results. You as a user decrypt your results and nobody (except Intel) was able to see them.
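To make the data flow concrete, here's a rough simulation in plain Python; it is only an analogy. Real SGX involves remote attestation and hardware-enforced memory encryption, the key exchange here is hand-waved, and this is not Signal's actual protocol (the third-party cryptography package's Fernet is just a stand-in cipher):

```python
# Conceptual sketch only: the host process handles nothing but ciphertext,
# and matching happens "inside" a boundary the host cannot inspect. In real
# SGX that boundary is enforced by hardware; here it is just a comment.
from cryptography.fernet import Fernet

enclave_key = Fernet.generate_key()   # in reality, negotiated with the attested enclave
enclave = Fernet(enclave_key)

def untrusted_server(ciphertext: bytes, directory: set) -> bytes:
    """The untrusted host only sees encrypted blobs in and out."""
    # --- inside the enclave boundary ---
    contacts = enclave.decrypt(ciphertext).decode().split(",")
    matches = [c for c in contacts if c in directory]
    return enclave.encrypt(",".join(matches).encode())
    # --- end enclave boundary ---

# Client side: encrypt the address book, send it up, decrypt the reply.
query = enclave.encrypt(b"alice,bob,carol")
reply = untrusted_server(query, directory={"bob", "dave"})
print(enclave.decrypt(reply).decode())  # bob
```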
Did Yahoo ever have its own search engine, technically? In the early web it was a directory maintained by humans, which made sense at a time when the total number of pages in existence on any given subject was no more than a few hundred; I thought that when that era passed they went straight into licensing other search engines' results.
I worked at Google on search indexing at the time Yahoo switched from their own search engine to using Bing. At the time, by most of Google's own search metrics, Yahoo had a product superior to Bing. If Bing had been spun off as a separate company, or otherwise hadn't had access to Microsoft's deep pockets and default IE search status, it's likely Yahoo would have fared better.
I was at Yahoo during that time, although not in web search. From what I could tell, company leadership was frustrated with lack of growth in search market share, and didn't want to invest in it anymore.
Yahoo was running user studies where they would put Google results and Yahoo results side by side but switch the branding; while Yahoo's results were rated better than Google's for most of the tested queries, results with Google branding were rated better than those with Yahoo branding, regardless of whose results they actually were.
The plan was to just use Google, but the DOJ (or FTC?) put out guidance that that would be anti-competitive, so Bing was it. This might have worked out anyway, but the expected cost savings from outsourcing search never materialized as far as I saw. I left in late 2011 and stopped following closely after that. Web search was also linked with search ads, which Bing did poorly at too.
Google also ran similar user studies, sometimes between Google and other search engines, and sometimes between production Google and a proposed change.
One tough thing is that there isn't one search quality metric. It's one thing to have the search results page look good with its snippets, and another to have people actually look at the linked pages and compare how useful those pages are.
Common vs. uncommon searches are also important. It's not difficult to write a search engine that badly over-fits on the most common searches. However, for market share, it's important to do well enough on the common searches that users don't leave, and do well enough on tough long-tail searches that you pick up users that leave other search engines on tough queries. The idea is to be pretty good at the common searches, but the best at the kinds of searches that cause people to try other search engines. Naive frequency-weighted metrics will get this totally wrong.
It's also more important to get useful information in the first 2 or 3 links. If Google links to the second-best link at result #1 and puts the best link off the first results page, but Yahoo puts the best link down at #7 and second-best at #8, the user may lose interest before following a really good link.
I don't think Google took the union of front-page search results between two competitors and asked humans to hand-order the (up to 40) pages for how well they fit the query. But, that seems like a good way to test the actual usefulness of search results. You'd probably especially want to keep track of the percentage of the top 3 search results that were filled by top-5 (guessing at 5) useful links.
Anyway, inside Google it was well-known that Yahoo was the competitor to worry about in terms of search quality.
Yes, before they used google. It's a pretty interesting story, actually, how Yahoo felt that they should use the best underlying search engine with a "white label" approach, and how Google succeeded in eventually building a very strong brand despite being invisible.
Mostly this is just me missing the websites of the early 2000's, and trying to figure out a way to rediscover them.
And I'd probably want content on top of this. (Edit: e.g. search by topics)
Lastly, it'd be nice to restrict things to sub-genres, but I'm not sure. E.g. when I'm doing a search I'd love to reference things related to micro-controllers, and so maybe I'd put in Arduino to get into the realm. Sort of like what Google does for you without telling you (tailoring your searches by some magic context).
A man can dream...
Edit 2: Search engines these days seem to be answer engines, I want a research engine.
> Mostly this is just me missing the websites of the early 2000's, and trying to figure out a way to rediscover them.
Apparently somethingawful and ebaumsworld are still around in some form, and Slashdot of course. The thing is, these sites have largely been replaced by better versions of themselves. That’s resulted in a lot of centralization into a few sites like reddit, which is a combination aggregator and blogging platform for people who are too embarrassed to attach their real name to what they write, which is apparently a good share of the population. Then there are sites like YouTube, LiveLeak, Facebook, that just offer something that no one could or did in the 2000s. And with mobile and apps, there’s a level of engagement that doesn’t leave much room for a thousand little sites with quirky, regular, custom content.
I'm not talking about the large sites though, I'm talking about the small sites. Perhaps I should have said the feel, and not the sites, of the early 2000's.
Right now Google thinks it knows what you want and when you search for things, it returns the same few sites (mostly). You used to come across people's personal sites into which they poured their soul. And while those exist less frequently now, I bet they still exist.
I'd like to be able to search for sites that return a 406 containing a specific string so that I could find APIs that implement particular media type standards.
I don't know if this is still the case but when I worked for Lycos 10 or 11 years back they owned Hotbot so including both in the list is a bit redundant.
They also -- despite being one of the first ever search engines -- didn't do their own search in 2008. They outsourced to Yahoo. Though there was an effort at the time to become a search engine again. I don't know if anything came of it.
Edit:
It's hilarious they labeled Lycos as...
> Lycos—is still around!
Because even at the time I worked there, the number one response I got from people when I told them that was "They still exist?"
What was it like to work there? I know it was a decade ago but I am still curious what it would be like to work at these mostly forgotten companies that still manage to exist.
But the short summary: I loved it. It was a really fun company with a lot of great people, and we got to launch some really great products. Most of those people were let go during the great recession (myself included), but it was fun while it lasted.
On the flip side, every time we launched a product the news media treated it as a novelty instead of a serious thing. Which was insanely frustrating. Some of our tech was way ahead of its time.
So why aren't there more search engines these days? Google is great but we constantly talk here about how it's losing its edge for certain kinds of more specific searches like technical ones. So seems like there's room for engines that are more tuned for special use cases and the ability to index web pages has only gotten cheaper since Google started. Has the size of the web made this impractical? Or do I just not know about these options?
Utter layman's guess: While indexing has gotten easier, the web has expanded exponentially in the meantime - likely far outstripping any technical gains. I would be surprised if indexing the modern web is feasible without significant time and capital.
Also, I can think of a couple specialized engines that do exist: Google Scholar and Shodan. There are probably more I'm unfamiliar with.
Not only is it infeasible, but there's so much garbage out there that we have no simple way to filter it out from scratch to the point that anyone could actually use it. Google and Bing have feedback loops with their users that prevent crap from rising to the top of search results. A plain index (or even using something like pagerank) wouldn't have this huge benefit, and you'd never be able to get your search engine off the ground.
The problem is that Google and Bing really are duopolistic.
You'd think that a monopoly could just break instantly if all it required was typing in a different URL.
But modern search engines are reliant on machine learning on mind-bogglingly enormous troves of real human interaction data.
If you truly outsmart Google by inventing a better mousetrap, it's worth fuck-all. Your solution will probably require more usage data than you will ever be able to collect, because nobody will use your search engine while it still produces poor results.
Well, ML on 100 billion searches is probably going to more closely approximate user intent than a living brain in a tank, because that brain doesn't know what the hell "lkw attachment" means.
Looks like German-English bilingual logistics professionals are looking for truck parts vendors, while teenage Americans are looking for hidden locations of laser focusing and enhancement devices within the video game Wolfenstein: The New Order.
Well, the interface offered by blekko's Izik tablet search engine was that it would show 2 categories in the answer, one related to automotive and one related to games.
Google has done this for a while, but at the level of recognized entities, not at the level of individual queries. Is that how the engine you're referring to worked too? Eg, would the "lkw attachment" example above have partitioned results?
In Izik's tablet interface, each category was a separate row of results. So in this example, there would be 2 rows, one for automotive parts and one for games, and if you scroll horizontally in a row you get more results in that category.
I think that's what you meant by partitioned results.
Google computes this internally but I've never seen them use it for anything other than having diversity in their top 10 results.
Yea, it's explicitly separated, just not shown at the same time. For example, if you search for "kings", there will be a couple of bubbles at the top of the page with different entities: "Kings" (2017 film), "Sacramento Kings" (basketball team), etc. Clicking on one of those will show you a list of results that only pertains to that entity. This feature has been around for years, and is part of the series of "things, not strings" features they've been working on.
As I said, Google is pretty conservative about this and other entity-based features, so they definitely wouldn't do it for something like "lkw attachment". My question was whether Izik triggered this feature in such cases or not.
That Google feature is like "related queries", that kind of feature has been around for more than a decade. If you click on the "Kings (2017 Film)" link it runs a search for [Kings 2017]... which just adds 2017 as a keyword to a conventional search. No semantic search is involved.
Izik would show you film-related website results for the film category.
I think these exist, they've just become more specialized. Think of these sites you might have seen before:
- alternativeto.net and similar
- Google Scholar / Semantic Scholar
- Every sandboxed social network (Facebook, Twitter, Tumblr)
- A variety of Instagram searching sites
- Alternative App stores for Android
And there's room for plenty more sites like this.
Attempting to return a good response for anything in the search box is a bottomless problem of unclear utility. Being more focused makes the work easier and the value for the user clearer.
Google jumped into the market when it was still possible to compete. And they came to the table with a product that was faster, better, cheaper, and easier to monetize. Google-style data centers reduced costs; sharding and map-reduce plus a streamlined design improved speed; PageRank improved quality; and low-cost, fast searches meant that low-cost advertisements could still bring in a lot of RoI. From that kernel they grew to dominance, becoming synonymous with the very term "search". Now we live in a different era, one where search is integrated into everything on every platform and where replacing the default search engine is a huge uphill battle.
Let's say that someone creates a better search engine; how would people actually use it? I'll tell you how most people who bothered to use it would integrate it into their routines. Firstly, they would still use Google for everything day to day. They would still use Google Maps for directions. They would still use Gmail for mail. They would still use Google search as the semantic front-end for their browsing. Only after they performed a search using Google that produced unsatisfactory results would they then pull out the better search engine and use it for that one isolated search. And that's the problem, because that scenario is very hard for the better search engine maker to monetize, while Google would continue reaping the major monetization haul for the vast majority of that user's searches. And going from zero to being integrated into a user's experience as completely as Google is now is not a realistic prospect for most startups.
I'd love a search engine with more querying power limited to a niche. But I doubt limiting topical scope would do much to keep computational complexity from blowing up. A while back some startup was charging per search to work around that.
I wasn't really thinking a niche topic but a niche use-case. Google is bad at doing literal, verbatim string searches these days and all I want sometimes is like Google 2005ish era search, just PageRank based and no modern machine learning, etc. You can have it simpler than Google did at the time since SEO against your niche engine is unlikely, so tech wise you might be more like 1998 Google. The main bottleneck seems to be what others here are suggesting: there's just too much web to index these days unless you're Bing-scale or bigger.
> Google is bad at doing literal, verbatim string searches these days and all I want sometimes is like Google 2005ish era search, just PageRank based and no modern machine learning, etc
You can get most of the way to this with the verbatim option, but I think they make it difficult to make the default.
> By using the Verbatim tool Google will not make the following changes:
• Personalizing your search using websites you have visited before;
• Including synonyms of your search terms;
• Automatic spelling corrections;
• Searching for words with the same stem e.g. “Shopping” when searched for “shop”;
• Finding results that match similar terms to those in your query.
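As far as I know, the Verbatim tool also maps to the tbs=li:1 URL parameter, so you can bake it into a browser keyword / custom search engine and get it close to "by default". A small sketch, with that parameter being the only assumption:

```python
# Build a Google search URL with the Verbatim tool pre-enabled (tbs=li:1 has
# been the Verbatim switch for years, though Google could change it).
from urllib.parse import quote_plus

def verbatim_url(query: str) -> str:
    """Google search URL with Verbatim turned on for this query."""
    return "https://www.google.com/search?q=" + quote_plus(query) + "&tbs=li:1"

print(verbatim_url('"lkw attachment" site:example.com'))
```

Registering that URL pattern as a custom search engine in the browser is about as close to making Verbatim the default as Google currently allows.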
Well that's sort of what I'm thinking of too. I'm talking about the idea that this might be doable by limiting topical scope. So for example you'd only index some portion of tech related sites.
I think the transformation has been more subtle: App Store SEO is a thing, and voice searches are becoming more common. Document search is still Google, though.
"What is it with these nearly twenty year old sites still up?" Not sure but I believe some of the answer lies with adtech distribution needs. The search ads demand traffic, however astro-turf it be.
Search is alive and well. I'd recommend reading some of the latest textbooks and research papers on information retrieval. The industry was given new life about 5 years back with knowledge graphs and has been reborn again with recent innovations in machine learning, cloud computing, and data mining technologies.
I'm working on a project now which has indexed billions of pages and answers queries similar to a web search engine like Google:
https://www.AtSign.co/
The only difference is that it's a keyword + location based business contact information engine but operates on the same principles as a real web search engine client.
We're a small team and it would have been unthinkable even a few years back to launch something of this scale effectively ... but here we are! Amazing space to be in right now.
ugh. compare the results for tech support bridgeport, ct to google. or just look at them without comparing to google. awful! No offense but you aren't even doing the most obvious rule based filtering/ordering on cities/states in your result sets.
Hi Greg, we don't offer filtering by cities at the moment, only states/countries, so you wouldn't have been able to look up "Bridgeport" specifically, right? A lot of people punch in a city and hit "search", but what they get as a response are matches from "any country", which is the default. That's why you didn't see the basic filtering you were expecting.
Regardless, I just looked at our results for tech support in CT and I agree we need to work harder on our results, but comparing to Google, they only had tech support jobs (not even business listings)... which makes sense in their product use case.
I can see what you're saying and Google is by far, the industry benchmark, but it's also difficult to compare results sometimes ... it's like Apples and Oranges.
I don't want to get too much into the weeds, but there is a whole subset in Information Retrieval which relates to IR system evaluation, or search engine result evaluation. One simple way of doing it is simply labeling the accuracy of each result via human curator as either a 0 or a 1.
But it can get really complicated. For example, sometimes there just aren't relevant documents in the index in the first place ... so you can't really blame your ranking factors too much. The opposite can happen too, where a word occurs too frequently, in which case you might resort to other kinds of ranking factors (most notably PageRank).
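To make the 0/1 labeling idea concrete, here's a toy scoring sketch; the rater judgments in the example are invented:

```python
# Score one query's ranked results from binary (0/1) human relevance labels,
# using two standard IR evaluation metrics.
def precision_at_k(labels: list, k: int) -> float:
    """Fraction of the top-k results that a rater marked relevant."""
    top = labels[:k]
    return sum(top) / len(top) if top else 0.0

def average_precision(labels: list) -> float:
    """Mean of precision@i over the positions i that hold a relevant result."""
    hits, score = 0, 0.0
    for i, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            score += hits / i
    return score / hits if hits else 0.0

ratings = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # rater judgments for one query's top 10
print(precision_at_k(ratings, 3))   # ~0.67
print(average_precision(ratings))   # ~0.75
```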
In our case, we're focused on broadening our state/country-level coverage for keywords right now (more listings), then we're going to focus on making sure our location accuracy is a lot better (it needs work). Over time, you should get the results you're expecting more often :)
All of those are 'meta' search engines. There are three English-language indexes of any size available: Google, Bing, and Yandex. All of the other search engines go to one of those three for most if not all of their queries. Some of the bespoke engines have local indexes of things like Stack Overflow or Wikipedia (both fairly easy to index) to save on cost, but all the others use the big three (and mostly the big two, because Yandex pulled their servers out of Nevada, which added 300-800 ms of latency to their searches).
Most of these used BOSS (aka Yahoo!'s old build your own search service API) which was served off Bing as its index, although Google has started paying more and more people to send their search traffic to Google.
Bing charges $7/thousand [1] for their "quality" searches and $3/thousand for their so-so searches (not as current, and the index doesn't go as deep; this tier is roughly what BOSS offered until they turned it off in 2016).
That $7/thousand lets you give them up to 250 queries per second. For reference, that is about 1-5M uniques per day. It looks like 21M searches a day, but for English most of the searches come during the day from Europe and the US, so you're really only going to do 10-15M searches per day at that rate. If you are clever you can cache results, so for the same search you can just re-use the cache rather than paying for another result. This is nominally frowned upon but hard to defend against. If you manage to make a deal with a phone supplier to be the 'standard' search engine, a lot of queries will just be 'facebook' or 'reddit', so you don't really need to actually query those.

You will want to find some ad networks to provide you ads. Bing will do that too, but you will quickly figure out that if you could make money reselling Bing results with Bing ads, then they could do that too, so you'll find the margins pretty thin and negative at times. You'll have to pay for a machine that is taking those queries, calling out to whatever ad networks you want, and then filling out a results page (SERP) and sending it back to the consumer. If you are just fronting Bing or Yandex, that is pretty straightforward to do with an nginx server on an AWS "large" instance.
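Rough numbers for the above, as a back-of-envelope sketch (the realistic daily volume and cache hit rate are my own assumptions, not Bing's figures):

```python
# Back-of-envelope math for the quoted pricing ($7 per thousand queries,
# 250 queries/second ceiling).
PRICE_PER_QUERY = 7 / 1000          # dollars
MAX_QPS = 250

theoretical_daily = MAX_QPS * 86_400   # 21.6M searches/day at full tilt
realistic_daily = 12_000_000           # traffic is bursty; call it 10-15M/day
cache_hit_rate = 0.30                  # assumed share of repeat queries served from cache

billable = realistic_daily * (1 - cache_hit_rate)
print(f"theoretical ceiling: {theoretical_daily:,} searches/day")
print(f"daily Bing bill:     ${billable * PRICE_PER_QUERY:,.0f}")  # ~$58,800
```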
If you negotiate well and market well you can be a dogpile or a startpage with some schtick that makes you different than just going to Google or Bing. The more privacy you afford the clients the more margin you give up (because you can't sell that information as well).
Bottom line is that it's a hard way to make a living.
I wish Google would let us use both Verbatim (use all the keywords I entered, as-is, instead of assuming you know better than I do what I meant to search for) and filter by date (to get the most recent results first), because right now you have to choose between relevant but outdated results, or irrelevant but recent ones, both of which are frustrating.
Seems like that's the wrong title though: the article is showing that search engines (other than Google, Bing, Yahoo) are still a thing. Maybe "alternative search engines are still a thing"?
Author here. Back in the mid 90s, search engines were a thing, and you had many companies trying to provide search results in the emerging web. In 1996-97, it was the in thing to run a search engine.
I still don't get the title. Search engines were a thing - yes everyone knows that so that bit doesn't say anything - and they're still a thing - well ok we all already knew that as well - and if they're a still a thing why did you say they were a thing just before? It's two useless statements, and one is redundant because of the other! And what does 'a thing' really mean? That they exist? Why say 'a thing'? It must be the title with the least information possible.
I'd be happy to pay for a search engine that:
- actually really works
- also allows you to search past page 10
- has a working API with reasonable limits