TinEye and Google Goggles applied to the entire Internet? Yes please. I can't wait to play around with this.
What will be even more interesting is if they release an API for it in the future. Sites like imgur and reddit could then suggest if you're uploading or submitting a similar image to one that already exists.
I just tried TinEye (hadn't heard of it before) and it appears to only look for images that are modifications of the image being searched for.
I wonder if Google's will work the same, or if it will be possible to find other images that are similar, but not based on the same image? That's what I really want.
The "similarity" of two images is a very complicated concept. When people are asked to tag images, the overlap rarely goes above 20% on average (e.g. check out http://images.google.com/imagelabeler/ and try your hand at it). Think about it: you may think at the object level (both images have cars), concept level (both are happy images), color, etc. This is why large online stock image sellers still rely on tags extensively.
As a rough analogy, consider a textual example: Find a sentence similar to "It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife." Now, if you enter this sentence in Google, it retrieves documents that contain it, in TinEye fashion. What other similarity is desired? Should it retrieve essays on Austen, on marriage, 19th-century English literature...?
If you are interested in image similarity search, check out the Pascal challenge (http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2011/inde...). The advances in the last 5-6 years on object detection and visual feature extraction (which image similarity relies on) are amazing.
Actually the presentation showed a query for an old picture returning images taken from the same place.
It's being said that they're doing it by matching images' basic lines and shapes.
An API would be amazing. Could make a service that tells you if any of your Flickr/dA/Picasa photos are being used anywhere else on the web, etc. I'd pay good money for that service (as a photographer who has had photos stolen/used-without-permission).
I use TinEye a lot and I think this will be great. My typical uses of TinEye, which will probably improve with a bigger index:
- somebody put text over a nice image and I want the unmodified version
- searching for bigger, better quality versions of an image (e.g. wallpapers)
- finding other images from the same author/gallery (since it links to the sites that host the copies)
- finding the name of the movie, person or object pictured (because copies will be hosted with different, probably meaningful names and in pages with subtitles)
From my understanding, TinEye makes money from corporate B2B deals. Their consumer product is mostly just a technology demo/marketing piece rather than a part of their core business.
Though I do sometimes use these services, for a few years now I have blocked these spiders on my own sites and those of clients.
Often this would happen: a client or user finds an image through Google Image Search on another blog or website, with unclear copyright. The client or user then uploads it to the server. Then the client, or I, would receive a letter from a (GettyImages) lawyer asking us to please pay for the full licensing rights to that 160x160 pixel image.
Thinking about it, I find that accommodating these services leads to nothing but trouble: either legal trouble, or hit-and-run users stealing your images because you paid for the higher resolution.
I hope I can block this Google service separately from Google Image Search. Although Google Image Search isn't as good to webmasters as it used to be (especially for those that rely on advertisement clicks) and the users it sends can be negligible, ranking higher in Google Image Search seems to correlate with ranking higher in Google Web Search.
If it is part and parcel of Google Image Search, I might reconsider my robots.txt directive for Googlebot-Image. They just now opened this up for the public, but it is likely they are already using this internally to gauge (media) quality factors on-page.
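For anyone wondering what that directive looks like in practice, here is a minimal sketch that checks a robots.txt with Python's standard urllib.robotparser. The robots.txt contents, the example.com URL and the can_crawl helper are made up for illustration, and whether the new search-by-image feature crawls under the Googlebot-Image user agent is an assumption, not something Google has confirmed in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Google's image crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow:
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/images/photo.jpg"  # placeholder URL
    print("Googlebot-Image allowed:", can_crawl("Googlebot-Image", url))
    print("Googlebot allowed:", can_crawl("Googlebot", url))
```

With rules like these the image crawler is shut out while regular web crawling stays open, which is exactly the trade-off being weighed above.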
P.S.: It would be interesting to see what happens if Google doesn't partner up with GettyImages or iStockPhoto. In the early days of TinEye you could abuse that service to find the same stock images without watermarks, on the sites of people that had already paid for that image.
P.P.S.: Now that you can add RDFa or Microdata to mark up your images with a copyright statement, what would happen to sites that host copyrighted images tagged "not for reproduction"? Google should be able to find the canonical image and "punish" those that don't comply with its copyright.
Maybe what Schmidt said about Google's intentions was true, but another reason they haven't done it is that it's not possible right now, or at least not in a fully automated way across most images on the web. The state of the art on the easier "verification" problem ("are these two images of the same person?") is shown here:
http://vis-www.cs.umass.edu/lfw/results.html
The best results are under 90% accuracy, which sounds pretty good, until you realize that random chance is 50%, and for recognition ("who is this person?"), you're essentially exponentiating that 90% by the number of different people you want to recognize.
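To put rough numbers on that exponentiation, here is a back-of-the-envelope sketch. It assumes, naively, that identifying someone among N people takes N independent pairwise checks that must all come out right; real systems don't work exactly like this, and identification_accuracy is just an illustrative name.

```python
def identification_accuracy(pairwise_accuracy: float, num_people: int) -> float:
    """Chance that num_people independent pairwise checks are all correct.

    Naive model: each check is right with probability pairwise_accuracy,
    and a single wrong check ruins the identification.
    """
    return pairwise_accuracy ** num_people


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} people: {identification_accuracy(0.9, n):.6g}")
```

Even at 10 people the naive estimate drops to roughly 35%, and it is effectively zero by a few hundred, which is the point about web-scale recognition.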
This reminds me of apps like Shazam. While they definitely serve an amazing purpose (recognizing songs by finding a similar region of sound) what I would really like is an app that could recognize my humming a song--which probably sounds nothing like the actual song itself (different key, speed, entirely different voice, etc.)
The article said the technology is very similar to Google Goggles. From what I could deduce, Goggles uses some kind of local invariant descriptors (like SIFT or SURF), and TinEye uses some kind of global descriptor (maybe something like this: http://www.hackerfactor.com/blog/index.php?/archives/432-Loo... ).
If that is right, Google will be able to retrieve different photos in which the same object appears, whereas TinEye only retrieves the same image, with or without some changes. So, they're quite different beasts.
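To make the "global descriptor" idea concrete, here is a minimal sketch of one very simple global descriptor, an average hash. This is only an illustration: I'm not claiming it is what TinEye actually uses, and the tiny 8x8 test images are synthetic. Every pixel is compared to the mean brightness of the whole image, so the hash survives a uniform brightness change but breaks as soon as the layout changes, which is why this kind of descriptor finds copies of the same image rather than other photos of the same object.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values, 0-255


def average_hash(pixels: Image) -> int:
    """Global descriptor: one bit per pixel, set if the pixel is above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Synthetic 8x8 gradient standing in for an already-downscaled image.
original = [[x * 16 + y * 8 for x in range(8)] for y in range(8)]
brighter = [[p + 20 for p in row] for row in original]   # same image, edited
mirrored = [row[::-1] for row in original]               # different layout

if __name__ == "__main__":
    h = average_hash(original)
    print("brighter:", hamming_distance(h, average_hash(brighter)))
    print("mirrored:", hamming_distance(h, average_hash(mirrored)))
```

A local-descriptor approach like SIFT or SURF would instead match many small keypoints independently, so it can still fire when only part of the scene is shared.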
I compared/contrasted TinEye against Google CBIR for a bunch of images (I use TinEye a lot) and I have to say TinEye looks better than Google CBIR so far. TinEye has fewer "zero results" and more search results overall. I get the feeling TinEye deals with "Photoshopped images" better somehow.