I love how bad robots are at picking stuff up. It's amazing how far science has advanced, yet the act of picking up something it hasn't seen before is still almost impossible for a robot to do reliably. It really helps illustrate that AI is not as powerful as people think it is.
If you think you've got a really good solution for robotics grasping you shouldn't be competing in the Amazon Picking Challenge given the IP terms. We (Righthand Robotics) think we've got a handle on the problem but didn't want to get involved for that reason. There are some other companies out there with interesting solutions who weren't involved either.
People are competing in the Amazon Picking Challenge too! But the picking prize is just $50,000. If it were $1,000,000 like the Netflix prize I'm sure there would be more competition.
It's not even just picking something up; it's picking one specific item out of an arbitrarily messy or overstuffed bin without dislodging or dropping anything else. Items in a bin can be wedged anywhere, in any position, in nondescript boxes identifiable only by the ASIN or a vague natural language description. It's a hard visual logic, inference, and manual dexterity problem, made harder by unpredictability and harder still by the need to meet quotas, and every job has its own mess of them.
And on top of that, employees in Amazon warehouses are often cross-trained to save costs, which means you either need one robot that can unload, decant, move juice carts, pick, stow, pallet-stow, etc., at a moment's notice, or else multiple dedicated robots, one for each task.
It's something humans do so easily that Amazon can justify paying almost nothing for the work, but it's still way beyond what AI and robotics can achieve. The Kiva pods seem to be the current state of the art for Amazon warehouses, and all they seem to do is move bins around (and sometimes they can't even get that right).
This is an important point. Grasping itself isn't that difficult if you know the object and its orientation. It's a pre-configured action (grip here and here, move, release). Ocado, a large online-only grocery store in the UK, has a demo it shows at trade shows that uses a suction cup. They don't bother with a gripper; they just locate a big enough surface to latch onto and use that. That's considerably easier than picking things up with fingers, and you can use fairly simple algorithms to detect planar surfaces.
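For a sense of how simple "detect a planar surface" can get, here's a minimal RANSAC plane-fit sketch in Python/NumPy. This is not Ocado's actual method, just an illustration of the standard trick; every threshold and constant below is invented.

```python
import numpy as np

def largest_plane_ransac(points, n_iters=200, inlier_tol=0.005, seed=None):
    """Find the dominant planar patch in an (N, 3) point cloud via RANSAC.

    Returns (normal, d, inlier_mask) for the plane n.x + d = 0 with the
    most inliers -- a rough proxy for "a flat surface big enough to latch
    a suction cup onto". Illustrative only; real systems work on organized
    depth images with much faster region-growing segmentation.
    """
    rng = np.random.default_rng(seed)
    best_mask, best_n, best_d = None, None, None
    for _ in range(n_iters):
        # Three random points define a candidate plane.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-12:            # degenerate (collinear) sample
            continue
        n /= norm
        d = -n @ p0
        mask = np.abs(points @ n + d) < inlier_tol
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask, best_n, best_d = mask, n, d
    return best_n, best_d, best_mask

# Synthetic bin: a box top at z = 0.1 plus scattered clutter points.
gen = np.random.default_rng(0)
top = np.column_stack([gen.uniform(0, 0.2, 500),
                       gen.uniform(0, 0.3, 500),
                       np.full(500, 0.1)])
clutter = gen.uniform(0, 0.3, (50, 3))
n, d, inliers = largest_plane_ransac(np.vstack([top, clutter]), seed=1)
```

Almost all of the box-top points come back as inliers, and the recovered normal is (near-)vertical, which is all a suction strategy needs: a surface patch and its approach direction.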
The problems start when things are tightly packed. One object might be in the way of your gripper so you can't take the thing you want. What happens if you drop the object? Does your robot know how to move things aside to grab the item you want?
And on top of this you have the whole issue of detecting objects in a cluttered environment: this requires state-of-the-art semantic segmentation, robust 3D vision that works on non-cooperative targets, and a classifier that can reliably handle occlusions and huge numbers of objects.
I think it's reasonable that some items can be scanned upon entry into the warehouse, e.g. on a laser triangulation turntable that also grabs a video as it rotates. With this sort of task you want to exploit as many priors as you can find. If you look at papers (and factories), you see a lot of tricks: objects on flat surfaces that make segmentation easier, small numbers of known objects, etc.
>I think it's reasonable that some items can be scanned upon entry into the warehouse, e.g. on a laser triangulation turntable that also grabs a video as it rotates
If it could be done quickly enough with enough volume, maybe, but the constraint is that items have to be loaded off the truck, scanned and stowed into a bin (which makes them available for purchase on the site) as quickly as possible.
You could have a conveyor system that would cover five of the six sides without too much trouble, and it'd be fast enough for real-time capture. Or you could automate the most common purchases, e.g. Amazon Basics items, Kindles, electronics. They could even ask sellers to provide models or dimensions.
Probably not an issue for them, Ocado sell pretty much only food so it'd be for boxes/bags mostly. It's not been deployed yet though. I think they're also testing gripper robots for picking up larger items like bottle packs.
Also, books, unless they're shrink-wrapped, have a tendency to fall open, which I imagine would be a pain.
It's incredible how versatile our hands are. Simple pressure, two jawed chuck, three jawed chuck, four jawed chuck, roundgrip, tweezer grip, scoop and all that in a tiny, totally quiet package with force feedback across the whole surface.
Mimicking that mechanically is a major challenge; it has nothing to do with AI per se.
It starts to click when you think about how many years it takes us to form those basic skills, refine them, and then the years spent practicing specific applications. All of this still requires dedicated trainers in the form of parents, driven by love and commitment.
Not even just on an individual basis. If you think about how long it took for dexterous hands to develop on an evolutionary basis, it took tens/hundreds of millions of years or even billions of years depending on where you start from.
Also, there is an interesting theory that human hands evolved not only to grab things but also to form "efficient" fists for punching.
Even if I restrict myself to two fingers clamping around something, I can do ridiculously better than the robot. You could even give me the actual robot's hand on a grip. The control logic seems to be the much bigger problem to me.
Force feedback provides humans with constant information on how our hands are doing.
Reliable force feedback at the hand's level of detail would give the people writing the control software a wealth of data, and that better software could in turn exploit the same feedback in real time. The two problems are strongly interconnected.
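As a toy illustration of that interconnection, here's a hypothetical proportional grip controller that only converges because it gets (noisy) force readings back each cycle; without the sensed term the loop would have no way to know when to stop squeezing. Every name, unit, and constant is made up.

```python
import random

def grip_with_feedback(target_force, steps=50, gain=0.3, noise=0.05, seed=0):
    """Toy proportional controller: squeeze until the sensed force
    matches the target, using noisy force-feedback readings.
    Units and constants are arbitrary -- illustration only."""
    rng = random.Random(seed)
    command = 0.0                       # motor command (arbitrary units)
    for _ in range(steps):
        actual = command                # pretend the actuator tracks the command
        sensed = actual + rng.gauss(0, noise)      # noisy tactile reading
        command += gain * (target_force - sensed)  # correct the sensed error
    return command

force = grip_with_feedback(target_force=1.0)
```

Despite the sensor noise, the command settles near the target force; that closed loop, scaled up to a whole fingertip's worth of sensors, is the data the control software would feed on.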
I totally agree. (I only disagreed with the tiniest detail of his point and meant only to rebut that).
I would like to point out that with a pair of tongs you will still have force feedback. Even though it will be reduced in precision, it will still be better than any artificial system I am aware of.
When we are small, we barely do any better than current-gen robots.
The thing is that each of us accumulates prior data that we can use as a basis for future action, things like how much pressure is enough to grasp a raw egg without breaking it.
For what it's worth, I, for example, only learned to tie my shoes when I was already 9 years old, while I had learned to read almost entirely by myself at 5 (my dad had taught me the letters of the alphabet).
Not only that, but it's difficult to mathematically model deformable objects in ways that are computationally tractable (e.g. running on a 10ms update loop)
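For a sense of the cheapest end of that spectrum, here's a sketch of a 1-D mass-spring chain stepped with explicit Euler, roughly the simplest deformable model that could run inside a 10ms loop. All constants are invented; real grasp simulators use far more sophisticated (typically implicit FEM) solvers, which is exactly where the tractability problem bites.

```python
import numpy as np

def step_chain(pos, vel, dt=0.001, k=50.0, rest=0.01, damping=2.0, mass=0.01):
    """One explicit-Euler step of a 1-D mass-spring chain -- about the
    simplest deformable-object model there is. Only shows the shape of
    the per-tick update, not a realistic material model."""
    stretch = np.diff(pos) - rest        # elongation of each spring
    f_spring = k * stretch               # Hooke's law per spring
    force = np.zeros_like(pos)
    force[:-1] += f_spring               # each spring pulls both its ends
    force[1:] -= f_spring
    force -= damping * vel               # simple viscous damping
    force[0] = 0.0                       # pin the first node in place
    vel = vel + dt * force / mass
    vel[0] = 0.0
    return pos + dt * vel, vel

# Stretch the chain past its rest length, then relax it for 10 simulated
# milliseconds (10 substeps of 1 ms) -- one control-loop period's worth.
pos = np.linspace(0.0, 0.11, 11)   # 10 springs, each longer than rest=0.01
vel = np.zeros_like(pos)
for _ in range(10):
    pos, vel = step_chain(pos, vel)
```

Even this trivial model needs several substeps per control tick for stability, and a realistic 3-D mesh multiplies the state by orders of magnitude, which is why fitting a faithful deformable model into a 10ms budget is hard.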
Actually there are a couple of companies and labs that are close to solving (if they haven't already solved) the problem, but no sane person is going to compete in the Amazon Picking Challenge if they've actually solved it.
Yeah. You (I assume, unless you are a dog on the internet) and I (definitely not a dog) are the result of something on the order of billions of iterations of the "grasping hand" problem.