Humans have proved that two not-particularly-high-resolution cameras are sufficient for the current level of driving. It seems like pushing in this direction would let us remove this expensive component?
Humans also possess a listening system, a balance system, and a highly advanced pattern-recognition system filled with autocomplete from a huge database of pictures (which to this date hasn't been replicated; face recognition doesn't count, since it needs to recognize cars, signs, people, animals, pavement, trees, obstacles, etc.), not to mention knowledge of various possible scenarios, various models of how their body/car/traffic works, and so on.
You get to use worse hardware, but you need several orders of magnitude better software.
> a highly advanced pattern-recognition system filled with autocomplete from a huge database of pictures (which to this date hasn't been replicated; face recognition doesn't count, since it needs to recognize cars, signs, people, animals, pavement, trees, obstacles, etc.)
I expect that, after the first wave of clumsy LIDARing self-driving cars, all the car companies (Google especially) will be collecting training data from their cars' sensors to build exactly this kind of model. In fact, I wouldn't be surprised if that were what the Google car was really about, in the same way Google Voice is really about collecting speech training data.
The best part of this kind of training data is that it all comes pre-annotated with appropriate reinforcements: even if the image-recognition sensors aren't hooked up to anything, they're coupled to the input stream from the other car sensors and the driver's actions. So you would get training data like
- "saw [image of stop sign], other heuristically-programmed system decided car should stop, driver confirmed stop."
- "saw [image of kitten standing in the road], other heuristically-programmed system decided car should continue, driver overrode and stopped car."
Etc. Aggregating all these reports from many self-driving cars, you could build an excellent image-to-appropriate-reaction classifier.
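To make the idea concrete, here is a minimal sketch of how such fleet reports could be turned into labeled training examples. The record fields and file names are illustrative assumptions, not any real car API; the key point is that the driver's final action, not the heuristic system's decision, serves as the label:

```python
from collections import namedtuple

# Hypothetical fleet report: what the camera saw, what the heuristic
# system decided, and what the driver actually did.
Report = namedtuple("Report", "image heuristic_decision driver_action")

def label(report):
    # The driver's final action is the ground truth; the heuristic
    # decision only tells us whether it was confirmed or overridden.
    return report.driver_action

reports = [
    Report("stop_sign.jpg", "stop", "stop"),   # driver confirmed stop
    Report("kitten.jpg", "continue", "stop"),  # driver overrode and stopped
]

# Each pair (image, appropriate reaction) becomes a training example.
training_set = [(r.image, label(r)) for r in reports]
```

Aggregated over millions of such reports, overrides automatically become the high-value corrective examples, with no manual annotation needed.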
Yes, but with voice data it's ok if the system gets it wrong occasionally. Worst-case scenario is the user gets annoyed and tries again (or gives up and does something else).
In a driving situation, the worst-case scenario is everybody dies.
I would guess that the processing power is all that matters. It's not difficult or particularly dangerous to drive without being able to hear. I would guess that people driving remote controlled cars with 360-degree views but no other cues would perform very nearly as well as real drivers.
The human eye can instantly recognize the available driving paths, the motorcyclist ahead, and project where people will walk. Software would have to parse out where the open roads are, how far that motorcyclist is and whether he can clear the intersection before the car reaches it, and what that sign on the right-hand side is—using the same information, but it has to parse it first whereas we do that almost instantly. It's a totally different game.
Yes, that's what I'm saying. I'm saying the other sensors the parent post mentioned weren't actually important with regards to driving, just our ability to parse the visual data into a meaningful model of the world around us.
I think hearing is also useful from time to time. It's not AS critical as sight, but if nothing else it allows drivers to share their emotional state in a very primitive way and to gauge how their engine is performing.
While driving, humans assume that the road ahead is OK (at least free of significant potholes and, in sunny hot weather, of ice patches). We would expect better of robots: if a human crashes because of an oil patch, (s)he's a bad driver; if a computer crashes, it's a million-dollar lawsuit.
Edit: a better solution would be to observe the behaviour of other drivers; if there is someone driving ahead of you, you can assume that the road between you and them is OK; if there's no one ahead of you, you need to drive slower and be more careful (that's how I drive at night). Once there's a critical mass of cars with cameras, cars could communicate road conditions automatically.
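As a toy version of that decision rule (the slowdown factors are made-up illustrative numbers, not calibrated values):

```python
def target_speed(base_speed, lead_car_visible, night):
    # If a lead car is visible, the road up to it is assumed OK;
    # otherwise slow down, more so at night when visibility is worse.
    if lead_car_visible:
        return base_speed
    return base_speed * (0.6 if night else 0.8)
```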
> Once there's a critical mass of cars with cameras, cars could communicate road conditions automatically.
I would be frightened to trust the data coming from a random car in front of me. Inferring road conditions from another car's behavior sounds reasonable; using data supplied by it, not so much.
You shouldn't and wouldn't rely unfailingly on what other cars merely report. If the car in front of you insists it's maintaining speed while your own readings indicate it's slamming on its brakes, you should assume it's slamming on its brakes.
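That cross-check can be stated in a few lines. This is a hypothetical sketch; the function name, units, and tolerance are all assumptions:

```python
def trusted_speed(reported_kmh, measured_kmh, tolerance_kmh=5.0):
    # Trust the broadcast only while it agrees with your own sensors;
    # on any meaningful disagreement, fall back to what you measured.
    if abs(reported_kmh - measured_kmh) > tolerance_kmh:
        return measured_kmh  # the other car is lying or malfunctioning
    return reported_kmh
```

The broadcast then adds information (intent, a few cars ahead) without ever being able to contradict direct observation.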
However, if the car three cars in front of you just broadcast "I'm doing an emergency stop right now", that's really valuable data. The human in your car won't know anything is wrong for at least a second. The human driver behind you would know about it before the human driver in front of you.
That will be the most common way to abuse computer-driven cars: how easy they are to bring to a stop with a spoofed signal. (And, yes, that's probably criminal behavior.)
A computer-driven car, though, wouldn't (shouldn't) just immediately slam on the brakes because of that signal. It would tighten seat belts and start slowing down, but it also would want to avoid getting rammed by the car behind it. It can make very accurate estimates about its stopping distance and use all of it.
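Those stopping-distance estimates come from simple physics. A minimal sketch, using the classic v²/(2μg) braking model plus a short computer reaction time; μ = 0.7 is a typical dry-asphalt assumption, and the parameter values are illustrative:

```python
def stopping_distance_m(speed_mps, reaction_s=0.1, mu=0.7, g=9.81):
    # Distance covered during the reaction delay, plus braking distance
    # at constant deceleration mu * g.
    return speed_mps * reaction_s + speed_mps**2 / (2 * mu * g)

# At ~108 km/h (30 m/s), the car needs roughly 68.5 m to stop.
print(stopping_distance_m(30.0))
```

A computer can recompute this continuously with the actual measured grip and load, which is exactly why it can afford to use all of its stopping distance instead of slamming the brakes.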
The internet works that way, and you seem to be fine with that. :\ Is it the "safety issue"? I.e., the internet can't crash you into a wall; it can only send you to rotten.com or steal your credit card...
Wow, no. I don't trust the internet to give me real facts about elephants, let alone anything life-threatening. http://en.wikipedia.org/wiki/Wikipedia:Wikiality_and_Other_T... If the equipment had some built-in tamper detection, and Google's sensors digitally signed data if they didn't detect tampering, then I might trust it enough to drive with.
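A minimal sketch of what "signed if not tampered with" could look like, using a keyed MAC; in practice this would be asymmetric signatures with per-unit keys in tamper-resistant hardware, and every name here is an assumption:

```python
import hashlib
import hmac

# Hypothetical per-sensor secret, provisioned at manufacture and only
# usable while the tamper-detection circuit reports the unit intact.
SENSOR_KEY = b"per-sensor-secret"

def sign(reading: bytes) -> bytes:
    # Sensor side: attach an authentication tag to each reading.
    return hmac.new(SENSOR_KEY, reading, hashlib.sha256).digest()

def verify(reading: bytes, tag: bytes) -> bool:
    # Receiver side: refuse any reading whose tag doesn't check out.
    return hmac.compare_digest(sign(reading), tag)
```

Verification only tells you the reading came from an untampered unit; you would still cross-check it against your own sensors before acting on it.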
"Seeing" and "perceiving" are likely very different. Yes, we only have binocular visual input, but the excess processing in the brain takes perception to another level. Machines, however, have trouble with the perception part and so have to make up for it by seeing in excess.
The simple answer is that we don't know how to do it with stereo vision alone yet. Getting range reliably is hard, and your brain uses lots of tricks to do it.
The second answer is that we need better-than-human performance if this is to take off. So using human-type sensing might not be good enough anyway.
Lots of researchers are pushing on vision-based driving, though.
Human eyes are actually equivalent to very high-end video cameras, and the image processing that you can do in your squishy grey 10-watt processor is still way better than anything we can do with computers. You need your navigation system to be able to directly sense in 3 dimensions for it to be competitive.
Not really: we have high-resolution, in-focus vision only in a narrow field of view in the middle; everything else is not that good. We compensate for this with the ability to quickly move our eyes and refocus.
Well, that resolution is already in consumer-level (OK, "prosumer") cameras. We also know that the "fps" of an eye is around 60 Hz, since that's the minimum refresh rate at which monitors look OK.
When you can fit an exaflop electronic computer into a car, then maybe two cameras would be sufficient. Right now, we have to make do with less, and better sensors can make up the difference.
That's a rough estimate of how much computing power is in the human brain. It's extremely efficient energy-wise, but massively parallel and weirdly put together, so it's not entirely comparable to an electronic computer. Still, the computing resources available to process the images from the human eye are enormous.