Tesla’s Andrej Karpathy Gives A Keynote At CVPR 2021 Workshop On Autonomous Driving
The Conference on Computer Vision and Pattern Recognition (CVPR) held its 2021 Workshop on Autonomous Driving (WAD) virtually and Tesla’s Senior Director of AI, Andrej Karpathy, was one of many keynote speakers. The video is a bit over 8 hours, but you can listen to Karpathy’s part starting right here.
In the video, Karpathy talks about some of the things Tesla has been up to for the last few months. He dives in with a few slides that demonstrate the importance of Tesla’s work in autonomy. Karpathy noted that we are in “bad shape when it comes to transportation in society,” referring to the fact that these vehicles made from metal are traveling pretty quickly with high kinetic energy while being controlled by humans — or, as he described it, being controlled by meat.
The first slide, “Meat computer use in today’s transportation,” gives several quick facts:
- In tight control loop with 1-ton objects at 80 mph.
- 250ms reaction latency.
- Has to turn head, use mirrors for situational awareness.
- Keeps checking Instagram.
- Every day, about 3,700 people are killed globally in automotive accidents.
- Capable of poetry but solving the line-following problem.
- Cost of transportation is high.
“And really fundamentally what it comes down to is people are really not good at driving. They get into a lot of trouble — in accidents. They don’t want to drive and also in terms of economics, we are involving people in transportation. And of course, we’d like to automate transportation and really reap the benefits of that automation as a society.”
He noted that it should be possible to replace a meat computer with a silicon computer while getting a lot of benefits out of it in terms of safety and convenience. Some of those benefits are presented in the slide he shared on this. They are as follows:
- In tight control loop with 1-ton objects at 80 mph.
- <<100ms reaction latency.
- 360-degree awareness.
- Fully attentive.
- Deaths resulting from accidents are dramatically reduced.
- Solving the line-following problem: alleviated.
- Cost of transportation substantially decreases.
“Silicon computers have significantly lower latencies, they have 360 situational awareness, they are fully attentive and never check their Instagram and alleviate all of the issues that I presented.”
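To put the latency figures from the two slides in perspective, here is a small back-of-the-envelope sketch of how far a car travels at 80 mph before any reaction begins. The function name and the constant-speed assumption are illustrative and are not from the talk; 100 ms is used as an upper bound, since the slide only says the silicon latency is well under that.

```python
# Illustrative arithmetic only; names and the constant-speed assumption are not from the talk.
MPH_TO_MPS = 0.44704  # exact conversion: 1609.344 m per mile / 3600 s per hour


def reaction_distance_m(speed_mph: float, latency_s: float) -> float:
    """Distance covered at constant speed during the reaction latency."""
    return speed_mph * MPH_TO_MPS * latency_s


human_m = reaction_distance_m(80, 0.250)    # 250 ms latency from the first slide
silicon_m = reaction_distance_m(80, 0.100)  # 100 ms, an upper bound from the second slide

print(f"250 ms: {human_m:.1f} m, 100 ms: {silicon_m:.1f} m")
```

Under these assumptions, the car covers roughly 9 meters before a human driver starts to respond, versus under 4 meters for a system reacting within 100 ms.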
Karpathy pulled up a frame from the film I, Robot and noted that this is something many have looked forward to. In the scene, Will Smith’s character is about to drive a car manually, and the other person is shocked that he is using his hands to drive.
“I think this is not very far from truth and this movie is taking place in 2035. So I think by then — actually think this is a pretty prescient prediction.”
What Is Unique About Tesla And Its Approach To Autonomy?
Karpathy explained what he thought was unique about Tesla and the company’s approach to autonomy.
“We take a very incremental approach towards this problem. So, in particular, we already have customers with the Autopilot package and we have millions of cars and the Autopilot software is always running and providing active safety features and, of course, the Autopilot functionality. So we provide additional safety and convenience to our customers today, and also the team is working on Full Self-Driving capability.”
Karpathy also showed some of the value Tesla is providing today. In the talk, he showed a video demonstrating automatic emergency braking. The driver was going through an intersection when a pedestrian appeared seemingly out of nowhere. The car’s object detection picked up the pedestrian and the system slammed on the brakes to avoid a collision.
The next demonstration was an example of a traffic control warning which showed that the driver was distracted. They were probably checking their Instagram. Not only were they distracted, but they didn’t brake for the traffic lights ahead. The car sees that the traffic lights are red, so it beeps and the driver starts slowing.
The next two videos are examples of pedal misapplication mitigation (PMM). In the first example, the driver is parking and trying to turn. However, they make a mistake and floor the accelerator instead of braking. The system kicks in, sees the pedestrians, and slams on the brakes.
The final PMM scenario Karpathy showed involved another driver trying to park. The driver turned to the right and thought they were pressing the brake. Instead, they floored the accelerator, and the system kicked in and prevented the car from going into a body of water.
Karpathy also showed another video of a Tesla navigating around San Francisco autonomously and noted that he was showing all of the predictions — you can see some lines and objects indicating what the system was seeing.
“Now, we of course drive this extensively as engineers, and so it’s actually fairly routine for us to have zero-intervention drives — I would say in, like, sparsely populated areas.”
One response he often gets when people see a Tesla driving around autonomously is that they have seen this before, and have been seeing it for over a decade. He argued that this is an oversimplification.
Tesla’s Approach Vs. The Competitors’ Approach
“So here’s a Waymo taking a left at an intersection, and this is actually a pretty old video, I believe. So you’ve been seeing stuff like this for a very long time. So what is the big deal? Why is this impressive? And so on. I think the important thing that I always like to stress is that even though these two scenarios look the same — there’s a car taking a left at an intersection — under the hood and in terms of scalability of the system, things are incredibly different.
“So, in particular, a lot of the competing approaches in the industry take this LiDAR-plus-HD-map approach.”
He explained that competitors have to pre-map the environment with LiDAR sensors and build a high-definition map, inserting all of the lanes and traffic lights. At test time, the vehicle simply localizes itself to that map in order to drive around.
“The approach we take is vision-based primarily, so everything that happens, happens for the first time in the car based on the videos from the 8 cameras that surround the car. And so we come to an intersection for the very first time and we have to figure out where are the lanes, how do they connect, where are the traffic lights, which ones are relevant, what traffic lights control what lanes — everything’s happening at that time on that car and we don’t have too much high definition.”
Karpathy explained that this is a significantly more scalable approach.
“It would be incredibly expensive to keep this infrastructure up to date, and so we take the vision-based approach, which of course is much more difficult because we actually have to get neural networks that function incredibly well based on the videos. But once you get that to work, it’s a general vision system, and in principle can be deployed anywhere on earth. So that’s really the problem we are solving.”
Tesla’s Vision System
Karpathy explained that the vision system Tesla has been building over the last few years is so good that it’s leaving the other sensors in the dust. The cameras are doing most of the work in terms of the perception and now it’s gotten to where Tesla is removing some of the sensors because they are becoming unnecessary crutches.
“Three weeks ago, we started to ship cars that have no radar at all. We’ve deleted the radar and we are driving on vision alone in these cars. And the reason we are doing this, I think, is well expressed by Elon in this tweet. He’s saying, ‘When radar and vision disagree, which one do you believe? Vision has much more precision, so better to double down on vision than do sensor fusion.’
“And what he’s referring to is basically like vision is getting to the point where the sensor is like 100× better than, say, radar. Then, if you have a sensor that is dominating the other sensor and is so much better, then the other sensor is actually starting to, like, really contribute, it’s actually holding it back and is really starting to contribute noise to the former system.
“And so, we are really doubling down on the vision-only approach.”
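The intuition that a much weaker sensor can add more noise than signal can be illustrated with a textbook calculation. The sketch below is not Tesla's method. It assumes two independent, unbiased estimates of the same quantity, and it reads "100× better" as a 100× smaller standard deviation, which is an interpretation rather than something stated in the talk. All function and variable names are hypothetical.

```python
# Textbook illustration only; not Tesla's fusion method. All names are hypothetical.
# Assumes independent, unbiased measurement errors for both sensors.


def equal_weight_variance(var_a: float, var_b: float) -> float:
    """Variance of the plain average (a + b) / 2 of two independent estimates."""
    return (var_a + var_b) / 4.0


def inverse_variance_fused(var_a: float, var_b: float) -> float:
    """Variance of the optimal inverse-variance weighted combination."""
    return 1.0 / (1.0 / var_a + 1.0 / var_b)


vision_var = 1.0                 # reference variance (arbitrary units)
radar_var = (100.0 * 1.0) ** 2   # assumed: radar std is 100x the vision std

naive = equal_weight_variance(vision_var, radar_var)
optimal = inverse_variance_fused(vision_var, radar_var)

print(f"vision alone: {vision_var:.4f}")
print(f"naive average: {naive:.4f}")
print(f"optimal fusion: {optimal:.4f}")
```

Under these assumptions, naively averaging the two sensors is far worse than using vision alone, while even ideally weighted fusion improves on vision by only about 0.01 percent. That gives the second sensor very little upside to justify its added complexity.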
You can watch Karpathy’s full presentation here.