The camera's processing must first recognize that the thing ahead is a vehicle; then it watches how many pixels the image occupies to decide whether the vehicle is getting closer (image expanding) or pulling away (image shrinking). So a lot of that capability depends on the camera's resolution, since that determines how well it can tell whether the image is growing or shrinking in its field of view. Throw in the processing required to analyze all of that, and to be safe, the following gap must account for the processing delay.
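The pixel-growth idea above can be sketched as a time-to-contact estimate. This is a minimal illustration, not any vendor's algorithm: the function name and the 30 fps frame rate are assumptions, and the bounding-box widths would come from an upstream vehicle detector.

```python
# Sketch: estimating closing behavior from apparent image size alone.
# tau = s / (ds/dt), where s is the target's apparent width in pixels.
# A small positive tau means closing fast; negative means pulling away.

def time_to_contact(width_prev, width_curr, dt):
    """Rough time-to-contact (seconds) from two successive frame widths."""
    ds_dt = (width_curr - width_prev) / dt  # pixels per second
    if ds_dt == 0:
        # Apparent size unchanged: no relative motion resolvable at this
        # resolution and frame rate.
        return float("inf")
    return width_curr / ds_dt

# A target growing from 100 to 104 pixels over one 30 fps frame (~33 ms):
tau = time_to_contact(100, 104, 1 / 30.0)  # roughly 0.87 s
```

Note the resolution dependence: with only 1-pixel quantization, a slow closure may not change the width at all between frames, so the camera needs either more pixels on target or more elapsed frames before it can tell anything, which is exactly the delay being described.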
With radar, the system is constantly 'pinging' the area ahead, and determining the actual range is quick and straightforward compared to the sample rate of video imaging.
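The reason radar ranging is cheap to compute is that the range falls directly out of the echo's round-trip time. A minimal sketch (the function name is illustrative; real automotive radars are typically FMCW rather than simple pulse, but the range math is the same idea):

```python
# Sketch: range from time of flight. The pulse travels out and back,
# so the one-way range is half the round-trip distance.

C = 299_792_458.0  # speed of light in m/s (physical constant)

def radar_range(round_trip_s):
    """Range to target in meters, given the echo's round-trip time."""
    return C * round_trip_s / 2.0

# An echo returning after about a third of a microsecond puts the
# target roughly 50 m ahead:
r = radar_range(0.33e-6)
```

No object recognition is required to get that number, which is why it is available so much sooner than a camera-based estimate.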
The vehicles trying to do real self-driving use a combination of lidar and all sorts of other sensors to produce a near-3-D image. The system then processes what it thinks it sees and evaluates the threats and necessary actions to stay on the road, stay in its lane, and avoid hitting anything, all while detecting stop signs, crosswalks, traffic lights, and, obviously, other traffic that might impact its chosen path.
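One common building block of that multi-sensor approach is fusing overlapping measurements, such as combining a noisy camera range estimate with a more precise radar one. A hedged sketch of inverse-variance weighting, one standard textbook technique (the readings and variances below are made up for illustration):

```python
# Sketch: fuse two independent range measurements by weighting each
# inversely to its variance. The fused estimate is always at least as
# tight as the better of the two inputs.

def fuse_ranges(r_cam, var_cam, r_radar, var_radar):
    """Inverse-variance weighted average of two range measurements.

    Returns the fused range and its (reduced) variance.
    """
    w_cam = 1.0 / var_cam
    w_radar = 1.0 / var_radar
    fused = (w_cam * r_cam + w_radar * r_radar) / (w_cam + w_radar)
    fused_var = 1.0 / (w_cam + w_radar)
    return fused, fused_var

# Camera says 52 m but is sloppy (variance 25 m^2); radar says 49.8 m
# and is tight (variance 1 m^2). The fused estimate lands close to the
# radar reading, with lower variance than either sensor alone:
r, v = fuse_ranges(52.0, 25.0, 49.8, 1.0)
```

This is only the measurement-combination step; a real stack would wrap something like it in a tracking filter and run it per object, per frame, which is where the real-time processing burden comes from.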
Doing all of that in real time, when things can happen quickly, is a real challenge to pull off in an affordable and reliable manner. We'll get there, eventually; we're a long way from it right now. Prototypes don't necessarily represent marketable products people can afford and depend on, but processors and sensors keep getting better and cheaper. The human brain, when applied, is a formidable thing when it's paying attention to the task; it's just hard to maintain that attention on a boring task. The problem with today's aids is that people don't understand their limitations (or ignore them!), expect them to be capable of more, and then find themselves in danger when the thing doesn't do what a human would. Stories of people watching movies, reading a book, tending to their dog in the back seat, or snoozing are just accidents waiting to happen. Each iteration gets better.