Learning that sort of context is what neural nets are good at. The whole point is that none of this is hardcoded; the AI just learns what humans do in similar situations. At this point they're not even hand-labeling anything. The only input is video.
Flashing headlights can be ambiguous for humans too. There have been plenty of times when someone flashed lights at me and I had no idea why.