Damn innovative.
https://en.wikipedia.org/wiki/Acoustic_mirror
https://www.cnn.com/style/article/war-sound-locators-before-...
Like getting some MANPADS teams in the route of an oncoming helicopter assault.
You can use an ESP32 with a GPS module and its PPS signal. The PPS signal from the module often has a precision of roughly 60ns against the global GPS standard.
With that signal you can PID-control an internal timer of the ESP32, which can then be used to timestamp audio frames. Send those to a central host over Wi-Fi and you can use your standard localization math.
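For anyone wondering what the "standard localization math" might look like on the host: here is a rough sketch of 2D time-difference-of-arrival (TDOA) localization with plain Gauss-Newton. The function name, the flat 2D geometry, and the fixed speed of sound are all my assumptions for illustration, not something from a specific library, and a real setup would also have to deal with noise, wind, and temperature.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 °C


def locate_source(sensors, arrival_times, guess=None, iterations=50):
    """Estimate a 2D source position from arrival times at known sensors.

    Works on time differences relative to sensor 0, so the unknown
    emission time cancels out. Hypothetical helper, not a library API.
    """
    if guess is None:
        # Start from the centroid of the sensor positions.
        x = sum(s[0] for s in sensors) / len(sensors)
        y = sum(s[1] for s in sensors) / len(sensors)
    else:
        x, y = guess
    x0, y0 = sensors[0]
    t0 = arrival_times[0]
    for _ in range(iterations):
        d0 = math.hypot(x - x0, y - y0) or 1e-9
        # Accumulate the 2x2 normal equations (J^T J) delta = -J^T r.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (sx, sy), t in zip(sensors[1:], arrival_times[1:]):
            di = math.hypot(x - sx, y - sy) or 1e-9
            # Residual: predicted range difference minus measured one.
            r = (di - d0) - SPEED_OF_SOUND * (t - t0)
            jx = (x - sx) / di - (x - x0) / d0
            jy = (y - sy) / di - (y - y0) / d0
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            b1 -= jx * r
            b2 -= jy * r
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break  # degenerate geometry, give up with the current estimate
        dx = (b1 * a22 - b2 * a12) / det
        dy = (a11 * b2 - a12 * b1) / det
        x += dx
        y += dy
        if math.hypot(dx, dy) < 1e-9:
            break
    return x, y


# Four sensors on a 100 m square, timestamps in GPS seconds.
sensors = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
```

You need at least three sensors for a 2D fix and more is better. On the 60ns figure: at 343 m/s that corresponds to about 20 micrometres of path length, so the clock sync is unlikely to be the limiting error here.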
The trick is to use the ESP32's internal 10MHz capture hardware, which automatically latches a timestamp into a register when a GPIO changes state, instead of high-level C constructs that have to eat their way through x API layers.
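To make the capture idea concrete, here is a host-side sketch of turning captured counter values into GPS time. Note this is simple interpolation between two PPS captures, not the PID loop mentioned above, and the 32-bit free-running counter, the function name, and the parameter names are assumptions on my part.

```python
def ticks_to_gps_time(event_tick, pps_tick_a, pps_tick_b, gps_second_a,
                      counter_bits=32):
    """Map a captured counter value to GPS time.

    pps_tick_a and pps_tick_b are the counter values latched on two
    consecutive PPS edges; gps_second_a is the GPS second of the first
    edge (for example taken from the module's NMEA output).
    Returns (whole_second, fraction) so nanosecond detail is not lost
    in one large float. Hypothetical helper for illustration.
    """
    modulus = 1 << counter_bits
    # Modular subtraction keeps the result right across counter wraparound.
    ticks_per_second = (pps_tick_b - pps_tick_a) % modulus
    if ticks_per_second == 0:
        raise ValueError("PPS captures are identical; no interval measured")
    elapsed = (event_tick - pps_tick_a) % modulus
    # Dividing by the measured interval cancels the oscillator's
    # frequency error over that second.
    return gps_second_a, elapsed / ticks_per_second


# Nominal 10 MHz counter running 20 ppm fast, wrapping during the second.
tick_a = 2**32 - 4_000_000
tick_b = (tick_a + 10_000_200) % 2**32
event = (tick_a + 2_500_050) % 2**32
second, fraction = ticks_to_gps_time(event, tick_a, tick_b, 1000)
```

The fraction can exceed 1.0 if the event was captured after the second PPS edge, so in practice you would pick the pair of PPS captures that brackets the event.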
This costs like 20€.
I bet modern radar can tell the difference between a bird, plane, baseball, and missile, but a camera-based system is full of false positives.
Also, modern radar can't always tell the difference between a bird and a plane. Especially when dealing with stealth vehicles.