Using a visual language model to generate a description for personalized audio deterrence
If we have a Jetson in a surveillance trailer running a VLM, it would be great to be able to send a personalized text message to the talk-down through the local Chekt API, e.g. "Hey you in the blue hat and white shirt, move along!" This sort of personalized deterrence is much more effective than a generic message. You already use Google TTS, so this shouldn't be too hard to add. It would also be a lot better than us having to use the ONVIF backchannel to send the audio ourselves. On top of that, it would let integrators send any arbitrary message to the IP Horn, which opens up a lot of interesting use cases.
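
To make the request concrete, here is a rough sketch of what the Jetson side could look like: turn the attributes the VLM reports into a short sentence, then wrap it in an HTTP request for the local API to run through TTS. Everything API-specific here is an assumption on my part: the `/api/talkdown/tts` path, the `{"text": ...}` JSON body, and the helper names `build_message` and `build_request` are all made up for illustration, since no such endpoint exists yet.

```python
import json
import urllib.request

# Hypothetical route: the real local Chekt API path is not known and is
# assumed here purely for illustration.
TALKDOWN_PATH = "/api/talkdown/tts"


def build_message(attributes, action="move along"):
    """Turn VLM-reported appearance attributes into a talk-down sentence."""
    # Drop empty or whitespace-only attributes the VLM might return.
    cleaned = [a.strip() for a in attributes if a and a.strip()]
    if not cleaned:
        # Fall back to a generic message when nothing usable was detected.
        return f"Attention, please {action}!"
    if len(cleaned) == 1:
        described = cleaned[0]
    else:
        described = ", ".join(cleaned[:-1]) + " and " + cleaned[-1]
    return f"Hey you in the {described}, {action}!"


def build_request(host, message):
    """Build (but do not send) a POST request carrying the text to speak."""
    # Assumed payload shape: a single "text" field for the TTS engine.
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}{TALKDOWN_PATH}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    msg = build_message(["blue hat", "white shirt"])
    req = build_request("192.168.1.50", msg)
    print(msg)
    print(req.get_method(), req.full_url)
    # Sending would be: urllib.request.urlopen(req, timeout=5)
```

The request is only built, not sent, so this runs without a device on the network. The same `build_request` call would cover the integrator case too, since it takes any text rather than only VLM-generated descriptions.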