It currently takes pictures every 30 seconds and whenever you switch applications.
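
A minimal sketch of that trigger logic in Python, not the app's actual code. The `CaptureTrigger` class and its method name are hypothetical, and it assumes that a capture caused by an application switch also restarts the 30-second timer, which the description above does not specify.

```python
from dataclasses import dataclass
from typing import Optional

CAPTURE_INTERVAL_S = 30.0  # from the description: one capture every 30 seconds


@dataclass
class CaptureTrigger:
    """Decides when to capture; taking the picture itself is left to the caller."""

    interval_s: float = CAPTURE_INTERVAL_S
    last_capture_at: Optional[float] = None
    last_app: Optional[str] = None

    def should_capture(self, now: float, frontmost_app: str) -> bool:
        # An application switch is any change of the frontmost app name.
        switched = self.last_app is not None and frontmost_app != self.last_app
        # The timer is due on the first call or once the interval has elapsed.
        due = (
            self.last_capture_at is None
            or now - self.last_capture_at >= self.interval_s
        )
        self.last_app = frontmost_app
        if switched or due:
            # Assumption: any capture restarts the interval timer.
            self.last_capture_at = now
            return True
        return False
```

In a real app, `now` would come from a monotonic clock and `frontmost_app` from the OS's application-activation notifications; both are passed in here so the logic stays easy to test.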
I use https://huggingface.co/mlx-community/gemma-3-4b-it-qat-4bit for chat and image recognition, and Qwen/Qwen3-Embedding-0.6B-4bit and Qwen3-Reranker-0.6B-4bit for the search-related features.
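
An embedding model and a reranker are typically combined as a two-stage search: embeddings pick a shortlist cheaply, then the reranker reorders it. The sketch below shows that general pattern under stated assumptions, not how this app is wired. `embed` and `rerank_score` are hypothetical callables standing in for the two models, and `top_k` and `top_n` are illustrative defaults.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(
    query: str,
    docs: List[str],
    embed: Callable[[str], Vector],             # hypothetical: wraps the embedding model
    rerank_score: Callable[[str, str], float],  # hypothetical: wraps the reranker
    top_k: int = 20,
    top_n: int = 5,
) -> List[str]:
    # Stage 1: shortlist the top_k documents by embedding similarity.
    query_vec = embed(query)
    by_similarity = sorted(
        docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True
    )
    candidates = by_similarity[:top_k]
    # Stage 2: reorder only the shortlist with the more expensive reranker.
    reranked = sorted(
        candidates, key=lambda d: rerank_score(query, d), reverse=True
    )
    return reranked[:top_n]
```

A real pipeline would embed documents once and store the vectors instead of calling `embed` on every search; that is left out here to keep the sketch short.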
For voice I use Apple's SFSpeechRecognizer. I'm thinking of switching to an open-source model, but the application's memory footprint is already very high.