4 points by zdql 5 hours ago | 1 comment

zdql 5 hours ago
After the Thinking Machines announcement (https://thinkingmachines.ai/blog/interaction-models/), I've become interested in the ideas behind speech models and the next "step-function" change in agent<>human interaction.
Was able to spin this up over the weekend. Here's how it works: GPT-Realtime-2 is the frontend. It has a tool to start a (Claude Code/Codex) task, and a tool to check the progress of said task.
The agents run in an async worker while the Realtime loop can continue chatting. It becomes fairly straightforward to manage several agent runs at once, as if you are having a conversation.
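
For anyone curious about the shape of it, here is a minimal sketch of the two-tool pattern, not the actual code. Everything named here (`TaskRegistry`, `start_task`, `check_progress`, `fake_agent`) is hypothetical, and the agent is a stub coroutine standing in for a real Claude Code/Codex run; the Realtime model and its tool-calling wiring are left out entirely. The idea is only that starting a task returns an ID immediately while the work continues in the background, and checking progress reads a snapshot without blocking.

```python
import asyncio
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentRun:
    prompt: str
    status: str = "running"  # "running", "done", or "failed"
    log: list = field(default_factory=list)
    result: Optional[str] = None


class TaskRegistry:
    """Tracks background agent runs (hypothetical helper, not a real API)."""

    def __init__(self, runner):
        self._runner = runner  # async callable: (prompt, report) -> result
        self._runs = {}
        self._tasks = {}
        self._ids = itertools.count(1)

    def start_task(self, prompt):
        """Tool 1: schedule a run and return right away with its ID."""
        task_id = f"task-{next(self._ids)}"
        run = AgentRun(prompt)
        self._runs[task_id] = run
        self._tasks[task_id] = asyncio.create_task(self._execute(run))
        return {"task_id": task_id, "status": run.status}

    def check_progress(self, task_id):
        """Tool 2: return a non-blocking snapshot of a run."""
        run = self._runs.get(task_id)
        if run is None:
            return {"error": f"unknown task {task_id}"}
        return {
            "task_id": task_id,
            "status": run.status,
            "log": list(run.log),
            "result": run.result,
        }

    async def _execute(self, run):
        try:
            run.result = await self._runner(run.prompt, run.log.append)
            run.status = "done"
        except Exception as exc:
            run.status = "failed"
            run.result = str(exc)

    async def wait_all(self):
        await asyncio.gather(*self._tasks.values())


async def fake_agent(prompt, report):
    """Stub standing in for a coding agent; reports a few steps, then finishes."""
    for step in ("planning", "editing", "testing"):
        report(step)
        await asyncio.sleep(0.01)
    return f"finished: {prompt}"


async def demo():
    registry = TaskRegistry(fake_agent)
    a = registry.start_task("fix the flaky test")
    b = registry.start_task("update the README")
    await asyncio.sleep(0)  # yield once, as the chat loop would between turns
    mid = registry.check_progress(a["task_id"])
    await registry.wait_all()
    return mid, registry.check_progress(a["task_id"]), registry.check_progress(b["task_id"])


if __name__ == "__main__":
    for snapshot in asyncio.run(demo()):
        print(snapshot)
```

Because `start_task` only schedules work and `check_progress` only reads state, the conversational loop never waits on an agent, which is what makes juggling several runs at once feel like ordinary back-and-forth.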