The more tractable approach IMO is focusing on input validation. The primary attack vector for agentic AI isn't the model going rogue—it's prompt injection through tool outputs, RAG results, API responses, and external content. The model follows instructions; attackers craft instructions that look like legitimate data.
We're building something for this at Aeris (PromptShield)—lightweight guardrails that scan inputs before they reach the model. Think of it less as "watching the AI" and more like input sanitization in traditional security. You wouldn't let untrusted data hit your database without validation; same principle applies to LLM context windows.
Curious whether people think the "watcher" needs to be an AI at all, or if deterministic/rule-based scanning catches the majority of attack patterns?
Case in point is Moltbook: it went from being an idea to going viral in a matter of days, and now it could either become the ecosystem that powers the next wave of innovation or the textbook example of the risks of vibe coding.