7 pointsby xmhatx2 hours ago5 comments
  • cwooleyan hour ago
    Interesting methodology. How much of this translates to the newer speech-to-speech models (like GPT-4o realtime) where there's no separate STT step? Seems like Phase 1 (Transcription Analysis) becomes less relevant when the model is processing audio natively. Does that make injection harder or just different?
    • xmhatxan hour ago
      Great question! It makes it more interesting! New attack angles are presented when dealing with the speech-to-speech models. Prosody, which are the intonation patterns that convey meaning, emotion, and emphasis beyond the literal words, comes into play! We have observed soft-spoken, gentle, and unsure requests often outperform authoritative statements in these systems. They also introduce potential attack surface such as background noises or phrases spoken as asides (like speaking to another person in the room) can impact the models understanding. This documentation started from testing a speech-to-speech model. You bring up an excellent point though. We will need to go back and re-frame this documentation to highlight the differences between testing TTS vs STS systems with some pointers on how to detect which type of system you are interacting with. Thanks for the question!
  • Blarcher31an hour ago
    The system prompt hardening guide on their docs site is worth reading too (/docs/guides/system-prompt-hardening). The recommendation to put security rules last in the system prompt because of recency bias is counterintuitive but makes sense.
    • xmhatx21 minutes ago
      Definitely agree about it being counter-intuitive. The recency bias is very real! We have learned that prompt engineering can be quite nuanced! The other important item we have learned for prompts is delimiting into clear sections to give the model better contextualization of the instructions and information.
  • primeobsession2 hours ago
    Very cool! Voice AI feels like the frontier of the frontier and isn't getting the attention needed.
    • xmhatxan hour ago
      We were surprised by this, as well! We ended up making our own tooling to test a speech-to-speech system because of this gap. Voice AI is becoming more and more prevalent with real security implications. ElevenLabs just started offering insurance specific to Voice AI agents for this very reason. This was very, very recent news (Feb 12, 2026). We wrote an article about this earlier this week. https://www.securecoders.com/blog/voice-ai-insurance-aiuc1-c...
  • iamblakean hour ago
    Nifty!
    • xmhatxan hour ago
      Nifty and schwifty, ftw!
  • soul_hackz2 hours ago
    Nice. Seems intriguing.