2 points by KennyVan 4 hours ago | 1 comment
  • KennyVan 4 hours ago
    Hi HN, I'm the author.

    Been building a fairly complex UI-heavy application for myself recently. At some point I tried handing parts of it to an agent and went through the usual options: Playwright scripts, Chrome extensions, Computer Use screenshot loops. Every one of them ran into the same wall. The agent would fight the UI, miss a click, trigger the wrong dropdown, lose state on a re-render. "OK now find the button that opens the dialog that contains the form." Slow and unreliable.

    Eventually it clicked that the interface I was forcing the agent to use was the problem. The agent doesn't need to click a button, it needs to do the thing the button does. So I pulled the pattern out into a small protocol and TypeScript SDK, and open-sourced it last week.

    Shortest way to describe it: an API for AI agents, defined by the app developer.

    You instrument your app once with the SDK. You declare typed actions with a Zod-style builder (Standard Schema compatible, so Valibot, ArkType etc. also work). Any MCP-compatible agent (Claude Code, Cursor, Codex, Copilot, Cline) then calls those actions as tools, with your real state, in your real process, through a tiny local gateway.

    The whole pitch in one example. Imagine an app with a todo list and an agent asked to add five items. Without Tesseron, the agent clicks the add button, types the first todo, hits submit, clicks add again, types the second todo, submits, and so on: five round trips through a brittle UI. With Tesseron your app exposes one action:

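        // Assumes `tesseron` (the SDK instance), `z` (the schema builder),
        // `state`, `newId` and `render` are already in scope in your app.
        tesseron
          .action('addTodos')
          // Input must be an object with a non-empty array of strings.
          .input(z.object({ items: z.array(z.string()).min(1) }))
          .handler(({ items }) => {
            // Mutate the app's real state, then re-render as usual.
            state.todos.push(...items.map(text => ({ id: newId(), text })));
            render();
          });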
    
    The agent calls addTodos({ items: ['a', 'b', 'c', 'd', 'e'] }) in one shot. Real handler, real state, no scraping, no brittleness.
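
    To make the "validate, then run the real handler" idea concrete without installing anything, here is a dependency-free sketch of the pattern. It is not Tesseron's implementation: ActionRegistry, the Schema interface, the hand-rolled addTodosInput validator and registry.call are all hypothetical stand-ins I made up for illustration. Only the .action().input().handler() chain shape comes from the example above.

```typescript
// Hypothetical stand-in for a schema: anything with a parse() that throws on bad input.
interface Schema<T> {
  parse(input: unknown): T;
}

// Hand-rolled validator for { items: string[] } with at least one entry.
const addTodosInput: Schema<{ items: string[] }> = {
  parse(input) {
    const items = (input as { items?: unknown } | null)?.items;
    if (!Array.isArray(items) || items.length < 1 || !items.every((i) => typeof i === "string")) {
      throw new Error("items must be a non-empty array of strings");
    }
    return { items: items as string[] };
  },
};

type Handler<T> = (input: T) => unknown;

// Minimal registry with a fluent .action().input().handler() chain (illustrative only).
class ActionRegistry {
  private actions = new Map<string, (raw: unknown) => unknown>();

  action(name: string) {
    return {
      input: <T>(schema: Schema<T>) => ({
        handler: (fn: Handler<T>): void => {
          // Validate the raw input first, then run the app's own handler.
          this.actions.set(name, (raw) => fn(schema.parse(raw)));
        },
      }),
    };
  }

  // Roughly what happens when an agent invokes a tool by name.
  call(name: string, raw: unknown): unknown {
    const run = this.actions.get(name);
    if (!run) throw new Error(`unknown action: ${name}`);
    return run(raw);
  }
}

const registry = new ActionRegistry();
const state = { todos: [] as { id: number; text: string }[] };
let nextId = 1;

registry
  .action("addTodos")
  .input(addTodosInput)
  .handler(({ items }) => {
    state.todos.push(...items.map((text) => ({ id: nextId++, text })));
    return state.todos.length;
  });

// One call adds all five items; invalid input never reaches the handler.
registry.call("addTodos", { items: ["a", "b", "c", "d", "e"] });
```

    The point of the shape is that the agent only ever sees a name and an input schema, while the handler stays ordinary app code touching ordinary app state.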

    It works for backend APIs (via @tesseron/server), frontend apps (@tesseron/web, React / Svelte / Vue / vanilla), and desktop apps (Electron, Tauri). Handlers receive a ctx arg with MCP primitives: ctx.confirm for yes/no, ctx.elicit for schema-validated forms, ctx.progress for streaming status, ctx.sample for calling the agent's LLM inline.
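
    A rough sketch of how a handler might use two of those ctx primitives. The signatures here are my guesses for illustration, not the SDK's documented types: SketchCtx, clearTodos and makeStubCtx are hypothetical names, and the real argument shapes for ctx.confirm and ctx.progress are in the docs.

```typescript
// Assumed shapes: the names confirm and progress come from the post, the signatures do not.
interface SketchCtx {
  confirm(message: string): Promise<boolean>;
  progress(update: { message: string; percent: number }): void;
}

type Todo = { id: number; text: string };

// Handler that asks before a destructive change and reports progress as it goes.
async function clearTodos(todos: Todo[], ctx: SketchCtx): Promise<number> {
  const ok = await ctx.confirm(`Delete ${todos.length} todos?`);
  if (!ok) return 0; // User declined: leave state untouched.

  const total = todos.length;
  let removed = 0;
  while (todos.length > 0) {
    todos.pop();
    removed++;
    ctx.progress({
      message: `removed ${removed}/${total}`,
      percent: Math.round((removed / total) * 100),
    });
  }
  return removed;
}

// Stub ctx for local testing: answers confirm() with a fixed value and records progress messages.
function makeStubCtx(answer: boolean) {
  const updates: string[] = [];
  const ctx: SketchCtx = {
    confirm: async () => answer,
    progress: (u) => {
      updates.push(u.message);
    },
  };
  return { ctx, updates };
}
```

    Passing ctx in as a plain argument means handlers like this can be unit-tested with a stub, no agent or gateway required.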

    Runtime is JSON-RPC 2.0 over WebSocket. A small Node binary (@tesseron/mcp) runs locally as an MCP stdio server and bridges to your running app over WS. Click-to-connect is a six-character claim-code handshake, so the gateway knows which app belongs to which agent session. Tools appear and disappear as apps come and go.
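
    For anyone who hasn't worked with JSON-RPC 2.0 directly, here is a small sketch of the request/response envelope and a dispatcher over it. The envelope fields and error codes are the ones defined by the JSON-RPC 2.0 spec. Everything else is an assumption for illustration: the "addTodos" wire method name, handleMessage and failure are made up, the WebSocket itself is left out, and notifications and batches are not handled. The actual method names and messages are in the protocol spec.

```typescript
// JSON-RPC 2.0 envelopes (field names per the JSON-RPC 2.0 specification).
interface RpcRequest { jsonrpc: "2.0"; id: number | string; method: string; params?: unknown }
interface RpcSuccess { jsonrpc: "2.0"; id: number | string; result: unknown }
interface RpcFailure {
  jsonrpc: "2.0";
  id: number | string | null;
  error: { code: number; message: string };
}
type RpcResponse = RpcSuccess | RpcFailure;

// Standard JSON-RPC 2.0 error codes.
const PARSE_ERROR = -32700;
const INVALID_REQUEST = -32600;
const METHOD_NOT_FOUND = -32601;
const INTERNAL_ERROR = -32603;

type Method = (params: unknown) => unknown;

function failure(id: RpcFailure["id"], code: number, message: string): RpcFailure {
  return { jsonrpc: "2.0", id, error: { code, message } };
}

// Turn one raw text frame (e.g. a WebSocket message) into one response.
// Simplification: requests without an id (notifications) and batches are rejected.
function handleMessage(raw: string, methods: Map<string, Method>): RpcResponse {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return failure(null, PARSE_ERROR, "Parse error");
  }
  if (typeof parsed !== "object" || parsed === null) {
    return failure(null, INVALID_REQUEST, "Invalid Request");
  }
  const req = parsed as Partial<RpcRequest>;
  const id = req.id;
  const methodName = req.method;
  if (
    req.jsonrpc !== "2.0" ||
    typeof methodName !== "string" ||
    (typeof id !== "string" && typeof id !== "number")
  ) {
    return failure(null, INVALID_REQUEST, "Invalid Request");
  }
  const method = methods.get(methodName);
  if (!method) return failure(id, METHOD_NOT_FOUND, "Method not found");
  try {
    return { jsonrpc: "2.0", id, result: method(req.params) };
  } catch (err) {
    return failure(id, INTERNAL_ERROR, err instanceof Error ? err.message : "Internal error");
  }
}

// Hypothetical wire method that returns how many items it received.
const methods = new Map<string, Method>([
  ["addTodos", (params) => (params as { items: string[] }).items.length],
]);

const reply = handleMessage(
  JSON.stringify({ jsonrpc: "2.0", id: 1, method: "addTodos", params: { items: ["a", "b"] } }),
  methods,
);
console.log(JSON.stringify(reply));
```

    The id on each request is what lets responses be matched up when several calls are in flight on the same socket.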

    It's a protocol, not just a TypeScript thing. The JS/TS SDKs are the reference implementation. Protocol spec is CC BY 4.0 so anyone can write a compatible client or server in any language. Python and Rust (for Tauri) are on my roadmap.

    Reference implementation license: BUSL-1.1. Free for in-app and self-hosted use, auto-converts to Apache-2.0 four years after each release. The only blocked case is offering Tesseron-as-a-service.

    Things I would love feedback on: the builder API (.action().input().handler()) ergonomics, which is most likely to change; the ctx surface, whether I missed a primitive or overshot; the protocol spec itself, since these are the first external eyes; and any transport mode I should add (HTTP streaming is already on the list).

    Docs: https://brainblend-ai.github.io/tesseron/

    Protocol spec: https://brainblend-ai.github.io/tesseron/protocol/

    Repo and 6 worked examples: https://github.com/BrainBlend-AI/tesseron

    Happy to answer anything.