2 pointsby dorianzheng4 hours ago3 comments
  • longtermop4 hours ago
    The microservices framing resonates but surfaces an interesting security question. In your orchestration example:

      research = await research_agent.call("Find Q3 earnings...")
      analysis = await doc_agent.call(f"Analyze this data: {research}")
    
    When one agent's output flows directly into another's input, you've created an implicit trust boundary. What happens if the research skill fetches data from a compromised source that includes adversarial instructions? The doc_agent receives {research} as trusted input but it's actually attacker-controlled content.

    Skills that touch external systems (web scrapers, API clients, document parsers) become injection surfaces. This is analogous to the microservices problem of validating input at service boundaries, but harder because the "input" here is natural language that gets interpreted, not just parsed.

    Curious how boxlite handles sanitization between skill invocations. Is there a recommended pattern for treating inter-agent data as untrusted, or does the micro-VM isolation handle this by containing blast radius rather than preventing injection?

    (Working on related problems at Aeris PromptShield - this is genuinely one of the trickier aspects of composable agent architectures.)

  • dorianzheng4 hours ago
    I've been thinking about AI agents wrong. We keep treating them as this special thing that needs special infrastructure. But what if agent skills are just... functions you compose?

    # Specify what skills the agent has - like importing modules async with boxlite.SkillBox(skills=["anthropics/skills"]) as agent:

        # Skills loaded: pdf, docx, pptx, xlsx handling
        await agent.call("Read quarterly_report.pdf and summarize key metrics")
        await agent.call("Export that to a PowerPoint with charts")  # remembers context
    
    That's SkillBox. You declare what skills your agent has upfront. Then call it like any other async function. State persists between calls.

    The same skill works across Claude Code, OpenAI Codex CLI, and Gemini CLI. It's becoming the package manager for agent capabilities.

    # Install skills dynamically - like pip install at runtime await agent.install_skill("anthropics/skills") # PDF, Word, Excel, PowerPoint await agent.install_skill("example/web-scraper") # Custom scraping skill await agent.install_skill("your-company/internal") # Your private skills

    Orchestrating multiple agents:

    async with boxlite.SkillBox(skills=["anthropics/skills"]) as doc_agent:

    async with boxlite.SkillBox(skills=["example/web-research"]) as research_agent:

        research = await research_agent.call("Find Q3 earnings for top 5 tech companies")
    
        analysis = await doc_agent.call(f"Analyze this data: {research}")
    
    Just async Python. No YAML. No special DSL.

    The fun demo - Starbucks: One function call, Claude navigates the website, handles popups, adds coffee to cart. You watch via noVNC at localhost:3000.

    Yeah, it runs in a micro-VM (hardware isolation). But that's a feature, not the point. The point is the programming model.

  • BinaryAcid2 hours ago
    I like this. We need a Python alternative to “npx skills”.