70 points by abelanger 7 days ago | 11 comments
  • movedx01 4 days ago
    This is great, and I keep my fingers crossed for Hatchet!

    One use case I imagine is key here is background/async agents, OpenAI Codex/Jules style. It's great that I can durably run them with Pickaxe (btw, I believe I've read somewhere in the Temporal docs or a webinar that Codex was built on that ;), but how do I get a real-time, resumable message stream back to the client? The user might reload the page or return after 15 minutes, etc. I haven't been able to think of an elegant way to model this in a distributed system.

    • gabrielruttner 4 days ago
      perfect use case, and this was one of the reasons we built pickaxe. we have a number of coding agent/pr review platforms powered by hatchet with similar patterns already... more to come on the compute side for this use case soon

      we'll have agent->client streaming on the very short-term roadmap (order of weeks), but we haven't broadly rolled it out since it's not 100% ready for prime time.

      we do already have wait-for-event support for client->agent eventing [1] in this release!

      [1] https://pickaxe.hatchet.run/patterns/human-in-the-loop
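
      a rough sketch of what the client->agent eventing can look like, based on the human-in-the-loop pattern linked above. the exact signatures here (`pickaxe.agent`, `ctx.waitFor`, the event name and payload) are my approximation and may differ from the shipped api:

          import { pickaxe } from "@hatchet-dev/pickaxe";
          import { z } from "zod";

          // Illustrative sketch: an agent that parks durably until the
          // client sends an approval event, then resumes where it left
          // off, even if the user reloads the page or returns much later.
          export const approvalAgent = pickaxe.agent({
            name: "approval-agent",
            inputSchema: z.object({ request: z.string() }),
            outputSchema: z.object({ approved: z.boolean() }),
            fn: async (input, ctx) => {
              // Durably wait for a client->agent event; the run survives
              // worker restarts while parked here. The event name and
              // payload shape are assumptions for this sketch.
              const event = (await ctx.waitFor({
                event: "approval:received",
              })) as { approved?: boolean };
              return { approved: event.approved === true };
            },
          });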

      • cmdtab 4 days ago
        Whether there's support for streaming events was pretty much my first question. Any way we could be beta testers? ༼ つ ◕_◕ ༽つ
        • gabrielruttner 4 days ago
          100%, shoot me a note at gabe [at] hatchet [dot] run and we can share some details on the signatures that exist but are going to change
  • gerardosuarez 2 days ago
    It would be great to have a section in the README showing what the code looks like without the library, contrasted with the example you already have. I'd need a significant time-saving reason to adopt a new external library: with new libraries like yours, we don't know how long you plan to support them, so depending on one for a core part of my business is a huge risk.

    My use case: cursor for open-source terminal-based coding agents.

  • jskalc92 4 days ago
    If I understand correctly, the tool and agent run() methods work in a similar way to React hooks, right?

    Depending on execution order, a tool is either called or a cached value is returned. That way local state can be replayed, which is why the "no side effects" rule is in place.

    I like it. Just, what's the recommended way to build a chat assistant agent with multiple tools? The message history would need to be passed to the very top-level agent.run call, wouldn't it?

    • gabrielruttner 4 days ago
      yes, it's similar, and based on some feedback we've been toying with a `pickaxe.memo(() => {})` utility to quickly wrap small chunks of code, similar to `useMemo`.

      we'll be continuously improving the docs on this project, but since pickaxe is built on hatchet it supports concurrency [1]. so for a chat use case, you can pass the chat history to the top-level agent but propagate cancellation to other message runs in the session, to handle the user sending a few messages in a row. we'll work an example into the patterns section for this!

      [1] https://docs.hatchet.run/home/concurrency#cancel-in-progress
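
      to make the replay model concrete, here's a minimal sketch of how i'd read it (signatures approximate the docs' examples; `doSearch` is a stand-in backend):

          import { pickaxe } from "@hatchet-dev/pickaxe";
          import { z } from "zod";

          // Stand-in for a real search backend.
          async function doSearch(query: string): Promise<string[]> {
            return [`result for ${query}`];
          }

          // Each tool call is checkpointed: on replay after a crash or
          // retry, completed calls return their cached result instead of
          // re-executing. That's why the agent body must stay
          // deterministic (no Date.now(), Math.random(), or ad-hoc I/O
          // between calls).
          const searchTool = pickaxe.tool({
            name: "search",
            description: "Search for documents",
            inputSchema: z.object({ query: z.string() }),
            outputSchema: z.object({ results: z.array(z.string()) }),
            fn: async ({ query }) => ({ results: await doSearch(query) }),
          });

          export const chatAgent = pickaxe.agent({
            name: "chat-agent",
            inputSchema: z.object({ history: z.array(z.string()) }),
            outputSchema: z.object({ reply: z.string() }),
            fn: async (input) => {
              // First execution runs the tool; a replay returns the
              // cached result in the same call order, like a hook.
              const { results } = await searchTool.run({
                query: input.history[input.history.length - 1] ?? "",
              });
              return { reply: results[0] ?? "no results" };
            },
          });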

  • blixt 4 days ago
    I see the API rarely mentions the exact message structure (system prompt, assistant/user history, etc.) or the choice of model (other than `defaultLanguageModel`). And it's not immediately clear to me how `toolbox.pickAndRun` can access any context from an ongoing agentic flow beyond the one prompt. But this is just from skimming the docs; maybe all of this is supported?

    The reason I ask is that I've had a lot of success using different models for different tasks, constructing the system prompt specifically for each task, and choosing between the "default" long assistant/tool_call/user/(repeat) message history vs. constantly pruning it (bad for caching but sometimes good for performance). It would be nice to know that a library like this allows experimenting with these strategies.

    • gabrielruttner 4 days ago
      gabe, hatchet cofounder here. thanks for this feedback and i agree!

      under the hood we're using the vercel ai sdk to make tool calls, so this is easily extended [1]. it's the only "opinionated" api for calling llm apis that's "bundled" within the sdk, and we were torn on how to expose it for this exact reason, but since it's so common we decided to include it.

      some things we're considering: overloading `defaultLanguageModel` with a map for different use cases, or allowing users to "eject" the tool picker and customize it as needed. i've opened a discussion [2] to track this.

      [1] https://github.com/hatchet-dev/pickaxe/blob/main/sdk/src/cli...

      [2] https://github.com/hatchet-dev/pickaxe/discussions/3
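
      for a sense of what the map overload could look like (purely a sketch: `languageModels` and its keys are hypothetical, though the providers are real vercel ai sdk packages):

          import { openai } from "@ai-sdk/openai";
          import { anthropic } from "@ai-sdk/anthropic";

          // Hypothetical: a per-use-case model map instead of a single
          // defaultLanguageModel, so the tool picker, summarization, and
          // final answers can each get their own model and prompt.
          const languageModels = {
            default: openai("gpt-4o"),
            toolPicker: anthropic("claude-3-5-sonnet-latest"),
            summarizer: openai("gpt-4o-mini"),
          };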

      • cmdtab 4 days ago
        I think providing examples and sample code is better than tying your API to the AI SDK.

        Because AI providers iterate on their APIs so quickly, many features arrive in the AI SDK weeks or months late (support for OpenAI computer use has been pending forever, for example).

        I like the current API where you can wait for an event. Along the same lines, it would be great to have an API for streaming and receiving messages where everything else is left to the developer, so they could use the AI SDK and stream the final response manually.
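
        Something like this, where the framework hands control back and the transport is yours. This is plain AI SDK usage, nothing Pickaxe-specific assumed:

            import { streamText } from "ai";
            import { openai } from "@ai-sdk/openai";

            // Stream the final response manually with the AI SDK; the
            // caller owns the transport (SSE, WebSocket, etc.).
            const result = streamText({
              model: openai("gpt-4o"),
              messages: [{ role: "user", content: "Summarize the session" }],
            });

            for await (const chunk of result.textStream) {
              process.stdout.write(chunk); // forward to the client here
            }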

      • blixt 3 days ago
        Appreciate it, looking forward to seeing how Pickaxe evolves!
  • golergka 4 days ago
    Fantastic. That's exactly what I've wanted to build for a long time but never got around to; instead I've been writing ad-hoc, lacking, overlapping stuff each time.
  • randomcatuser 7 days ago
    Oh this is really cool! I was building out a bit of this with Restate this past week, but this seems really well put together :) will give it a try!
    • abelanger 7 days ago
      Thanks! Would love to hear more about what type of agent you're building.

      We've heard pretty often that durable execution is difficult to wrap your head around, and we've also seen more of our users (including experienced engineers) relying on Cursor and Claude Code while building. So one of the experiments we've been running is ensuring that agent code written by LLMs is durable, using our MCP server so the agents can follow best practices while generating code: https://pickaxe.hatchet.run/development/developing-agents#pi...

      Our MCP server is super lightweight and basically just tells the LLM to read the docs here: https://pickaxe.hatchet.run/mcp/mcp-instructions.md (along with some tool calls for scaffolding)

      I have no idea if this is useful or not, but we were able to get Claude to generate complex agents written with durable-execution best practices (no side effects or non-determinism between retries), which we viewed as a good sign.
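
      For a concrete picture of what that means, here's an illustrative sketch (not from the docs; the tool signature approximates the README examples):

          import { pickaxe } from "@hatchet-dev/pickaxe";
          import { z } from "zod";
          import { randomUUID } from "node:crypto";

          // Anti-pattern: non-deterministic values inline in the agent
          // body. On a retry/replay, requestId and startedAt would
          // change, so the replayed run diverges from the original:
          //   const requestId = randomUUID();
          //   const startedAt = Date.now();

          // Durable-friendly: push the non-determinism into a tool whose
          // result is checkpointed, so every replay sees the same values.
          const generateRunMetadata = pickaxe.tool({
            name: "generate-run-metadata",
            description: "Create stable per-run identifiers",
            inputSchema: z.object({}),
            outputSchema: z.object({
              requestId: z.string(),
              startedAt: z.number(),
            }),
            fn: async () => ({
              requestId: randomUUID(),
              startedAt: Date.now(),
            }),
          });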

  • j_rosenthal 4 days ago
    The library name is confusing given https://pickaxe.co/, a nicely done low-code platform for building/monetizing chatbots and agents that's been around for 2.5 years or so.

    (No connection to pickaxe.co other than using the platform)

  • almosthere 7 days ago
    What I really like about it is that this kind of project helps people learn what an agent is.
  • awaseem 4 days ago
    Love to see more frameworks like this in the TypeScript ecosystem! How does this compare to Mastra: https://mastra.ai/
    • gabrielruttner 4 days ago
      thanks! we think of mastra and other frameworks as "batteries included" for patterns like memory and reasoning. this is great for many, but not all, projects. i think mastra is doing a great job balancing some of this by simply wrapping vercel's ai sdk (we took some inspiration here for our tool picker, and it's our recommendation for llm calls).

      we're leaning away from being a framework in favor of being a library, specifically because we're seeing teams looking to implement their own business logic for most core agentic capabilities, where things like concurrency, fairness, or resource contention become problematic (think many agents reading 1000s of documents in parallel).

      unlike most frameworks, we've been working on the orchestrator, hatchet [1], first for over a year, and we're basing these patterns on what we've seen our most successful companies already doing.

      put shortly: pickaxe brings orchestration and best practices, but you're free to implement to your own requirements.

      [1] https://github.com/hatchet-dev/hatchet

  • muratsu 4 days ago
    How does this compare to agent-kit by Inngest?
    • abelanger 3 days ago
      (we haven't looked too deeply into agent-kit, so this is based on my impression from reading the docs)

      At a high level, agents in Pickaxe are just functions that execute durably, and you write the function for their control loop. With agent-kit, agents execute in a fully "autonomous" mode where they automatically pick the next tool. In our experience this isn't how agents should be architected; you generally want them to be more constrained than that, even for somewhat autonomous agents.
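
      As a sketch of the difference (the toolbox, tool, and exact signatures here are illustrative, loosely following the docs' `pickAndRun` example):

          import { pickaxe } from "@hatchet-dev/pickaxe";
          import { z } from "zod";

          const answerTool = pickaxe.tool({
            name: "answer",
            description: "Answer the user directly",
            inputSchema: z.object({ text: z.string() }),
            outputSchema: z.object({ text: z.string() }),
            fn: async ({ text }) => ({ text }),
          });

          const supportToolbox = pickaxe.toolbox({ tools: [answerTool] });

          export const supportAgent = pickaxe.agent({
            name: "support-agent",
            inputSchema: z.object({ message: z.string() }),
            outputSchema: z.object({ answer: z.string() }),
            fn: async (input) => {
              // You own the control loop: a hard cap on steps and an
              // explicit check on which tools are reachable, rather than
              // a fully autonomous "pick tools until done" mode.
              for (let step = 0; step < 5; step++) {
                const result = await supportToolbox.pickAndRun({
                  prompt: input.message,
                });
                if (result.name === "answer") {
                  return { answer: result.output.text };
                }
              }
              return { answer: "Escalating to a human." };
            },
          });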

      Also to compare Inngest vs Hatchet (the underlying execution engines) more directly:

      - Hatchet is built for stateful container-based runtimes like Kubernetes, Fly.io, Railway, etc. Inngest is a better choice if you're deploying your agent into a serverless environment like Vercel.

      - We've invested quite a bit more in self-hosting (https://docs.hatchet.run/self-hosting), open source (MIT-licensed), and benchmarking (https://docs.hatchet.run/self-hosting/benchmarking).

      We can also compare specific features if there's something you're curious about, though the feature sets overlap quite a bit.

  • zegl 7 days ago
    As a long-time Hatchet user, I understand why you've created this library, but it also disappoints me a little. I wish more engineering time were spent on making the core platform more stable and performant.
    • abelanger 7 days ago
      Definitely understand the frustration. The difficulty with Hatchet being general-purpose is that staying performant for every use case can be tricky, particularly when combining many features (concurrency, rate limiting, priority queueing, retries with backoff, etc.). We should be more transparent about which combinations of use cases we're focused on optimizing.

      We spent a long time optimizing the single-task FIFO use case, which is what we typically benchmark against. Performance for that pattern is I/O-bound at >10k tasks/s, which is a good sign (we just need better disks). So a pure durable-execution workload should run very performantly.

      We're focused on improving multi-task and concurrency use-cases now. Our benchmarking setup recently added support for those patterns. More on this soon!

      • revskill 3 days ago
        Hatchet is not stable.