The declarative style keeps the workflow detail at a high enough level to iterate super quick - love that. More important to me is that it’s structured and seems like it would be more testable (I see validation in your docs).
Zooming in on the pipe/agent steps, I can't quite tell whether you can leverage MCP as a client and make tool calls. Can you confirm? If not, what's your solution for working with APIs in the middle of a pipeline?
Also a quick question: declarative workflows won't change the fact that LLM output is non-deterministic, so we can't always be guaranteed that the output from prior steps is correct. What tools or techniques are you using or recommending to measure the reliability of the output from prior steps? I'm thinking of how you might measure at the step level to help prioritise which prompts need refinement or optimisation. Is this a problem you expect to own in Pipelex, or one to be solved elsewhere?
Great job guys, your approach looks like the right way to solve this problem and add some reliability to this space. Thanks for sharing!
Many companies are working on evals, and we will have a strategy for integrating with Pipelex. What we already have is modularity: you can test each pipe separately or test a whole workflow, which is pretty convenient. Better yet, we have the "conceptual" level of abstraction: the code is the documentation. So you don't need any additional work to explain to an eval system what we were expecting at each workflow step: it's already written into it. We even plan to have an option (typically for debug mode) that checks that every input and every output complies semantically with what was intended and expected.
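To make the per-pipe testing concrete, here's a rough sketch of what a unit test for a single pipe could look like. Note that `run_pipe`, its arguments, and the result fields are hypothetical placeholders for whichever entry point the Pipelex Python API exposes, not the actual signatures:

    # test_extract_invoice.py -- hypothetical per-pipe unit test
    from my_project.runner import run_pipe  # hypothetical wrapper around the Pipelex API

    def test_extract_invoice_pipe():
        # Exercise one pipe in isolation with a fixed input document.
        invoice = run_pipe(
            pipe="extract_invoice",
            inputs={"document": "tests/fixtures/invoice.pdf"},
        )
        # Outputs are strongly typed, so assertions read like plain domain checks.
        assert invoice.vendor
        assert invoice.total > 0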
Thanks a lot for your feedback! It's a lot of work, so it's greatly appreciated.
I'm wondering whether this kind of scenario (essentially, the input document is just too big) can be handled in Pipelex. In my understanding, a DSL is good for its high-level abstraction and readability, but lacks flexibility and restricts what you can express. How can Pipelex users iterate on their pipelines to fulfill complex needs when the business logic inevitably becomes complex?
Regarding large docs, I know what you mean and we've been there: at one point we considered focusing on that feature, building a kind of agentic system to master large docs. At the time, in 2023-2024, everyone was relying on vector-store RAG and embeddings to solve that problem, but we always thought that solution wouldn't bring enough reliability. We wanted the LLM to actually read the whole doc, in order to know what's in there and where. The idea was to read and "take notes" according to different aspects of what it's reading, and synthesize hierarchically from bottom to top. So that part of the work could be done once, and the structured "notes" could be exploited for each use case pretty efficiently, thanks to the zoom-in/out provided by the architecture. We went pretty far in building that solution, which we called "Pyramid RAG" internally, and the aim was to build AI workflows on top of it. But at some point, building the pyramid became pretty complex, and we realized we needed AI workflows to build the pyramid itself, and we needed to do it right. That's when we decided to focus on what would become Pipelex. Now, in our new setting, the "large docs" problem is one we can break down into a collection of Pipelex workflows. So it's part of our plan to provide those building blocks, and we hope to involve the community as much as possible, to cover all the possible applications.
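For readers wondering what "synthesize hierarchically from bottom to top" means in practice, here's a minimal sketch of that pattern. The `summarize` function stands in for an LLM call, and the fan-in factor and merge strategy are illustrative, not what Pyramid RAG actually did:

    def summarize(texts: list[str]) -> str:
        # Stand-in for an LLM call that condenses several passages into one "note".
        return " / ".join(t[:40] for t in texts)

    def build_pyramid(chunks: list[str], fan_in: int = 5) -> list[list[str]]:
        # Bottom-up synthesis: each level condenses groups of notes from the level below.
        levels = [chunks]
        while len(levels[-1]) > 1:
            below = levels[-1]
            levels.append([
                summarize(below[i : i + fan_in])
                for i in range(0, len(below), fan_in)
            ])
        return levels  # levels[0] = raw chunks, levels[-1] = single top-level note

The zoom-in/out then falls out of the structure: zoom-out reads from the top levels, and zoom-in follows a note back down to the raw chunks it was synthesized from.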
I think a solution to this problem, as a runnable example, would be a nice showcase of what is achievable with Pipelex.
As an example, I have a use case where I'm processing millions of documents in a fan-out/fan-in pipeline that uses multiple LLMs to analyze and condense information. This mostly isn't CPU-bound, but it does consume a lot of RAM.
I'm currently using Ray to split the workload across a cluster. Ray has autoscaling, so it's very good for this kind of thing.
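For context, the shape of that setup is roughly this (a minimal Ray sketch; `analyze` and `condense` stand in for the LLM-backed steps):

    import ray

    ray.init()  # connect to (or start) a cluster; autoscaling is Ray's job

    @ray.remote
    def analyze(doc: str) -> str:
        # Stand-in for the per-document LLM analysis step.
        return doc.upper()

    @ray.remote
    def condense(summaries: list[str]) -> str:
        # Fan-in: merge per-document results into one condensed output.
        return "\n".join(summaries)

    docs = ["doc one", "doc two", "doc three"]  # millions, batched, in the real workload
    futures = [analyze.remote(d) for d in docs]             # fan-out across the cluster
    condensed = ray.get(condense.remote(ray.get(futures)))  # gather, then fan-in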
Our integration with temporal.io has been working as a prototype in our lab for many months, but we had to focus on the core features to nail down our language and make it as easy to use as possible. In the coming weeks we'll be able to focus more on the Temporal-powered version, adding the missing features to make it practical, which will also make Pipelex usable with other workflow platforms.
Positioning:
• Pipelex is an AI workflow orchestrator. You declare multi-step pipelines (.plx) with LLM calls, OCR/PDF extraction, image generation, custom Python/tool steps, conditionals, and parallelism, then run them via CLI or Python, FastAPI/Docker, MCP, or n8n node.
• BAML defines typed, single-call “AI functions” (primarily LLM interactions) and compiles them into SDKs you call from your app; you typically orchestrate multi-step flows and any non-LLM work in your own code.
Developer experience:
• Pipelex feels like writing a readable pipeline spec you can iterate on or auto-generate and then execute end-to-end. Non-LLM ops (OCR, PDF, image gen, API calls) are first-class pipes you compose with LLM steps.
• BAML feels like writing typed prompt functions with templates and getting a clean, generated client API. Non-LLM work (OCR, PDF parsing, external APIs) usually sits in your app code or tools around those generated functions.
Structured output:
• Not exclusive to BAML. Pipelex’s PipeLLM supports strongly typed structured generation (based on the instructor package). Define schemas directly in .plx (simple concept syntax) or in Python with Pydantic v2, including custom validators and arbitrary validation logic.
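For the Pydantic route, it's ordinary Pydantic v2. A minimal sketch with a hypothetical model and validator (names are illustrative, not from Pipelex's docs):

    from pydantic import BaseModel, field_validator

    class Invoice(BaseModel):
        vendor: str
        total_usd: float

        @field_validator("total_usd")
        @classmethod
        def total_must_be_positive(cls, value: float) -> float:
            # Custom validation runs on every structured generation, so a
            # malformed LLM output fails fast instead of flowing downstream.
            if value <= 0:
                raise ValueError("total_usd must be positive")
            return value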
Where each fits:
• If you want to design and run full workflows (extract → analyze → generate, with branching/parallelism) and keep that logic declarative and shareable, Pipelex is a great fit.
• If you mainly need reliable, typed “AI functions” embedded in an existing codebase, Pipelex also works well (define structures in Pydantic and call steps from Python). BAML is likewise a good fit when you prefer generated SDKs and you’re comfortable orchestrating steps and non-LLM work in app code.
Why Pipelex’s higher abstraction matters:
• It captures business logic and domain know-how in a structured, unambiguous language. Instead of burying intent across scattered prompts and glue code, a .plx file centralizes the specification (data concepts, pre/post-conditions, step semantics). That makes workflows easier to review, version, audit, and hand off, by humans and by AI, while remaining extensible to new AI/software operations.
TL;DR:
• Pipelex = declarative workflow engine/standard for multi-step AI pipelines, with first-class non-LLM ops and strongly typed outputs.
• BAML = typed prompt/function language that generates SDKs; you assemble and extend around it in your app.
We're waiting on partnerships before proposing one to our users.
Currently, a failed step means you need to restart everything. We have started adding features that save intermediate state so you can recover from there, but the recovery part isn't finished yet.
Our main strategy is to make Pipelex easy to integrate with durable workflow orchestrators, which are really good at solving that problem. For instance, we'll soon release a plugin integration with temporal.io.
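To illustrate why that division of labor works, here's a minimal sketch using Temporal's Python SDK, assuming a hypothetical `run_pipe` activity that wraps a single Pipelex step (worker and client wiring omitted; this is not the actual plugin API):

    from datetime import timedelta
    from temporalio import activity, workflow

    @activity.defn
    async def run_pipe(pipe: str, inputs: dict) -> dict:
        # Hypothetical: execute one Pipelex pipe here. If it raises, Temporal
        # retries just this activity; steps that already completed don't re-run.
        return {"pipe": pipe, **inputs}

    @workflow.defn
    class DocWorkflow:
        @workflow.run
        async def run(self, doc_url: str) -> dict:
            extracted = await workflow.execute_activity(
                run_pipe,
                args=["extract", {"doc_url": doc_url}],
                start_to_close_timeout=timedelta(minutes=5),
            )
            # Workflow state is durable: if the worker dies here, Temporal
            # resumes from this point instead of restarting the whole pipeline.
            return await workflow.execute_activity(
                run_pipe,
                args=["analyze", extracted],
                start_to_close_timeout=timedelta(minutes=5),
            )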