Cost to Solve < Remaining LTV * Profit Margin
In other words, do the details matter? If the customer leaves because you don't take a fraudulent $10 return, but he's worth $1,000 in the long term, that's dumb. You might think that such a user doesn't exist. Then you'd be getting the details wrong again! Example: Should ISPs disconnect users for piracy? Should Apple close your iCloud sub for pirating Apple TV? Should Amazon lose accounts for rejecting returns? Etc etc.
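To make the inequality concrete, a quick worked example (all numbers hypothetical):

    # Hypothetical numbers to make the threshold concrete.
    remaining_ltv = 1000.00  # expected future revenue from this customer
    profit_margin = 0.20     # 20% margin on that revenue
    cost_to_solve = 10.00    # eating the fraudulent $10 return

    threshold = remaining_ltv * profit_margin  # $200 of future profit at stake
    if cost_to_solve < threshold:
        print(f"Solve it: ${cost_to_solve:.2f} < ${threshold:.2f}")

Even at a thin 20% margin, the $10 make-good is a twentieth of the profit at risk.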
A business that makes CS more detail-oriented is 200% the wrong solution.
There is a whole class of problems that do not require low latency. But without consistency, they're pretty useless.
Frameworks don't solve that. You'll probably need some sort of ground-truth injection at every sub-agent level. I.e., you just need data.
Totally agree with you. Unreliability is the thing that needs solving first.
Sounds like management to me.
How does o1 solve this?
I've been using the general agent to build specialised sub-agents. Here's an example search agent beating perplexity: https://x.com/xundecidability/status/1835059091506450493
I'm failing to see the point of the example, unless the agents can do things on multiple threads. For example, let's say we have a Boss Agent.
I can ask Boss agent to organize a trip for five people to the Netherlands.
Boss agent can ask some basic questions about where my friends are traveling from, and what our budget is.
Then travel agent can go and look up how we each can get there, hotel agent can search for hotel prices, weather agent can make sure it's nice out, sightseeing agent can suggest things for us to do. And I guess correspondence agent can send out emails to my actual friends.
If this is multi-threaded, you could get a ton of work done much faster. But if it's all running on a single thread anyway, then couldn't boss agent just switch functionality after completing each job?
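For what it's worth, the fan-out you're describing is straightforward if each sub-agent is an async call; here's a minimal sketch (agent names and payloads are made up):

    import asyncio

    # Hypothetical sub-agents: each takes the trip brief and returns its findings.
    async def travel_agent(brief):      return {"routes": f"routes for {brief['origins']}"}
    async def hotel_agent(brief):       return {"hotels": f"hotels under {brief['budget']}"}
    async def weather_agent(brief):     return {"weather": "forecast for the Netherlands"}
    async def sightseeing_agent(brief): return {"sights": "suggested itinerary"}

    async def boss_agent(brief):
        # Fan out: all sub-agents run concurrently instead of one after another.
        results = await asyncio.gather(
            travel_agent(brief), hotel_agent(brief),
            weather_agent(brief), sightseeing_agent(brief),
        )
        # Fan in: merge sub-agent outputs into one plan.
        plan = {}
        for r in results:
            plan.update(r)
        return plan

    print(asyncio.run(boss_agent({"origins": ["AMS", "BER"], "budget": 2000})))

Even on a single thread, asyncio.gather helps: the sub-agents' LLM/API calls are I/O-bound, so they overlap while each waits on the network, whereas the boss-switches-hats version pays each wait in full.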
The prompt was: <prompt> Research claude pricing with caching and then review a conversation history to calculate the cost. First, search online for pricing for anthropic api with and without caching enabled for all of the models: claude-3-haiku, claude-3-opus and claude-3.5-sonnet (sonnet 3.5). Create a json file with ALL the pricing data.
from the llm history db, fetch the response.response_json.usage for each result under conversation_id=01j7jzcbxzrspg7qz9h8xbq1ww

    llm_db=$(llm logs path)
    schema=$(sqlite3 $llm_db '.schema')

example usage:

    { "input_tokens": 1086, "output_tokens": 1154, "cache_creation_input_tokens": 2364, "cache_read_input_tokens": 0 }
Calculate the actual costs of each prompt by using the usage object for each response, based on the actual token usage, cached or not. Also calculate/simulate what it would have cost if the tokens were not cached. Create interactive graphs of different kinds to show the real cost of the conversation, the cache usage, and a comparison to what it would have cost without caching.
Write to intermediary files along the way.
Ask me if anything is unclear. </prompt>
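For anyone following along, the cost arithmetic the prompt asks for boils down to something like this (prices are placeholders per million tokens; substitute the real values from the pricing JSON):

    # Placeholder per-million-token prices; swap in values from the pricing JSON.
    PRICE = {"input": 3.00, "output": 15.00, "cache_write": 3.75, "cache_read": 0.30}

    def cost_usd(usage, price=PRICE):
        """Actual cost of one response, given its usage object."""
        return (usage["input_tokens"] * price["input"]
                + usage["output_tokens"] * price["output"]
                + usage["cache_creation_input_tokens"] * price["cache_write"]
                + usage["cache_read_input_tokens"] * price["cache_read"]) / 1_000_000

    def cost_uncached_usd(usage, price=PRICE):
        """What the same response would cost if every input token were uncached."""
        total_in = (usage["input_tokens"] + usage["cache_creation_input_tokens"]
                    + usage["cache_read_input_tokens"])
        return (total_in * price["input"] + usage["output_tokens"] * price["output"]) / 1_000_000

    usage = {"input_tokens": 1086, "output_tokens": 1154,
             "cache_creation_input_tokens": 2364, "cache_read_input_tokens": 0}
    print(cost_usd(usage), cost_uncached_usd(usage))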
I just gave it your task and I'll share the results tomorrow (I'm off to bed).
Given that there is a (fairly standard) API to interact with LLMs, the next question is: what abstractions and primitives help easily build applications on top of it, while giving enough flexibility for complex use cases?
The features in Langroid have evolved in response to the requirements of various use-cases that arose while building applications for clients, or companies that have requested them.
o1 (and likely sonnet 3.5) made chain of thought and other complex prompt engineering irrelevant.
Realtime API (and others that will soon follow) will make the best VTT > LLM > TTV pipelines irrelevant.
VLMs will likely make LLMs irrelevant. Who knows what Google has planned for Gemini 2.
The point is, building these complex agents has been proven a waste of time over and over again, at least until we see a plateau in models. It's much easier to swap in a single API call and modify one or two prompts than to rework a convoluted agentic approach. Especially when it's very clear that the same prompts can't be reused reliably between different models.
I suppose my comment is reserved more for the documentation than the actual models in the wild?
I do worry that LLM service providers won't do any better than REST API providers at versioning their backends. Even if we specify the model in the call to the API, it feels like it will silently be upgraded behind the scenes. There are so many parameters that could be adjusted to "improve" the experience for users even if the weights don't change.
I prefer to use open-weight models when possible. But so many agentic frameworks, like this one (to be fair, I would not expect OpenAI to offer a framework that works local-first), treat the local LLM experience as second-class, at best.
Inference speed is being rapidly optimized, especially for edge devices.
> too expensive,
The half-life of OpenAI's API pricing is a couple of months. While the bleeding-edge model is always costly, API prices rapidly fall to levels the public can afford.
> and too unreliable
Out of the 3 points raised, this is probably the most up in the air. Personally I chalk this up to side effects of OpenAI's rapid growth over the last few years. I think this gets solved, especially once price and latency have been figured out.
IMO, the biggest unknown here isn't a technical one, but rather a business one: I don't think it's certain that products built on multi-agent architectures will be addressing a need for end users. Most of the talk I see in this space is by people excited about building with LLMs, not by people who are asking to pay for these products.
I don't think the tech is ready yet for other reasons, but the absence of anyone publishing is not good evidence against it.
https://en.wikipedia.org/wiki/Swarm_(simulation)
https://www.santafe.edu/research/results/working-papers/the-...
Fun fact: Swarm was one of the very few non-NeXT/Apple uses of Objective C. We used the GNU Objective C runtime. Dynamic typing was a huge help for multiagent programming compared to C++'s static typing and lack of runtime introspection. (Again, nearly 30 years ago. Things are different now.)
I enjoyed using it around 2002, got introduced via Rick Riolo at the University of Michigan Center for the Study of Complex Systems. It was a bit of a gateway drug for me from software into modeling, particularly since I was already doing OS X/Cocoa stuff in Objective-C.
A lot of scientific modelers start with differential equations, but coming from object-oriented software ABMs made a lot more sense to me, and learning both approaches in parallel was really helpful in thinking about scale, dimensionality, representation, etc. in the modeling process, as ODEs and complex ABMs—often pathologically complex—represent end points of a continuum.
Tangentially, in one of Rick's classes we read about perceptrons, and at one point the conversation turned to, hey, would it be possible to just dump all the text of the Internet into a neural net? And here we are.
C++ has added a ton of great features since (especially C++11 onward), but run-time reflection is still sorely missed.
https://youtube.com/playlist?list=PL6zSfYNSRHalAsgIjHHsttpYf...
The idea was to think about it from different directions including academia, industry, and education.
Nobody presented multi-agent simulations, but I agree with you that it's a very interesting way of thinking about things. There was a talk on high-dimensional systems modelled with networks, but the speaker didn't want their talk published online.
Anyways, I'm happy to chat more about these topics. I'm obsessed with understanding complexity using AI, modelling, and other methods.
As-is, it's hard to skim the playlist, and likely terrible for organic search on Google or YouTube <3
> Nobody presented multi-agent simulations, but I agree with you that it's a very interesting way of thinking about things.
To answer your question: I did build a simulation of how a multi-model agent swarm - agents with different capabilities and run times - would impact end-user wait time, based on arbitrary message-passing graphs.
After playing with it for an afternoon I realized I was basically doing a very wasteful Markov chain enumeration algorithm and wrote one up accordingly.
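For absorbing chains that shortcut is nearly a one-liner: the expected visits to each transient state come from the fundamental matrix N = (I - Q)^-1, where Q is the transient-to-transient block of the transition matrix, so expected wait is N times the per-state run time. A toy version with made-up numbers:

    import numpy as np

    # Hypothetical pipeline: states 0..2 are agents still working, state 3 is "done".
    # P[i][j] = probability a message in agent i is routed to agent j next.
    P = np.array([
        [0.0, 0.7, 0.2, 0.1],
        [0.0, 0.0, 0.8, 0.2],
        [0.1, 0.0, 0.0, 0.9],
        [0.0, 0.0, 0.0, 1.0],   # absorbing: response delivered to the user
    ])
    runtime = np.array([2.0, 5.0, 1.0])  # seconds of compute per visit to each agent

    Q = P[:3, :3]                        # transient-to-transient block
    N = np.linalg.inv(np.eye(3) - Q)     # fundamental matrix: expected visits per state
    expected_wait = N @ runtime          # expected total seconds, by entry state
    print(expected_wait[0])              # user-facing wait when requests enter at agent 0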
> Swarm is currently an experimental sample framework intended to explore ergonomic interfaces for multi-agent systems. It is not intended to be used in production, and therefore has no official support. (This also means we will not be reviewing PRs or issues!)
It’s literally not meant to replace anything.
IMO the reason there's no langchain replacement is that everything langchain does is so darn easy to do yourself that there's hardly a point in taking on another dependency.
Though griptape.ai also exists.
> Such a shame that there's nothing to replace Langchain with other than writing it all from the ground up yourself.
Check out Microsoft Semantic Kernel: https://github.com/microsoft/semantic-kernel
Supports .NET, Java, and Python. Lots of sample code [0] and support for agents [1], including a detailed guide [2].
We use it at our startup (the .NET version). It was initially quite unstable in the early days because of frequent breaking changes, but it has stabilized (for the most part). Note: the official docs may still be trailing, but the code samples in the repo and unit tests are up to date.
Highly recommended.
[0] https://github.com/microsoft/semantic-kernel/tree/main/pytho...
[1] https://github.com/microsoft/semantic-kernel/tree/main/pytho...
[2] https://github.com/microsoft/semantic-kernel/tree/main/pytho...
Their recent realtime demo had so many race conditions; function calling didn't even work, and the patch suggested by the community hasn't been merged for a week.
https://github.com/openai/openai-realtime-api-beta/issues/14
Not speaking for OpenAI here, only myself — but this is not an official SDK — only a reference implementation. The included relay is only intended as an example. The issues here will certainly be tackled for the production release of the API :).
I’d love to build something more full-featured here and may approach it as a side project. Feel free to ping me directly if you have ideas. @keithwhor on GitHub / X dot com.
https://github.com/langroid/langroid
Among many other things, we have a mature tools implementation, especially tools for orchestration (for addressing messages, controlling task flow, etc.), and recently added XML-based tools that are especially useful when you want an LLM to return code via tools -- this is much more reliable than returning code in JSON-based tools.
It's MIT licensed.
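The reliability gap is easy to demonstrate: code inside JSON has to be escaped as a string, while XML carries it nearly verbatim. A toy comparison (not Langroid's actual wire format):

    import json
    import xml.etree.ElementTree as ET

    code = 'print("hello")\nif x > 1 and y < 2:\n    run()'

    # JSON tool call: every quote, newline, and backslash must be escaped,
    # which is exactly where LLMs tend to slip.
    print(json.dumps({"tool": "write_code", "code": code}))

    # XML tool call: the code survives with minimal escaping (only & < >),
    # so the model has far fewer chances to emit an unparseable payload.
    root = ET.Element("write_code")
    ET.SubElement(root, "code").text = code
    print(ET.tostring(root, encoding="unicode"))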
“Conretely, let's define a routine to be a list of instructions in natural langauge (which we'll repreesnt with a system prompt), along with the tools necessary to complete them.”
I count 3 in one mini paragraph. Is GPT writing this and being asked to add errors, or is GPT not worth using for their own content?
If only we had a technology to access language expertise on demand...
> Yes, basically. Delete any kyegomez link on sight. He namesquats recent papers for the clout, though the code never actually runs, much less replicates the paper results. We've had problems in /r/mlscaling with people unwittingly linking his garbage - we haven't bothered to set up an Automod rule, though.
[0] https://github.com/princeton-nlp/tree-of-thought-llm/issues/...
What really bothers me is that this kyegomez person wasted the time and energy of so many people, and for what?
https://github.com/kyegomez/AlphaFold3
Most issues are from people unable to run his code. These issues are closed. The repo has 700 stars.
Also, this part from the reply before it was edited away:
They get mad that my repo and code is better than their's and they published they paper, they feel entitled even though I reproduced the entire paper based on 4 phrases, dfs, bfs (search algos), generate solutions, and generate thoughts and this is it. I didn't even read the full paper when I first started to implement it. The reason they want people to unstar my repo is because they are jealous that they made a mistake by not sharing the code when they published the paper as real scientists would do. If you do not publish your code as a AI research scientists you are a heretic, as your work cannot be tried and tested. and the code works amazingly much better than theirs, I looked at their code and couldn't figure out how to run it for hours, as well as other people have reported the same. the motivations are jealously, self hatred, guilt, envy, inferiority complex, ego, and much more psychographic principles.
But it's best to leave them out for sanity :)
Thanks for adding the unedited comment, as it shines light on the newly fabricated comment.
That's why some subreddits flagged these name-squatters.
Also, bots.
The most likely outcome is that if they actually try to pursue this, they lose their "trademark" and the costs drive them out of business.
[1] I didn't misremember https://www.swarm.org/wiki/Swarm:Software_main_page
> "Swarms: The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework"
Nope, it doesn't mean that at all. You decided, additionally and independently of the other statements, that you do not allow collaboration at all.
Which is fine, but the sentence is still illogical.
The real challenge for at-scale inference is that the compute time for models is too long to keep normal API connections open, so you need a message-passing system in place. This system also needs to be able to deliver large files for multi-modal models if it's not going to be obsolete in a year or two.
I built a proof of concept using email, of all things, but could never get anyone to fund the real deal, which could run at larger-than-web scale.
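The shape of that system is the classic submit-then-poll job queue instead of a held-open connection; a rough sketch with hypothetical endpoints:

    import time
    import requests  # third-party; pip install requests

    # Hypothetical endpoints for an inference service whose jobs outlive
    # any sane HTTP timeout.
    BASE = "https://inference.example.com"

    job = requests.post(f"{BASE}/jobs", json={"model": "big-model", "prompt": "..."}).json()

    # The connection closes immediately; we poll (or subscribe) for the result.
    while True:
        status = requests.get(f"{BASE}/jobs/{job['id']}").json()
        if status["state"] == "done":
            break
        time.sleep(5)

    # Large multi-modal outputs come back as a separate download, not inline.
    blob = requests.get(status["result_url"]).content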
An example use with AWS Bedrock: https://temporal.io/blog/amazon-bedrock-with-temporal-rock-s...
People really don't understand how much better LLM swarms get with more agents. I never hit a point of diminishing returns on text quality over two days of running a swarm of llama2 70Bs on an 8x4090 cluster during the stress test.
You would need something similar to, but better than, whatsapp to handle the firehose of data that needs to cascade between agents when you start running this at scale.
Could you elaborate, please?
One use for swarms is to use multiple agents/prompts in place of a single agent with one long prompt, increasing performance by splitting one big task into many. It is very time-consuming though, as it requires experimenting to determine how best to divide one task into subtasks, including writing code to parse and sanitize each task output and plug it back into the rest of the agent graph.
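A minimal sketch of that split, with a generic llm(prompt) stub standing in for whatever client you use (the sanitize step is where most of the experimentation lands):

    def llm(prompt: str) -> str:
        """Stand-in for a real completion call (OpenAI, Anthropic, a local model, ...)."""
        return f"<model output for: {prompt[:40]}...>"

    def sanitize(text: str) -> str:
        # Strip chatter the model adds around the actual answer; in practice
        # this is task-specific and takes real experimentation to get right.
        return text.strip().strip("`")

    def pipeline(document: str) -> str:
        # One long prompt becomes three small ones, each output feeding the next.
        facts = sanitize(llm(f"List the key facts in:\n{document}"))
        draft = sanitize(llm(f"Write a summary using only these facts:\n{facts}"))
        final = sanitize(llm(f"Tighten this summary to three sentences:\n{draft}"))
        return final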
DSPy [1] seems to target this problem space, but last time I checked it only focused on single-prompt optimization (e.g., by selecting which few-shot examples lead to the best prompt performance). Even though I have seen papers on the subject, I have yet to find a framework that tackles agent-graph optimization, although research on this topic has been done [2][3][4].
[1] DSPy: The framework for programming—not prompting—foundation models: https://github.com/stanfordnlp/dspy
[2] TextGrad: Automatic 'Differentiation' via Text -- using large language models to backpropagate textual gradients: https://github.com/zou-group/textgrad
[3] What's the Magic Word? A Control Theory of LLM Prompting: https://arxiv.org/abs/2310.04444
[4] Language Agents as Optimizable Graphs: https://arxiv.org/abs/2402.16823
No.
I've tried explaining this to supposedly smart people in both a 15-minute pitch deck and a research paper, and unless they were inclined to believe it from the start, no amount of proof has managed to convince them.
I figure it's just not possible to convince people, even with the proof in front of them, of how powerful the system is. The same way that we still have people arguing _right now_ that all LLMs are just auto complete on steroids.
You after GPT-2 was released.
Funny, because when I learned how LLMs worked, my immediate thought was "Oh, humans are just LLMs on steroids". So autocomplete on steroids, squared.
But I find this approach works well overall.
Moreover, it is easily debuggable and testable in isolation, which is one of its biggest selling points.
(If anyone is building AI products, feel free to hit me up.)
But yeah, I'd assume they have no ownership themselves unless they signed something explicit?
https://www.reddit.com/r/MachineLearning/comments/15sq2v1/d_...