Lessons from interviews on deploying AI Agents in production (mmc.vc)

107 points by advikipedia 3 months ago | 17 comments

reedf1 3 months ago
What you actually need in most business cases is a 100% auditable, explainable and deterministic workflow. While AI is strictly deterministic - it is technically chaotic. Introducing this in large customer pipelines or in intensive data applications means that even if the AI only does something a bit off 99%, 99.9% or 99.99% you will see large spurious error rates in your workflow. Worst of all these will be difficult to explain - or maybe even purposely hidden, as I have seen some agents attempt to do.
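A rough sketch of the arithmetic behind this point, assuming every step of a pipeline fails independently and with the same probability (a simplifying assumption that the comment does not state). The function names, the step count, and the run volume are hypothetical and chosen only for illustration.

```python
def pipeline_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step_accuracy ** steps


def expected_failures(per_step_accuracy: float, steps: int, runs: int) -> float:
    """Expected number of runs that contain at least one bad step."""
    return runs * (1.0 - pipeline_success_rate(per_step_accuracy, steps))


if __name__ == "__main__":
    # Hypothetical workload: 20 chained steps, one million runs.
    for accuracy in (0.99, 0.999, 0.9999):
        failures = expected_failures(accuracy, steps=20, runs=1_000_000)
        print(f"per-step accuracy {accuracy}: ~{failures:,.0f} bad runs per million")
```

Under these assumptions, 99% per-step accuracy over 20 steps leaves roughly 18% of runs with at least one error, which is the kind of volume that becomes hard to audit by hand.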
- IanCal 3 months ago
  You absolutely don’t need this. We know this to be true as we use humans and they are none of these things (at 100%) and we use other ml systems that don’t hit all of these. Directionally those things are beneficial but you just need the benefits to outweigh the costs.
  - aprilthird2021 3 months ago
    > 100% auditable, explainable and deterministic workflow.
    Not 100% deterministic workers but workflow. The auditability and explainability of your system becomes difficult with AI and LLMs in between because you don't know at what point in the reasoning things turned wrong.
    You need, for a lot of things, to know at every step of the way who is culpable and what part of the work they were doing and why it went wrong and how
  - kakacik 3 months ago
    Depends on the industry, clearly you never worked in such. Regulated (medical, transport, municipality, state, army and so on) or just with some decent enforced regulations like whole finance for example, and bam! you have serious regulatory issues that every single sane business tries desperately to stay away from.
    IanCal 3 months ago
    “There are business problems” and “most business problems” are not the same thing.
    foobarian 3 months ago
    > you have serious regulatory issues
    ... until people decide they are OK with things being less than 100% and relax the regulations. Helped along by the purveyors of the AI tools no doubt
  - gizajob 3 months ago
    The difference is that although humans aren’t 100% accurate, they are responsible for their work.
    dwohnitmok 3 months ago
    This has been going down over time.
    A lot of the software industry has been moving away from assigning humans individual responsibility for failure (e.g. blameless post mortems).
    Yoric 3 months ago
    I suspect that it's only a small corner of the software industry, which is itself only a small corner of industry.
    I further suspect that most actors will still want someone responsible to take the blame when an incident takes place. Even if they have to make one up.
  - bandrami 3 months ago
    Yeah no. I make software used on actual flight simulators and we literally need it to be deterministic, to the extent of needing the same help query to always return the exact same results for all users at all times.
    IanCal 3 months ago
    Some business problems need that. That’s not the same as asserting most do and it’s certainly not the same as asserting all business problems do.
    Some things need to be deterministic. Many don’t.
    Even your business will have many such problems that don’t need 100% all those properties - every task performed by a human for example. You as a developer are not all of these things 100%!
    And your help query may need to be deterministic but does it need to be explainable? Many ml solutions aren’t really explainable, certainly not to 100% whatever that may mean, but can easily be deterministic.
    charcircuit 3 months ago
    If you were on a real flight and asked a human for help, they wouldn't give a deterministic answer. This doesn't seem like an actual requirement that is needed, but rather something that is post hoc rationalized because it was cheaper to make that way. While terms like consistency may come up when referring to having deterministic output as a requirement, the true reason could actually just be cost.
    throwup238 3 months ago
    > If you were on a real flight and asked a human for help, they wouldn't give a deterministic answer.
    If you were on a real flight, asking a qualified human - like a trained pilot - would result in a very deterministic checklist.
    Deterministic responses to emergencies are at least half of the training from the time we get a PPL.
    hi_hi 3 months ago
    Regulated industries (amongst many) need to be deterministic. Imagine your bank being non-deterministic.
    charcircuit 3 months ago
    >Imagine your bank being non-deterministic.
    That's already the case. Payments are not deterministic. It can take multiple days for things to settle. The real world is messy.
    When I make a payment I have no clue if the money is actually going to make it to a merchant or if some fraud system will block it.
    hi_hi 3 months ago
    The bank can very much determine if the payment has been made or not (although not immediately, as you mentioned). As a rule, banks like to keep track of money.
    soco 3 months ago
    Yes it settles deterministically. With AI it claims to be settled and goes on, and it's up to you to figure out how deterministic the whole transaction actually was.
    Yoric 3 months ago
    Is it the main issue? Payments suffer from race conditions, but the processes themselves are deterministic, auditable and may be rolled back. Not sure how many of these important attributes would remain with a neural network at the helm.
    IanCal 3 months ago
    Even then it can be deterministic but not explainable. Tfidf is fairly explainable but about the limit imo for full explanations making sense such that you can fully reason about them and predict outcomes and issues accurately. Embeddings could give better, fully deterministic results but I wouldn’t say they’re 100% explainable.
- thisisit 3 months ago
  Just a couple of hours ago I was discussing this with a Principal Architect. He is responsible for all the finance workflows. We had just come out of a product demo where the vendor showed workflows which were 100% auditable, explainable and deterministic. It required a human in the loop to double check AI's work.
  The feedback from the architect was that the vendor was way too cautious in using AI. Nearly all vendors he has seen so far were too cautious. He lamented that no one was fully unleashing AI. They could achieve that by allowing read/write access to confidential data like ERP/CRMs and access to the internet while being fully non-deterministic. Then AI could achieve a lot more.
  I explained that AI being right 95% of the time is still not good enough for finance workflows but he wouldn't budge. He kept repeating that non-deterministic and remove human in the loop is the way to go. I silently promised myself to stay away from any AI projects he might be part of.
  - bilekas 3 months ago
    >He kept repeating that non-deterministic and remove human in the loop is the way to go.
    For an "Architect" this is extremely troubling..
    > He lamented that no one was fully unleashing AI
    More than likely he will never be the one cleaning up the mess, probably he will be the one contracted to design proper systems though so maybe it's a genius move.
  - yomismoaqui 3 months ago
    Just suggest to him to implement or supervise the creation of a system like that ON HIS RESPONSIBILITY. That is, if the system fails and loses company/client money he has to pay it from his own account.
    Then tell us how he sees that 5% error rate.
  - c048 3 months ago
    I worked in a finance department for over a decade. That architect is a lunatic or a sheer idiot.
  - suncemoje 3 months ago
    I was recently approached by a lawyer who wants to automate legal workflows. “Intriguing” I thought, given the advancements of LLMs / agentic AI + the huge funding rounds I keep seeing in LegalTech. I eventually had to give the project a pass because I didn’t believe I would be able to get AI to consistently produce accurate outputs, EVEN IF the inputs stayed the same. Couldn’t imagine building a deterministic system that scales in the legal domain…
  - Yoric 3 months ago
    So it's ok if 5% of the time, his paycheck is sent to someone else?
- Joel_Mckay 3 months ago
  The people outside the business of selling hype will not be keen on paying to break their business with popular liabilities. =3
  https://www.youtube.com/watch?v=_zfN9wnPvU0
- enraged_camel 3 months ago
  This comment is a bit strange.
  >> While AI is strictly deterministic - it is technically chaotic
  AI is neither deterministic nor chaotic. It is nondeterministic because it works based on probability, which means that for open-ended contexts it can be unpredictable. But properly engineered agentic AI workflows can drastically reduce and even completely eliminate the unpredictability. Having proper guardrails such as well-defined prompts, validations and fallbacks in place can help ensure mistakes made by AIs don't result in errors in your system.
  - throw-qqqqq 3 months ago
    > AI is neither deterministic nor chaotic. It is nondeterministic because it works based on probability
    A deterministic function/algorithm always gives the same output given the same input.
    LLMs are deterministic if you control all parameters, including the “temperature” and random “seed”. Same input (and params) -> same output.
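A toy sketch of the claim at the level of the sampling math only. This is not a real LLM or any vendor's API: `sample_token` and the logits are hypothetical, and the example says nothing about batching or hardware effects in a real inference stack.

```python
import math
import random


def sample_token(logits: dict, temperature: float, seed: int) -> str:
    """Pick one token from toy logits; identical inputs give identical picks."""
    if temperature == 0:
        # Greedy decoding: highest logit wins, ties broken by token name.
        return max(sorted(logits), key=lambda tok: logits[tok])
    rng = random.Random(seed)  # local PRNG, so global random state cannot interfere
    tokens = sorted(logits)    # fixed iteration order
    scaled = [logits[tok] / temperature for tok in tokens]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # numerically stable softmax weights
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = {"yes": 2.0, "no": 1.5, "maybe": 0.1}
    first = sample_token(toy_logits, temperature=0.8, seed=42)
    second = sample_token(toy_logits, temperature=0.8, seed=42)
    print(first == second)  # same seed and parameters, same choice
```

The sampler is a pure function of logits, temperature, and seed. Whether the logits themselves are reproducible from run to run is a separate question.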
    mejutoco 3 months ago
    I thought this too, but it seems that is not the case. I could not remember the reason I saw why so I googled it (AI excerpt).
    Large Language Models (LLMs) are not perfectly deterministic even with temperature set to zero, due to factors like dynamic batching, floating-point variations, and internal model implementation details. While temperature zero makes the model choose the most probable token at each step, which is a greedy, "deterministic" strategy, these other technical factors introduce subtle, non-deterministic variations in the output.
    Calavar 3 months ago
    You were probably thinking about this piece on nondeterminism in attention by Thinking Machines: https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
    andai 3 months ago
    If I understood correctly the reason for this is that some floating point operations are not associative?
    District5524 3 months ago
    Not that it's incorrect but there is some data showing variability even with the very same input and all parameters. Especially if we have no control over the model behind the API with engineering optimizations etc. See Berk Atil et al.: Non-Determinism of "Deterministic" LLM Settings, https://arxiv.org/abs/2408.04667v5
    viccis 3 months ago
    Ignoring that you are making an assumption about how the randomness is handled, this is a very vacuous definition of "deterministic" in the context of the discussion here, which is AI controlling large and complex systems. The fact that each inference can be repeated if and only if you know and control the seed and it is implemented with a simple PRNG is much less important to the conversation than its high level behavior, which is nondeterministic in this application.
    If your system is only deterministic if it processes its huge web of interconnected agentic prompts in exactly the same order, then its behavior is not deterministic in any sense that could ever be important in the context of predictable and repeatable system behavior. If I ask you whether it will handle the same task the same exact way, and its handling of it involves lots of concurrent calls that are never guaranteed to be ordered the same way, then you can't answer "yes".
    mbesto 3 months ago
    > LLMs are deterministic if you control all parameters, including the “temperature” and random “seed”.
    This is not true. Even my LLM told me this isn't true: https://www.perplexity.ai/search/are-llms-deterministic-if-y...
    cnnlives78 3 months ago
    The LLMs most of us are using have some element of randomness to every token selected, which is non-deterministic. You can try to corral that, but statistically, with enough iteration, it may provide nonsense, unintentional, dangerous, opposite solutions/answers/action, even if you have system instructions defining otherwise and a series of LLMs checking themselves. Be sure that you fully understand this. Even if you could make it fully deterministic, it would be deterministic based on the model and state, and you’ll surely be updating those. It amazes me how little people know about what they’re using.
  - CjHuber 3 months ago
    Are they? I mean I wouldn't say they are strictly deterministic, but with a temperature and topk of 0 and topp of 1 you can at least get them to be deterministic if I'm correct. In my experience if you need a higher temp than 0 in a prompt that is supposed to be within a pipeline, you need to optimize your prompt rather than introduce non determinism. Still of course that doesn't mean some inputs won't give unexpected outputs.
    flufluflufluffy 3 months ago
    In the hard, logically rigorous sense of the word, yes they are deterministic. Computers are deterministic machines. Everything that runs on a computer is deterministic. If that wasn’t the case, computers wouldn’t work. Of course I am considering the idealized version of a computer that is immune to environmental disturbances (a stray cosmic ray striking just the right spot and flipping a bit, somebody yanking out a RAM card, etc etc).
    LLMs are computation, they are very complex, but they are deterministic. If you run one on the same device, in the same state, with exactly the same input parameters multiple times, you will always get the same result. This is the case for every possible program. Most of the time, we don’t run them with exactly the same input parameters, or we run them on different devices, or some part of the state of the system has changed between runs, which could all potentially result in a different outcome (which, incidentally, is also the case for every possible program).
    blibble 3 months ago
    > Computers are deterministic machines. Everything that runs on a computer is deterministic. If that wasn’t the case, computers wouldn’t work.
    GPU operations on floating point are generally not deterministic and are subject to the whims of the scheduler
    flufluflufluffy 3 months ago
    If the state of the system is the same, the scheduler will execute the same way. Usually, the state of the system is different between runs. But yeah that’s why I qualified it with the hard, logically rigorous sense of the word.
    blibble 3 months ago
    > Are they? I mean I wouldn't say they are strictly deterministic, but with a temperature and topk of 0 and topp of 1 you can at least get them to be deterministic if I'm correct.
    the mathematics might be
    but not on a GPU, because floating point numbers are an approximation, and their operations are not associative
    if the GPU's internal scheduler reorders the operations you will get a different outcome
    remember GPUs were designed to render quake, where drawing pixels slightly off is imperceptible
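A small CPU-only sketch of the reordering effect, using ordinary Python floats (IEEE 754 doubles). The property at play is that floating-point addition is not associative: each individual `a + b` equals `b + a`, but regrouping a longer sum can change the rounding. The values are picked to make the effect obvious and `sum_in_order` is a hypothetical helper. No GPU or scheduler is involved.

```python
import math


def sum_in_order(values):
    """Add values strictly left to right, rounding after every addition."""
    total = 0.0
    for value in values:
        total += value
    return total


if __name__ == "__main__":
    original = [1e16, 1.0, -1e16, 1.0]
    reordered = [1e16, -1e16, 1.0, 1.0]  # same numbers, different order

    print(sum_in_order(original))   # the first 1.0 is lost to rounding
    print(sum_in_order(reordered))  # the large terms cancel first
    print(math.fsum(original))      # exactly rounded sum, independent of order
```

If a parallel reduction adds the same numbers in a different order from run to run, the low-order bits of the result can differ in just this way.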
  - jampekka 3 months ago
    I wouldn't be surprised if autoregressive LLMs had some chaotic attractors if you stretch the concept to finite discrete state (tokens).
  - Incipient 3 months ago
    Can you share some examples of eliminating non-determinism? I feel like I should be able to integrate agents into various business systems, but this issue is a blocker.
    Eg. An auto email parser that extracts an "action" - I just don't trust that the action will be accurate and precise enough to execute without rereading the email (hence defeating the purpose of the agent)
    _joel 3 months ago
    I'm not sure it eliminates it, but what about reducing the temperature and top-k/top-p?
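One way to read the "validations and fallbacks" idea for the email-parser case is sketched below. It assumes the model is asked to return JSON with an `action` and an `order_id`. That schema, the action names, and `parse_action` are all hypothetical. The check does not make the model deterministic and cannot tell whether the extracted action matches what the email meant. It only ensures that malformed or out-of-policy output is routed to a person instead of being executed.

```python
import json
from dataclasses import dataclass
from typing import Optional

ALLOWED_ACTIONS = {"refund", "reschedule", "escalate"}  # hypothetical action set


@dataclass(frozen=True)
class Action:
    name: str
    order_id: str


def parse_action(raw_model_output: str) -> Optional[Action]:
    """Return a validated Action, or None to signal 'send to human review'."""
    try:
        data = json.loads(raw_model_output)
    except json.JSONDecodeError:
        return None  # not JSON at all
    if not isinstance(data, dict):
        return None  # valid JSON, wrong shape
    name = data.get("action")
    order_id = data.get("order_id")
    if not isinstance(name, str) or name not in ALLOWED_ACTIONS:
        return None  # unknown or malformed action
    if not isinstance(order_id, str) or not order_id.isdigit():
        return None  # order id must be a string of digits
    return Action(name=name, order_id=order_id)


if __name__ == "__main__":
    print(parse_action('{"action": "refund", "order_id": "12345"}'))
    print(parse_action('{"action": "delete_everything", "order_id": "1"}'))
```

Anything that comes back as `None` falls through to manual handling, so the deterministic part of the system only ever sees a small, closed set of inputs.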
- belter 3 months ago
  > Introducing this in large customer pipelines or in intensive data applications means that even if the AI only does something a bit off 99%, 99.9% or 99.99% you will see large spurious error rates in your workflow.
  You just described how you get your google account locked... :-)
- flir 3 months ago
  How can the agent hide the error?
  You log the interaction, you see what happened, no?
  - wongarsu 3 months ago
    In coding agents that would be "the test keeps failing and I can't fix it - let's delete the test" or "I can't fix this bug, let's delete the feature"
    If you measure success by unit test failures or by the presence of the bug those behaviors can obscure that the LLM wasn't able to do the intended fix. Of course a closer inspection will still reveal what happened, but using proxy measurements to track success is dangerous, especially if the LLM knows about them or if the task description implies improving that metric "a unit test is failing, fix that"
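A minimal sketch of guarding the proxy metric described above, assuming you can list test names before and after the agent's change. The function names and test names are hypothetical, and how the lists are collected is left out. A change that makes the failing test disappear is treated as a rejection, not a success.

```python
def removed_tests(tests_before: set, tests_after: set) -> set:
    """Tests that existed before the agent's change but are gone afterwards."""
    return tests_before - tests_after


def change_is_acceptable(tests_before: set, tests_after: set, failing_after: set) -> bool:
    """Accept only if no test was deleted and no test still fails."""
    return not removed_tests(tests_before, tests_after) and not failing_after


if __name__ == "__main__":
    before = {"test_login", "test_refund"}
    after = {"test_login"}  # the agent "fixed" test_refund by deleting it
    print(change_is_acceptable(before, after, failing_after=set()))
    print(sorted(removed_tests(before, after)))
```

This only covers deletion. An agent could still weaken an assertion or skip a test, so closer inspection remains necessary.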
    flir 3 months ago
    Sure, but the discussion here is around "in production"? I'm trying to imagine a scenario and I'm coming up short.
    sebastiennight 3 months ago
    In GP's comment, the coding agent is deployed "in production" since you (the developer) and/or your company are paying for it to use it in your business.
    flir 3 months ago
    "Introducing this in large customer pipelines or in intensive data applications"
    *shrug*
    To be honest, I don't think I'm going to get an answer.
advikipedia 3 months ago
We recently spoke with 30+ startup founders and 40+ enterprise practitioners who are building and deploying agentic AI systems across industries like financial services, healthcare, cybersecurity, and developer tooling.
A few patterns emerged that might be relevant to anyone working on applied AI or automation:
- The main blockers aren’t technical. Most founders pointed to workflow integration, employee trust, and data privacy as the toughest challenges — not model performance.
- Incremental deployment beats ambition. Successful teams focus on narrow, verifiable use cases that deliver measurable ROI and build user trust before scaling autonomy.
- Enterprise adoption is uneven. Many companies have “some agents” in production, but most use them with strong human oversight. The fully autonomous cases remain rare.
- Pricing is unresolved. Hybrid models dominate; pure outcome-based pricing is uncommon due to attribution and monitoring challenges.
- Infrastructure is mostly homegrown. Over half of surveyed startups build their own agentic stacks, citing limited flexibility in existing frameworks.
The article also includes detailed case studies, commentary on autonomy vs. accuracy trade-offs, and what’s next for ambient and proactive agents.
If you’re building in this space, the full report is free here: https://mmc.vc/research/state-of-agentic-ai-founders-edition...
Would be interested to hear how others on HN are thinking about real-world deployment challenges — especially around trust, evaluation, and scaling agentic systems.
- Etheryte 3 months ago
  Perhaps I simply don't understand what you mean, but it sounds like the first point could be rephrased in some way. To me, workflow integration and data privacy sound very much like technical blockers.
  - barrenko 3 months ago
    But if you define them as non-technical related blockers agents are just swell.
  - refactor_master 3 months ago
    Consider this simple example: Storing all your sensitive user data in one centralized location (e.g. a US server) would be great for any kind of analytics and modeling to tap into, and is technically very easy to do, but it also violates virtually every country's data privacy laws. So then you have to set up siloed servers around the world, deal with data governance, legal stuff, etc.
    Sure, it then becomes a technical challenge to work around those limits, but that may be cost/time prohibitive.
    1718627440 3 months ago
    That sounds more like you could solve the problem if it had different requirements.
    refactor_master 3 months ago
    If you ask Silicon Valley, any organizational problem can be a technical problem if you try hard enough.
  - advikipedia 3 months ago
    More than the "actual" problem, the "perception" of the problem is worse. Workflow integration is more to do with users having to rethink their workflows, their roles, and how they work with AI. As for data privacy concerns, even where startups have taken measures to overcome the problems, very often enterprises still remain concerned (making this more of a perception problem than an actual problem). That's why I focused on the non-technical aspect of it!
    DrScientist 3 months ago
    When I see vendors complain about workflow and integration issues, it's because the vendor's software is written around an expectation of a certain workflow and integration points and they find out in reality every customer does it slightly differently.
    Some key challenges around workflow are that while the fundamental white-board task flow is the same, different companies may distribute those tasks between people and over time in different ways.
    Workflow is about flowing the task and associated information between people - not just doing the tasks.
    Same goes for integration - the timing of when certain necessary information might be available is, again, not uniform, and timing concerns are often missed on the high-level whiteboard.
    Here's a classic example of ignoring timing issues.
    https://www.harrowell.org.uk/blog/2017/03/19/universal-credi...
  - IanCal 3 months ago
    There are two sides to workflow integration.
    One is technical (it’s a hassle to connect things to a specific system because you’d need to deal with the api or there is no api)
    The other isn’t, because it’s figuring out how and where to use these new tools in an existing workflow. Maybe you could design something from scratch but you have lots of business processes right now, how do you smoothly modify that? Where does it make sense?
    Frankly understanding what the systems can and can’t do takes at least some time even if only because the field is moving so fast (I worked with a small local firm who I was able to help by showing them the dramatic improvements in transcription quality vs cost recently - people here are more used to whisper and the like but it’s not as common knowledge how and where you can use these things).
- woeirua 3 months ago
  Lack of employee trust in these systems is caused by model (under)performance. There's a HUGE disconnect between the C-suite right now and the people on the ground using these models. Anyone who builds something with the models would tell you that they can't be trusted.
- baxtr 3 months ago
  > The main blockers aren’t technical. Most founders pointed to workflow integration, employee trust, and data privacy as the toughest challenges — not model performance.
  What does that even mean? Are you trying to say that the problem isn’t that the AI models are bad — it’s that it’s hard to get people to use them naturally in their daily work?
  - Arnechos 3 months ago
    For example where I work business users required model output to be 100% correct, which wasn't possible, so they decided to stick to old manual workflow.
    sigwinch 3 months ago
    That’s our definition of a process: when your objective is well-defined, a process is guaranteed to succeed. Not everything is a process. And sometimes people mistake what the desired success must be. For example, a piece of surgical equipment might not have features guaranteeing profitability.
- thatjoeoverthr 3 months ago
  Honestly, just sad seeing AI posts on HN now.
  - ChrisMarshallNY 3 months ago
    I’m not sure this is the case, here (although it’s always a possibility, sadly).
    It just looks like the highly-polished marketing copy I’ve read, all my career. It’s entirely possible that it was edited by AI (a task that I have found useful), but I think that it’s actually a fairly important (to the firm) paper, and was likely originally written by their staff (or a consultant), and carefully edited.
    I do feel as if it’s a promotional effort, but HN often features promotional material, if it is of interest to our community.
throwaway_2494 3 months ago
Prediction: Non-determinism will become acceptable in areas we used to expect accuracy.
For example we will accept 'probabilistic bookkeeping' because it's cheaper than requiring ledgers to balance to the penny.
But this leeway won't be equally applied. Powerful institutions like banks will use “probabilistic models” to decide they probably don’t owe you that refund, but if they decide you owe them money, they will still hold you to every cent.
Nondeterminism for the powerful, determinism for everyone else. Yay!
- creaghpatr 3 months ago
  Agree with your prediction in many cases but not bookkeeping. The whole point of bookkeeping is to balance the ledger to the penny so people would toss bookkeeping altogether before accepting a 'probabilistic' output. Agents will be used to accelerate the recon process though, or maybe they will become advanced enough to provide a (correct) deterministic result.
  - skeeter2020 3 months ago
    So the worst outcome? We will still demand deterministic bookkeeping, but everyone will attempt to "optimize" using non-deterministic tools and assistance? Kind of feels like the US/Canada tax codes of today...
    suncemoje 3 months ago
    That’s where “human-in-the-loop” becomes a necessity, which _adds_ a step as opposed to removing one
- nicbou 3 months ago
  A bunch of people are going to become "acceptable risk" and "cost of doing business". Companies will opt to get it 95% right for cheap over getting it 100% right, and for 95% of the population, it will be good enough.
- skystarman 3 months ago
  Companies can decide today that they don't need near-perfect accuracy in bookkeeping to save a few bucks and no one does that. One of the major factors is regulatory requirements. Even without them, I'm sure investors would apply a hefty discount to any public company that decided to save a few pennies on accountants while sacrificing accuracy.
- rolandog 3 months ago
  Don't forget that this will likely be paired with rubber-stamping one-sided arguments masquerading as quality control processes where some maliciously-biased oversampling of (probably paid-for) good reviews takes place in order to consistently reach the conclusion that there's nothing wrong going on.
- nitwit005 3 months ago
  The bookkeeping can often be fully automated, if you care to do so, so there probably isn't a point.
  People have used machine learning for fraud detection for a long time at this point. They do tolerate the false positives.
- yomismoaqui 3 months ago
  From what I remember reading in some tutorial about Random Tree classifiers, banks in the USA have to justify the specific reasons why a credit was denied, which is why blackbox models cannot be used for this.
  - Yoric 3 months ago
    I'm not sure that's a sufficient argument.
    1. Laws can change.
    2. Blackbox models can provide specific reasons, even if they need to hallucinate them.
donatj 3 months ago
I've seen companies including my own pouring lots of money into AI. Outside of "replacing developers", I am genuinely curious what have people done that's actually useful?
We've got a sort of "business intelligence" AI they poured a lot of time and money into, and I don't think anyone really uses it because it makes stuff up.
I'm sure there are things. I just haven't seen them. I would love to hear concrete examples.
The cynic in me says I wouldn't want something with the error aptitude and truth telling of a small child taking any sort of important action on my behalf.
- Balgair 3 months ago
  Two use cases with my corpo are:
  1) Enhanced User guides/manuals.
  We sell some very complicated and expensive instruments. As such, making them work is quite hard. One of our biggest expenses is our engineers that go out, physically, to help customers. Company policy is that the first visit is always free. These customers can be very remote (Deep sea oil platforms, Australian outback, quite nice ski country ;) , etc). Often their issue is simple but they can also be very complex. We have phone trees, email, texts, iridium phones, etc. to talk customers through things to avoid these first visits and then help them afterwards. So adding in AI chatbots is a natural way to help out. People don't feel quite the same 'shame' in asking really dumb questions to a chatbot that they do to a real person. So, to make these chatbots smarter, we use some of this AI mumbo-jumbo (RAG), to help them out. So far, it seems successful and the customer and engineers like the enhanced/AI manuals.
  2) Making said manuals
  We support 35 languages and many regulatory environments. Our instruments are all compliant with whatever version of a government agency you've got (modulo a lot of time, money, ITAR regulations). As such, making all that paper (manuals, compliance docs, contracts, etc) takes a lot of time and effort and has to pass the legal tests too. So AI is really helpful with it. Most of the work for these large stacks of paper is essentially boilerplate, but all subtly different so that literal copy-pasting doesn't get you quite that far. AI systems have been able to, last I checked, get that team about 5x faster, as it cuts out ~85% of the process and drudgery. Since these documents get hauled into courts, they can't just be blindly AI made, and a human always has to go over everything with a sharp eye still, but AI helps out there a bit too. Last lunch I had with them, they were saying that they were actually working on their burn-down charts now and not just going from panic to panic. As in, they could actually do their jobs.
- sigwinch 3 months ago
  We had a lack of (digital images of) training cases in emergency surgery. You’d prefer to give ML experience with many rare cases, but must resort to style transfer. Humans can do this, but it is problematic in various ways and you’re taking them away from necessary work.
datadrivenangel 3 months ago
"AI agents can reason toward a goal, make dynamic decisions on the fly, and learn or improve over time"
I've yet to see an 'agentic' setup that actually learns or improves over time. There are many techniques for this, but I don't see them used.
- htrp 3 months ago
  > There are many techniques for this, but I don't see them used
  Why do you think that is?
roxolotl 3 months ago
> The main blockers aren’t technical. Most founders pointed to workflow integration, employee trust, and data privacy as the toughest challenges — not model performance.
These, outside of employee resistance, are technical problems. The insistence they aren’t seems to be the root of the misunderstanding around these tools. The reality is that “computers that speak English” are, at face value, incredibly impressive. But there’s nothing inherent to said systems that makes them easier to integrate with than computers which speak C. In fact I’d argue it’s harder because natural languages are significantly less precise.
Communication and integration is incredibly challenging because you’re trying to transfer states between systems. When you let “the machine carry a larger share of the burden,” as Dijkstra described of the presumed benefit of natural language programming but actual downside[0], you’re also forfeiting a large amount of control. It is for the same reason that word problems are considered more challenging than equations in math class. With natural languages the states being communicated are much less clear than with formal languages, and much of the burden assumed to be transferred to the machine is returned in the form of an increase in required specificity and precision, which formal languages already solve for.
None of this is to say these tools aren’t useful nor that they cannot be deployed successfully. It is instead to say that the seduction of computers which speak English is exactly that: a seduction. These tools are incredibly easy to use to impress, and much more challenging to use to extract value.
0: https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...
- IanCal3 months ago
  I think I disagree here.
  The integration parts aren’t natural language issues but connecting systems and how to put these things in your workflow.
  For example, I have a bunch of off-the-cuff questions, problems and tasks. I want to have these in one place and have that trigger a conversation with ChatGPT, which shows the results in the first place but can be continued easily.
  Before it was added the other week, I could track issues in Linear and I could have Codex write the code to solve them, but only by manually moving text from one place to another, manually moving the tickets, checking on progress, clicking buttons to open PRs - all of that is integration hassle but none is about the model itself. I think now with GitHub Copilot I could just assign a task.
  - roxolotl3 months ago
    The point is that because natural language systems seem so easy to use, people assume the existing hard parts will become easy. But integrating systems is hard not because formal languages are challenging to use but because connecting systems is inherently challenging. The switch from formal to natural language doesn’t reduce the challenge, it just alters it.
    We’re saying the same thing. Integration is the hard (still technical) part.
pmarreck3 months ago
"Working with AI is like being a mentor for a monkey pissing into its own mouth. Using agentic pipelines is doing the same, but now it's 5 monkeys pissing into each others' mouths in a Roman fountain kind of way."
Except it's worse than that because we'll all end up having to do it anyway, because the overall velocity of emitted working code will be faster, and productivity > all.
jakozaur3 months ago
I talked to some enterprises and saw similar patterns:
1. Agentic AI systems are hard to measure and evaluate methodologically.
2. Quote from Salesforce analyst day: "it's been so easy to build a killer demo, but why has it been so hard to get agents that actually deliver the goods."
3. Unfortunately, small errors tend to compound over time, which means most systems need a human in the loop as of 2025.
4. A lot of enterprise buyers feel the huge potential (and FOMO), yet ROI is still unclear as of 2025. MIT report "State of AI in business 2025": Despite $30–40 billion in enterprise investment into GenAI, 95% of organizations are not seeing profit and loss impact.
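To put rough numbers on point 3, here is a minimal sketch of how per-step errors compound across a multi-step agent run. It assumes every step succeeds independently with the same probability and that nothing downstream catches or corrects a mistake; both are simplifying assumptions of mine, and the function names are hypothetical, not taken from any report or framework.

```python
def chain_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes steps are independent and equally reliable (an illustrative
    simplification, not a claim about any real agent system).
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


def max_steps_for_target(per_step_accuracy: float, target: float) -> int:
    """Largest number of steps for which the chain still meets the target rate."""
    if not 0.0 <= per_step_accuracy < 1.0:
        raise ValueError("per_step_accuracy must be in [0, 1)")
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    steps = 0
    rate = 1.0
    # Keep adding steps while one more step would still meet the target.
    while rate * per_step_accuracy >= target:
        rate *= per_step_accuracy
        steps += 1
    return steps


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(f"99% per step over {n} steps: {chain_success_rate(0.99, n):.1%}")
    print("Steps before dropping below 90%:", max_steps_for_target(0.99, 0.9))
```

Under these assumptions, a step that is right 99% of the time gives an end-to-end success rate of roughly 37% over 100 steps, which is one way to see why a human checkpoint every few steps keeps coming up.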
- thatjoeoverthr3 months ago
  The Salesforce analyst quote is ChatGPT, though.
  - jakozaur3 months ago
    Source: https://seekingalpha.com/article/4830274-salesforce-inc-crm-...
    thatjoeoverthr3 months ago
    [flagged]
gizajob3 months ago
I know some people in business who won’t even delegate to very competent humans who have worked closely alongside them for years. And now we have to believe that letting AI agents autonomously roam without oversight is going to be acceptable to people like that.
arisAlexis3 months ago
All of these are temporal. It's the only thing we can be sure of. Very limited value in these "lessons" 6 months ahead.
- layer83 months ago
  Do you mean “temporary”?
chmod7753 months ago
It took me a while to realize that the cringe AI hype bro is just wearing a tie this time. Unsubstantiated fluff with anonymous sources that wants to disguise itself as legitimate research.
Maybe at least these charts are based on real data - albeit self-reported by AI startups likely talking to their investors.
Either way it's useless unless hopping on this train is a pastime of yours or you make a living taking investors - too poor to fund an OpenAI, but just rich enough to fund someone eating OpenAI's scraps - for their money.
It has been how many years of people trying to create businesses around ChatGPT prompts? I think we need to bring bullying back. This is getting ridiculous.
sanskarix3 months ago
[dead]
- advikipedia3 months ago
  That's precisely what we found in our research as well! We outlined it in our observations too (excerpt below):
  The most successful deployment strategies we’ve seen started with:
  simple and specific use cases with clear value drivers, that were low risk yet medium impact;
  weren’t majorly disruptive to existing workflows;
  preferably automating a task that the human user dislikes (or was outsourced);
  the output of the workflow can be easily/quickly verified by the human for accuracy or suitability; and
  demonstrated clear ROI quickly.
  Given the current levels of technological development, AI Agents work best when narrowly applied to very specific tasks and operating under a specific context. For instance, we’ve seen this in healthcare with revenue cycle management processes (claim and denial management) that health systems were already outsourcing to third-party providers.
  The land-and-expand strategy for AI agents is very different to traditional SaaS. Given enterprises are increasingly under pressure from the C-Suite to incorporate AI into their work, there are plenty of opportunities for startups to “land” but it’s much harder to “expand” – and not only that, it’s taking much longer to expand even when they want to expand, because it’s a use case by use case rollout.
  Much like the iconic Volkswagen ad, sometimes it’s better to “Think Small” and build trust first, rather than attempt too many use cases (and excessively complex use cases) right off the bat.
  - DrScientist3 months ago
    > Much like the iconic Volkswagen ad, sometimes it’s better to “Think Small” and build trust first
    Poor choice of example for building trust. Volkswagen: lie big on emissions/fuel-efficiency performance.