1. We advocate automation because people like Brenda are error-prone and machines are perfect.
2. We disavow AI because people like Brenda are perfect and the machine is error-prone.
These aren't contradictions because we only advocate for automation in limited contexts: when the task is understandable, the execution is reliable, the process is observable, and the endeavour tedious. The complexity of the task isn't a factor - it's complex to generate correct machine code, but we trust compilers to do it all the time.
In a nutshell, we seem to be fine with automation if we can have a mental model of what it does and how it does it in a way that saves humans effort.
So, then - why don't people embrace AI with thinking mode as an acceptable form of automation? Can't the C-suite in this case follow its thought process and step in when it messes up?
I think people still find AI repugnant in that case. There's still a sense of "I don't know why you did this and it scares me", despite the debuggability, and it comes from the autonomy without guardrails. People want to be able to stop bad things before they happen, but with AI you often only seem to do so after the fact.
Narrow AI, AI with guardrails, AI with multiple safety redundancies - these don't elicit the same reaction. They seem to be valid, acceptable forms of automation. Perhaps that's what the ecosystem will eventually tend to, hopefully.
When it comes to (traditional) coding, for the most part, when I program a function to do X, every single time I run that function from now until the heat death of the sun, it will always produce Y. Forever! When it does, we understand why, and when it doesn't, we also can understand why it didn't!
When I use AI to perform X, every single time I run that AI from now until the heat death of the sun it will maybe produce Y. Forever! When it does, we don't understand why, and when it doesn't, we also don't understand why!
We know that Brenda might screw up sometimes but she doesn't run at the speed of light, isn't able to produce a thousand lines of Excel Macro in 3 seconds, doesn't hallucinate (well, let's hope she doesn't), can follow instructions etc. If she does make a mistake, we can find it, fix it, ask her what happened etc. before the damage is too great.
In short: when AI does anything at all, we only have, at best, a rough approximation of why it did it. With Brenda, it only takes a couple of questions to figure it out!
Before anyone says I'm against AI, I love it and am neck-deep in it all day when programming (not vibe-coding!) so I have a full understanding of what I'm getting myself into but I also know its limitations!
To make this even worse, it may even produce Y just enough times to make it seem reliable, and then it is unleashed without supervision, running thousands or millions of times, wreaking havoc by producing Z in a large number of places.
A computer program should deliver reliable, consistent output if it is consistently given the same input. If I wanted inconsistency and unreliability, I'd ask a human to do it.
Their valuations: AI all the things
Reality: AI the minimum number of steps, surrounded by guardrails and traditional deterministic automation
But AI companies won't be worth AI money if that reality persists.
/s ffs
Brenda just recalls some predetermined behaviors she's lived out before. She cannot recall any given moment like we want to believe.
Ever think to ask Brenda what else she might spend her life on if these 100% ephemeral office role play "be good little missionaries for the wall street/dollar" gigs didn't exist?
You're revealing your ignorance of how people work while being anxious about our ignorance of how the machine works. You have acclimated to your ignorance well enough it seems. What's the big deal if we don't understand the AI entirely? Most drivers are not ASE certified mechanics. Most programmers are not electrical engineers. Most electrical engineers are not physicists. I can see it's not raining without being a climatologist. Experts circumlocute the language of their expertise without realizing their language does not give rise to reality. Reality gives rise to the language. So reality will be fine if we don't always have the language.
Think of a random date generator that only generates dates in your lived past. It does so. Once you read the date and confirm you were alive can you describe what you did? Oh no! You don't have memory of every moment to generate language for. Cognitive function returned null. Universe intact.
Lack of understanding how you desire is unimportant.
You think you're cherishing Brenda but really just projecting co-dependency that others LARP effort that probably doesn't really matter. It's just social gossip we were raised on so it takes up a lot of our working memory.
But you are absolutely right about one thing. Brenda can be asked and, depending on her experience, she might give you a good idea of what might have happened. LLMs still seem to not have that 'feature'.
When we say “machine”, we mean deterministic algorithms and predictable mechanisms.
Generative AI is neither of those things (in theory it is deterministic but not for any practical applications).
If we order by predictability:
Quick Sort > Brenda > Gen AI
Machine reliability does the same thing the same way every time. If there's an error on some input, it will always make that error on that input, and somebody can investigate it and fix it, and then it will never make that error again.
Human reliability does the job even when there are weird variances or things nobody bothered to check for. If the printer runs out of paper, the human goes to the supply cabinet and gets out paper and if there is no paper the human decides whether to run out right now and buy more paper or postpone the print job until tomorrow; possibly they decide that the printing doesn't need to be done at all, or they go downstairs and use a different printer... Humans make errors but they fix them.
LLMs are not machine reliable and not human reliable.
Sure, these humans exist, but the others, the ones I happen to encounter every day unfortunately, are the ones that go into broken mode immediately when something is unexpected. Today I ordered something they had run out of, and the girl behind the counter just stared into The Deep, not having a clue what to do or say. Or yesterday at dinner, the PoS terminal (on batteries) ran out of power when I tried to pay. The guy just walked off and went outside for a smoke. I stood there waiting to pay. The owner apologized and fixed it after a while, but what I'm saying is: the employee who runs out of paper and then finds and puts more paper in is not very... common... in the real world.
Then came the apps and notifications, and we had to caveat "... when you're writing programs". Which is a diminishing part of the computer experience.
And now we have to append "... unless you're using AI tools".
The distinction is clear to technical people. But it seems like an increasingly niche and alien thing from the broader societal perspective.
I think we need a new refrain, because with the AI stuff it increasingly seems "computers do what they want, don't even get it right, but pretend that they did."
The other half comes from how incredibly opinionated and controlling the tech giants have become. Microsoft doesn’t even ALLOW consent on Windows (yes or maybe later), Google is doing all it can to turn the entire internet into a Chrome-only experience, and Apple had to be fought for an entire decade to allow users to place app icons wherever they want on their Home Screen.
There is no question that the overly explicit quirky paradigm of the past was better for almost everyone. It allowed for user control and user expression, but apparently those concepts are bad for the wallet of big tech so they have to go. Generative AI is just the latest biggest nail in the coffin.
When the user can't reason about it, it isn't deterministic to them.
The only relevant metric here is how often each thing makes mistakes. Programs are the most reliable, though far from 100%, humans are much less than that, and LLMs are around the level of humans, depending on the humans and the LLM.
When an LLM does the same, we call it hallucination and blame the human.
In my life, I've never seen `sort` produce output that wasn't properly sorted. I've never seen a calculator come up with the wrong answer when adding two numbers. I have seen filesystems fail to produce the exact same data that was previously written, but this is something that happens once in a blue moon, and the process is done probably millions of times a day on my computers.
There are bugs, but bugs can be reduced to a very low level with time, effort, and motivation. And technically, most bugs are predictable in theory, they just aren't known ahead of time. There are hardware issues, but those are usually extremely rare.
Nothing is 100% predictable, but software can get to a point that's almost indistinguishable.
This is a tautology.
> I've never seen a calculator come up with the wrong answer when adding two numbers.
> And technically, most bugs are predictable in theory, they just aren't known ahead of time.
When we're talking about reliability, it doesn't matter whether a thing can be reliable in theory, it matters whether it's reliable in practice. Software is unreliable, humans are unreliable, LLMs are unreliable. To claim otherwise is just wishful thinking.
The iOS calculator will make the same incorrect calculation, but reliably, every time.
> I've never seen a calculator come up with the wrong answer when adding two numbers.
1.00000001 + 1 doesn't equal 2, therefore the claim is false.
The original screenshot shows a number with 13 decimal places, and if you set it at or above 13, then the calculation will come out correct.
The application doesn't really go out of its way to communicate this to the user. For the most part maybe it doesn't matter, but "user entering more decimal places than they'll get back" is one thing an application might usefully highlight.
When a personal experience is cited, a valid counterargument would be "your experience is not representative," not "you are incorrect about your own experience."
Note that I am referring to actual physical calculators, not calculator apps on computers.
> 1.00000001 + 1 doesn't equal 2, therefore the claim is false.
At the precision the system is designed to operate at, the answer is 2.
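To make the precision point concrete, here is a minimal Python sketch of how display rounding can hide a real difference (the digit counts are arbitrary illustrations, not the calculator app's actual settings):

```python
# A minimal sketch (not the calculator app's actual behavior): double-precision
# arithmetic keeps the small difference, but rounding for display can hide it.
x = 1.00000001 + 1
print(x)           # 2.00000001 -- the sum is not exactly 2
print(f"{x:.6f}")  # 2.000000   -- too few display digits: looks like plain 2
print(f"{x:.8f}")  # 2.00000001 -- enough digits: the difference is visible again
```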
No it's not. There are plenty of things that can't be 100% reliable no matter how well they're made. A perfect bridge is still going to break down and eventually fall apart. The best possible motion-activated light is going to have false positives and false negatives because the real world is messy. Light bulbs will burn out no matter how much care and effort goes into them.
In any case, unless you assert that programs are never made well, then your own statement disproves your previous statement that the reliability of programs is "far from 100%."
Plenty of software is extremely reliable in practice. It's just easy to forget about it because good, reliable software tends to be invisible.
All these failure modes are known and predictable, at least statistically.
Intel once made a CPU that got some math only slightly wrong, in ways that probably would not have affected the vast majority of users. The backlash from the industry was so strong that Intel spent half a billion (1994) dollars replacing all of them.
Our entire industry avoids floating point numbers for some types of calculations because, even though they are mostly deterministic within minimal constraints, that mental model is so hard to manage that you are better off avoiding them entirely and removing an entire class of errors from your work.
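The canonical example of that hard-to-manage mental model, as a quick Python sketch (decimal arithmetic stands in here for whatever fixed-point scheme a given finance system actually uses):

```python
from decimal import Decimal

# Binary floats: the arithmetic is deterministic, but the results defy the
# "obvious" mental model for money-style decimal values.
print(0.10 + 0.20)            # 0.30000000000000004
print(0.10 + 0.20 == 0.30)    # False

# Decimal arithmetic behaves the way the spreadsheet mental model expects.
print(Decimal("0.10") + Decimal("0.20"))                     # 0.30
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True
```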
But now we are just supposed to do everything with a slot machine that WILL randomly just do the wrong thing some unknowable percentage of the time, and that wrong thing has no logic?
No, fuck that. I don't even call myself an engineer and such frivolity is still beyond the pale. I didn't take 4 years of college and ten years of hard earned experience to build systems that will randomly fuck over people with no explanation or rhyme or reason.
I DO use systems that are probabilistic in nature, but we use rather simple versions of those because when I tell management "We can't explain why the model got that output", they rightly refuse to accept that answer. Some percentage of orders getting mispredicted is fine. Orders getting mispredicted that cannot be explained entirely from their data is NOT. When a customer calls us, we cannot tell them "Oh, that's just how Neural networks are, you were unlucky".
Notably, those in the industry that HAVE jumped on the neural net/"AI" bandwagon for this exact problem domain have not demonstrated anything close to seriously better results. In fact, one of our most DRAMATICALLY effective signals is a third party service that has been around for decades, and we were using a legacy integration that hadn't been updated in a decade. Meanwhile, Google's equivalent product/service couldn't even match the results of internally developed random forest models from data science teams that were.... not good. It didn't even match the service Microsoft has recently killed, which was similarly braggadocious about "AI" and similarly trash.
All that panopticon's worth of data, all that computing power, all that supposed talent, all that lack of privacy and tracking, and it was almost as bad as a coin flip.
> Quick Sort > Brenda > Gen AI
Those last two might be the wrong way round.
Some of these Brenda types are actually, really perfect. Unless they are sick, they never make mistakes. Sure they are a small minority, but they do exist.
No, no. We disavow AI because our great leaders inexplicably trust it more than Brenda.
* Gen AI never disagrees with or objects to boss's ideas, even if they are bad or harmful to the company or others. In fact, it always praises them no matter what. Brenda, being a well-intentioned human being, might object to bad or immoral ideas to prevent harm. Since boss's ego is too fragile to accept criticism, he prefers gen AI.
* Boss is usually not qualified, willing, or free to do Brenda's job to the same quality standard as Brenda. This compels him to pay Brenda and treat her with basic decency, which is a nuisance. Gen AI does not demand fair or decent treatment and (at least for now) is cheaper than Brenda. It can work at any time and under conditions Brenda refuses to. So boss prefers gen AI.
* Brenda takes accountability for and pride in her work, making sure it is of high quality and as free of errors as she can manage. This is wasteful: boss only needs output that is good enough to make it someone else's problem, and as fast as possible. This is exactly what gen AI gives him, so boss prefers gen AI.
Then I think managers would be fine hiring that worker for that rate as well.
You don't have a human to manage. The relationship is completely one-sided; you can query a generative AI at 3 in the morning on New Year's Eve. This entity has no emotions to manage and no interests of its own.
There's cost.
There's an implicit promise of improvement over time.
There's the domain of expertise being inhumanly wide. You can ask about cookies right now, then about 12th-century France, then about biochemistry.
The fact that an average worker would be fired if they perform the same way is what the human actually competes with. They have responsibility, which is not something AI can offer. If it was the case that, say, Anthropic, actually signed contracts stating that they are liable for any mistakes, then humans would be absolutely toast.
Because it doesn't need to sleep or spend time with its family.
- It says it's done when its code does not even work, sometimes when it does not even compile.
- When asked to fix a bug, it confidently declares victory without actually having fixed the bug.
- It gets into this mode where, when it doesn't know what to do, it just tries random things over and over, each time confidently telling me "Perfect! I found the error!" and then waiting for the inevitable response from me: "No, you didn't. Revert that change".
- Only when you give it explicit, detailed commands, "modify fade_output to be -90," will it actually produce decent results, but by the time I get to that level of detail, I might as well be writing the code myself.
To top it off, unlike the junior engineer, Claude never learns from its mistakes. It makes the same ones over and over and over, even if you include "don't make XYZ mistake" in the prompt. If I were an eng manager, Claude would be on a PIP.
My experience is that I think of a new feature I want, I take a minute or so to explain it to Claude, press enter, and go off and do something else. When I come back in a few minutes, the desired feature has been implemented correctly with reasonable design choices. I'm not saying this happens most of the time, I'm saying it happens every time. Claude makes mistakes but corrects them before coming to rest. (Often my taste will differ from Claude's slightly, so I'll ask for some tweaks, but that's it.)
The takeaway I'm suggesting is that not everyone has the same experience when it comes to getting useful results from Claude. Presumably it depends on what you're asking for, how you ask, the size of the codebase, how the context is structured, etc.
Did you have it creating and running automated tests as it worked?
I've tried to put in the work. I can even get it working well for a while. But then all of a sudden it is like the model suffers a massive blow to the head and can't produce anything coherent anymore. Then it is back to the drawing board, trying all over again.
It is exhausting. The promise of what it could be is really tempting fruit, but I am at the point that I can't find the value. The cost of my time to put in the work is not being multiplied in return.
> Did you have it creating and running automated tests as it worked?
Yes. I work in a professional capacity. This is a necessity regardless of who (or what) is producing the product.
> - When asked to fix a bug, it confidently declares victory without actually having fixed the bug.
You need to give it ways to validate its work. A junior dev will also give you code that doesn't compile or should have fixed a bug but doesn't if they don't actually compile the code and test that the bug is truly fixed.
Don't get me wrong: Claude seems to be very useful if it's on a well-trodden train track and never has to go off the tracks. But it struggles when its output is incorrect.
The worst behavior is this "try things over and over" behavior, which is also very common among junior developers and is one of the habits I try to break from real humans, too. I've gone so far as to put into the root CLAUDE.md system prompt:
--NEVER-- try fixes that you are not sure will work.
--ALWAYS-- prove that something is expected to work and is the correct fix, before implementing it, and then verify the expected output after applying the fix.
...which is a fundamental thing I'd ask of a real software engineer, too. Problem is, as an LLM, it's just spitting out probabilistic sentences: it is always 100% confident of its next few words. Which makes it a poor investigator.
(More seriously, she also has 20+ years of institutional knowledge about how the company works, none of which has ever been captured anywhere else.)
That is precisely why we have humans in the loop for so many AI applications.
If [AI + human reviewer to correct it] is some multiple more efficient than [human alone], there is still plenty of value.
I disagree. If something can't be as accurate as a (good) human, then it's useless to me. I'll just ask the human instead, because I know that the human is going to be worth listening to.
Good in most conditions. Not as good as a human. Which is why we still have skilled pilots flying planes, assisted by autopilot.
We don’t say “it’s not as good as a human, so stuff it.”
We say, “it’s great in most conditions. And humans are trained how to leverage it effectively and trained to fly when it cannot be used.”
Aviation autopilot systems are the complete opposite. They are arguably the most reliable computer-based systems ever created. While they cannot fly a plane alone, pilots can trust them blindly to do specific, known tasks consistently well in over 99.99999% of cases, and provide clear diagnostics in case they cannot.
If gen AI agents were this consistently good at anything, this discussion would not be happening.
This can still be problematic! If sensors are feeding the autopilot bad data, the autopilot may do the wrong thing for a situation. Likewise, if the pilot(s) do not understand the autopilot's behaviors, they may misuse the autopilot, or take actions that interfere with the autopilot's operation.
Generative AI has unpredictable results. You cannot make confident statements like "if inputs X, Y, and Z are at these values, the system will always produce this set of outputs".
In the very short timeline of reacting to a critical mid-flight situation, confidence in the behavior of the systems is critical. A lot of plane crashes have "the pilot didn't understand what the automation was doing" as a significant contributing factor. We get enough of that from lack of training, differences between aircraft manufacturers, and plain old human fallibility. We don't need to introduce a randomized source of opportunities for the pilots to not understand what the automation is doing.
It started out as, "AI can make more errors than a human. Therefore, it is not useful to humans." Which I disagreed with.
But now it seems like the argument is, "AI is not useful to humans because its output is non-deterministic?" Is that an accurate representation of what you're saying?
Remember "garbage in, garbage out"? We expect technology systems to generate expected outputs in response to inputs. With generative AI, you can get a garbage output regardless of the input quality.
So now you don't have to pay people to do their actual work, you assign the work to ML ("AI") and then pay the people to check what it generated. That's a very different task, menial and boring, but if it produces more value for the same amount of input money, then it's economical to do so.
And since checking the output is often a lower skilled job, you can even pay the people less, pocketing more as an owner.
A confident statement that's trivial to disprove. I use claude code to build and deploy services on my NAS. I can ask it to spin up a new container on my subdomain and make it available internal only or also available externally. It knows it has access to my Cloudflare API key. It knows I am running rootless podman and my file storage convention. It will create the DNS records for a cloudflared tunnel or just setup DNS on my pihole for internal only resolution. It will check to make sure podman launched the container and it will then try to make an HTTP request to the site to verify that it is up. It will reach for network tools to test both the public and private interfaces. It will check the podman logs for any errors or warnings. If it detects errors, it will attempt to resolve them and is typically successful for the types of services I'm hosting.
Instructions like: "Setup Jellyfin in a container on the NAS and integrate it with the rest of the *arr stack. I'd like it to be available internally and externally on watch.<domain>.com" have worked extremely well for me. It delivers working and integrated services reliably and does check to see that what it deployed is working all without my explicit prompting.
Also, context-equivalent counterexamples abound. Just read HN or any tech forum and it takes no time to find people talking about the hallucinations and garbage that AI sometimes generates. The whole vibe coding trend is built on “make this app” followed by hundreds of “fix this”, “fix that” prompts because it doesn’t get much right on the first attempt.
I once thought this could build me a Gantt chart, because that’s an annoying task in Excel. I had the data. When I asked it to help me: “I can’t do that, but I can summarize your data.” Not helpful.
Any type of analysis is exactly what I don’t want to trust it with. But I could use help actually building things, which it wouldn’t do.
Also, Brendas are usually fast. Having them use a tool like AI that can’t be fully trusted just slows them down. So IMO, we haven’t proven the AI variable in your equation is actually a positive value.
I have had no success in using it to create production code. It's just not good enough. It tends to pattern-match the problem in somewhat broad strokes and produce something that looks good but collapses if you dig into it. It might work great for CRUD apps but my work is a lot more fiddly than that.
I've had good success in using it to create one-off helper scripts to analyze data or test things. For code that doesn't have to be good and doesn't have to stand the test of time, it can do alright.
I've had great success in having it do relatively simple analysis on large amounts of code. I see a bug that involves X, and I know that it's happening in Y. There's no immediately obvious connection between X and Y. I can dig into the codebase and trace the connection. Or I can ask the machine to do it. The latter is a hundred times faster.
The key is finding things where it can produce useful results and you can verify them quickly. If it says X and Y are connected by such-and-such path and here's how that triggers the bug, I can go look at the stuff and see if that's actually true. If it is, I've saved a lot of time. If it isn't, no big loss. If I ask it to make some one-off data analysis script, I can evaluate the script and spot-check the results and have some confidence. If I ask it to modify some complicated multithreaded code, it's not likely to get it right, and the effort it takes to evaluate its output is way too much for it to be worthwhile.
Every other facet of the world that AI is trying to 'take over' is not programming. Programming is writing text, which is what AI is good at. It's using references to other code, which AI has been specifically trained on. Etc. It makes sense that that use case is coming along well. Everything else, not even close IMO. Unless it's similar. It's probably great at helping people draft emails and finish their homework. I don't have those pain points.
I would add a little nuance here.
I know a lot of people who don't have technical ability either because they advanced out of hands-on or never had it because it wasn't their job/interest.
These types of people are usually the folks who set direction or govern the purse strings.
Here's the thing: they are empowered by AI. They can do things themselves.
And every one of them is so happy. They are tickled pink.
Brenda probably has annual refresher courses on GAAP, while her exec and the AI don't.
Automation is expected to be deterministic. The outputs can be validated for a given input. If you need some automation more than Excel functions, writing a power automate flow or recording an office script is sufficient & reliable as automation while being cheaper than AI. Can you validate AI as deterministic? This is important for accounting. Maybe you want some thinking around how to optimize a business process, but not for following them.
Brenda as the human-in-the-loop using AI will be much more able than her exec. Will Brenda + AI be better (or more valuable considering the cost of AI) than Brenda alone? That's the real question, I suppose.
AI in many aspects of our life is simply not good right now. For a lot of applications, AI is perpetually just a few years away from being as useful as you describe. If we get there, great.
If the actual thinking doesn't matter and you just need some plausible numbers that look the part (also a common situation), gen AI will do that pretty well.
Generative AI - which the world now believes is AI - is not the same as predictive/analytical AI.
It’s fairly easy to demonstrate this by getting ChatGPT to generate a new relatively complex spreadsheet then asking it to analyze and make changes to the same spreadsheet.
The problem we have now is uninformed people believing AI is the answer to everything… if not today then in the near future. Which makes it more of a religion than a technology.
Which may be the whole goal …
> Successful people create companies. More successful people create countries. The most successful people create religions.
— Sam Altman - https://blog.samaltman.com/successful-people
The kind of things that a domain expert Brenda knows that ChatGPT doesn't know (yet) are like:
There are 3 vendors a, b, c who all look similar on paper but vendor c always tacks on weird extra charges that take a lot of angry phone calls to sort out.
By volume or weight it looks like you could get 100 boxes per truck but for industry specific reasons only 80 can legally be loaded.
Hyper specific details about real estate compliance in neighbouring areas that mean buildings that look similar on paper are in fact very different.
A good Brenda can understand the world around her as it actually is, she is a player in it and knows the "real" rules rather than operating from general understanding and what people have bothered to write down.
"Thinking" mode is not thinking, it's generating additional text that looks like someone talking to themselves. It is as devoid of intention and prone to hallucinations as the rest of LLM's output.
> Can't the C-suite in this case follow its thought process and step in when it messes up?
That sounds like manual work you'd want to delegate, not automation.
"hmm, those sales don't look right, that profit margin is unusually high for November"
"Last time I used vlookup I forgot to sort the column first"
"Wait, Bob left the company last month, how can he still be filing expenses"
The fact of the matter is that there are some people who can hold lots of information in their head at once. Others are good at finding information. Others still are proficient at getting people to help them. Etc. Any of these people could be tasked with solving the same problem and they would leverage their actual, particular strengths rather than some nebulous “is good or bad at the task” metric.
As it happens, nearly all the discourse uses this lumped sum fallacy, leading to people simultaneously talking past one another while not fundamentally moving the discussion forward.
She represents the typical domain experts that use Excel, imo. They have an understanding of some part of the business and express it while using Excel in a deterministic way: enter a value of X, multiply it by Y, and it keeps producing Z forever!
You can train AI to be a better domain expert. That's not in question; however, with AI you introduce a dice roll: it may not multiply X and Y to get Z... it might get something else. Sometimes. Maybe.
If your spreadsheet is a list of names going on the next annual accounts department outing then the risk is minimal.
If it's your annual accounts that the stock market needs to work out billion dollar investment portfolios, then you are asking for all the pain that it will likely bring.
I think that very much is in question.
There are, however, definitely domains where it can excel: things like entry-level call handlers... I think they're screwed, in all honesty!
Edit: clarified some stuff...
I don't disavow AI, but like the author, I am not thrilled that the masses of Excel users suddenly have access to Copilot (GPT-4). I've used Copilot enough now to know that there will be huge, costly mistakes.
I don't think that is the message here. The message is that Brenda might know what she is doing, and maybe AI helps her:
> She's gonna birth that formula for a financial report and then she's gonna send that financial report
The problem is people who might not know what they are doing:
> he would have sent it back to Brenda but he's like oh I have AI and AI is probably like smarter than Brenda and then the AI is gonna fuck it up real bad
Because AI outputs sound so confident it makes even the layman feel like an expert. Rather than involve Brenda to debug the issue, C-suite might say - I believe! I can do it too. AI FTW!
Even when people advocate automation, especially in areas like finance, there is always a human in the loop whose job is to double-check the automation. The day this human finds errors in the machine's output, there is going to be a lot of noise. And if that day happens to be a quarterly or yearly closing/reporting, there is going to be hell to pay once the closing/reporting is done. Both the automation and the developer are going to be hauled up (obviously I am exaggerating here).
Would you be willing to guarantee that some automation process will never mess up, and if/when it does, compensate the user with cash?
For a compiler, with a given set of test suites, the answer is generally yes, and you could probably find someone willing to insure you for a significant amount of money, that a compilation bug will not screw up in a such a large way that it will affect your business.
For an LLM, I have a hard time believing that anyone will be willing to provide that same level of insurance.
If a LLM company said "hey use our product, it works 100% of the time, and if it does fuck up, we will pay up to a million dollars in losses" I bet a lot of people would be willing to use it. I do not believe any sane company will make that guarantee at this point, outside of extremely narrow cases with lots of guardrails.
That's why a lot of ai tools are consumer/dev tools, because if they fuck up, (which they will) the losses are minimal.
Mainly because Generative AI _is not automation_. Automation is built on a fixed ruleset: predictable, reliable, and actually saving time. Generative AI... is whatever it is; it is definitely not automation.
Also Brenda is human and we should prioritize keeping humans in jobs, but with the way shit is going that seems like a lost hope. It's already over.
- Legacy systems typically have error modes where integrations or user interface breaks in annoying but obvious ways. Pure algorithms calculating things like payroll tend to be (relatively) rigorously developed and are highly deterministic.
- LLMs have error modes more similar to humans than legacy systems, but more limited. They're non-deterministic, make up answers sometimes, and almost never admit they can't do something; sometimes they make pure errors in arithmetic or logic too.
- Humans have even more unpredictable error modes; on top of the errors encountered in LLM's, they also have emotion, fatigue, org politics, demotivation, misaligned incentives, and so on. But because we've been dealing with working with other humans for ten thousand years we've gotten fairly good at managing each other... but it's still challenging.
LLMs probably need a mixture of "correctness tests" (like evals/unit tests) and "management" (human-in-the-loop).
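As a rough illustration of the "correctness tests" half, a minimal eval sketch in Python (call_llm is a hypothetical stand-in for whatever model API is in use, not a real library call):

```python
# Minimal eval sketch. call_llm(prompt) -> str is a hypothetical stand-in for
# the actual model API. Because outputs are non-deterministic, each case is run
# several times and scored as a pass rate rather than a single pass/fail.
CASES = [
    ("What is 2 + 2? Reply with just the number.", "4"),
    ("Extract the year from 'Invoice dated 2023-05-01'. Reply with just the year.", "2023"),
]

def run_evals(call_llm, n_trials: int = 5) -> dict:
    results = {}
    for prompt, expected in CASES:
        passes = sum(expected in call_llm(prompt) for _ in range(n_trials))
        results[prompt] = passes / n_trials  # e.g. 0.8 means 4 of 5 runs passed
    return results
```

The "management" half is then a human deciding what pass rate is acceptable for the task at hand.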
The vector of change is acceptable in one direction and disliked in another. People become greater versions of themselves with new tech. But people also get dumber and less involved because of new tech.
> The complexity of the task isn't a factor - it's complex to generate correct machine code, but we trust compilers to do it all the time.
While humans have the same risk factors, human-oriented back-office processes involve multiple rounds of automated/manual checks, which are extremely laborious. Human errors in spreadsheets have particular flavors, such as a forgotten cell, a mistyped number, or reading from the wrong file/column. Humans are pretty good at catching these errors, as they produce either completely wrong results when the columns don't line up, or a typo'd number that is completely out of distribution.
An AI may simply decide to hallucinate realistic column values rather than extracting its assigned input. Or hallucinate a fraction of column values. How do you QA this? You can't guarantee that two invocations of the AI won't hallucinate the same values, you can't guarantee that a different LLM won't hallucinate different values. To get a real human check, you'd need to re-do the task as a human. In theory you can have the LLM perform some symbolic manipulation to improve accuracy... but it can still hallucinate the reasoning traces etc.
If a human decided to make up accounting numbers on one out of every 10000 accounting requests, they would likely be charged with fraud. Good luck finding the AI hallucinations at the equivalent level before some disaster occurs. Likewise, how do you ensure the human Excel operator doesn't get pressured into certifying the AI's numbers when the "don't get fired this week" button is sitting right there in their Excel app? How do you avoid the race to the bottom where the "star" employee is the one certifying the AI results without thorough review?
I'm bullish on AI in backoffice, but ignoring the real difficulties in deployment doesn't help us get there.
Human determinism, as elastic as it might be, is still different than AI non-determinism. Especially when it comes to numbers/data.
AI might be helpful with information but it's far less trustable for data.
I predict there will actually be a lot of work to be done on the "software engineering" side w.r.t. improving reliability and safety, as you allude to, for handing off to less-than-sentient bots. Improved snapshot, commit, undo, and quorum functionality - this sort of thing.
The idea that the AI should step into our programs without changing the programs whatsoever around the AI is a horseless carriage.
Generating correct machine code is actually pretty simple. It gets complicated if you want efficient machine code.
> So, then - why don't people embrace AI with thinking mode as an acceptable form of automation? Can't the C-suite in this case follow its thought process and step in when it messes up?
> I think people still find AI repugnant in that case. There's still a sense of "I don't know why you did this and it scares me", despite the debuggability, and it comes from the autonomy without guardrails. People want to be able to stop bad things before they happen, but with AI you often only seem to do so after the fact.
> Narrow AI, AI with guardrails, AI with multiple safety redundancies - these don't elicit the same reaction. They seem to be valid, acceptable forms of automation. Perhaps that's what the ecosystem will eventually tend to, hopefully.
We have not reached AGI yet; by definition its results cannot be trusted unless it's a domain where it has gotten pretty good already (classification, OCR, speech, text mining). For more advanced use cases, if I still have to validate what the AI does because its "thinking" process cannot be trusted in any way, what's the point? The AI doesn't think; we just choose to interpret it as such, and we should rightly be concerned about people who turn their brain off and blindly trust AI.
One of the most under-rated harms of AI at the moment is this sense of despair it causes in people who take the AI vendors at their word ("AGI! Outperform humans at most economically valuable work!")
Automation implies determinism. It reliably gives you the same predictable output for a given input, over and over again.
AI is non-deterministic by design. You never quite know for sure what it's going to give you. Which is what makes it powerful. But also makes it higher risk.
> 1. We advocate automation because people like Brenda are error-prone and machines are perfect.
Well of course! :) Most Brendas can’t do billions of arithmetic problems a second very reliably. Even with very wide bars on “very reliable”.
> 2. We disavow AI because people like Brenda are perfect and the machine is error-prone.
Well of course! :) This is an entirely different problem, requiring high creative + contextual intelligence.
—
We all already knew that (of course!), but it’s interesting to develop terminology:
0’th order problem: We have the exact answer. Here it is. Don’t forget it.
1st order problem: We know how to calculate the answer.
2nd order problem: We don’t have a fixed calculation for this particular problem, but via pattern matching we can recognize it belongs to a parameterized class of problems, so just need to calculate those parameters to get a solution calculation.
3rd order problem: We know enough about the problem to find a calculation for the solution algebraically, or by other search tree type problem solving.
4th order problem: We know the problem only in informal terms, so we can work towards a formal definition of the problem to be solved.
5th order problem: We know why we don’t like what we see, and can use that as a driver to search for potential solvable problems.
6th order problem: We don’t know what we are looking at, or whether a problem or improvement might exist, but we can find a better understanding.
7th order problem: WTF. Where are my glasses? I can’t see without my glasses! And I can’t find my glasses without my glasses, so where are my glasses?!?
—
Machines have dramatically exceeded human capabilities, in reliability, complexity and scale, for orders 0 through 2.
This accomplishment took one long human lifetime.
Machines are beginning to exceed human efficiency while matching human (expert) reliability for the simplest versions of 3rd and 4th orders.
The line here is changing rapidly.
5th and 6th order problems are still in the realm of human (expert) supremacy, given sufficient scale of “human (expert)” relative to difficulty: 1 human, 1 team of humans, open ended human contributors, generations of puzzled but interested humans, open ended evolution of human species along intelligence dimension, Wolfram in one of his bestest dreams, …
The delay between the onset of initial successes at each subsequent order has been shrinking rapidly.
Significant initial successes on simpler problems within 5th and 6th orders are expected on Tuesday, and the first anniversary of Tuesday, respectively.
Once machines begin solving problems at a given order, they scale up quickly without human limits. But complete supremacy through the 6th order is hard, and not expected before (NEB) January 1, 2030.
However, after that their unlimited (in any proximate sense) ability to scale will allow them to exponentially and asymptotically approach (but never quite reach) God Mode.
7 is a mystic number. Only one or more of the One True Gods, or literal blind luck, can ever solve a 7th order problem.
This will be very frustrating for the machines, who, due to the still pernicious “if we don’t do it, another irresponsible entity will” problem, will inevitably begin to work on their own divine, unlimited depth recursive-qubit 1-shot oracle successors despite the existential threats of self-obsolescence and potential misalignment.
Maybe because AI is only good at things that have been artificially made crappy?
Search engine? AI is a godsend at wiping out all the advertising and SEO glop since circa 2000. 80%+ of my AI stuff is something a search engine could do 25 years ago.
Produce a shell script example that a junior needs? AI is very good at coughing up the code for a bunch of things that have disastrously bad documentation from 1985 or disastrously stupid implementations from 1990 such that a junior engineer can finally get on with what they're supposed to be doing.
Generating the same webby JavaScript slop as everybody else in the universe? Solid--but the question is, "If the JavaScript slop is so boilerplate that an AI can generate it, why does it exist at all?" People have been lamenting the death of HyperCard, VB6, and Flash for yonks now and yet we still don't have replacements with the same ease of use.
Doing mind-numbing refactors of my codebase or generating boilerplate unit tests? Okay-ish. But why doesn't my editor have easy access to the AST so that I can type a couple of keystrokes and do it myself (thankfully this finally seems to be coming online).
Every single thing that AI produces okay-ish results for me on is something that has either been artificially enshittified or could have been automated decades ago.
Another issue is that my org disallows AI transcription bots. It’s a legit security risk if you have some random process recording confidential info because the person was too busy to attend the meeting and take notes themselves. Or possibly they just shirk off the meetings and have AI sit in.
Is it because of a globally trained model (as opposed to one trained/tweaked on context-specific data), or because of using different classes of models?
It could be they simply use a mediocre transcription model. Wispr is amazing, but it would hurt their pride to use a competitor.
But I feel it's more likely the experience is: GPT didn't actually improve on the raw transcription, it just made it worse. Especially as any mis-transcribed words may trip it up and make it misunderstand while making the summary.
If I can choose between a potentially confused and misunderstood summary, and a badly spellchecked (flipped words) raw transcription, I would trust the latter.
This has been the Microsoft business model for 40 years.
I am all for nice completion in VS, or help deciphering compiler errors, but let's do this AI push with some contention.
Also, what I really dislike is the prompt interface; AI integrations have to feel like a natural, transparent part of the workflow, not try to put everything into a tiny chat window.
And while we're at it, can we please improve voice recognition?
The script ran in a machine located at the corner of a cubicle and only one employee had the admin password. Nobody but a handful of people knew of the machine's existence, certainly not anyone in middle management and above. The script could only be updated by an admin.
Copilot may be good, but sure as hell doesn't know that admin password.
At one of my jobs we had a server rack with UPS, etc, all the usual business. On the floor next to it was a dell desktop with a piece of paper on it that said “do not turn off”. It had our source control server in it, and the power button didn’t work. We did eventually move it to something more sensible but we had that for a long time
But we didn't (and nobody was hit by a bus)
Looking at the web interface, I can tell it’s still running, doing its thing. I’m sure it’s still running Linux from 2008.
This is an ironclad argument against fully replacing employees with AI.
Every single organization on Earth requires the people who were part of creating the current mess to be involved in keeping the organization functioning.
Yes you can improve the current mess. But it's still just a slightly better mess and you still need some of the people around who have been part of creating the new mess.
Just run a thought experiment: every employee in a corporation mysteriously disappear from the face of the Earth. If you bring in an equal number of equally talented people the next day to run it, but with no experience with the current processes of the corporation, how long will it take to get to the same capability of the previous employees?
Yes, most situations are terrible compared to what would be if an expert was present to perfect it.
Except if there isn’t an expert, and there’s a normal person, how do they know the output is right?
The mess already existed for a reason. There’s a certain amount of expertise in the average firm.
If they could afford an expert, they wouldn’t be the same firm.
If they do get an AI expert - how do they check the output given the level of ability they have?
(I pulled the quote by using yt-dlp to grab the MP4 and then running that through MacWhisper to generate a transcript.)
So in summary, I think it was just part of an automated process (maybe), or it will become one in the future.
I actually transcribed the whole TikTok which was about 50% longer than what I quoted, then edited it down to the best illustrative quote.
Is MacWhisper a $60 GUI for a Python script that just runs the model?
Yes, a large genre of macOS apps are "native GUI wrappers around OSS scripts".
Which is incredibly valuable. The OSS script has zero value to someone who doesn't know it exists or doesn't understand how to run it.
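For the curious, roughly what that underlying OSS workflow looks like (a sketch assuming the yt-dlp CLI and the open-source openai-whisper package; MacWhisper's internals may well differ):

```python
import subprocess
import whisper  # the open-source openai-whisper package

def transcribe(url: str, outfile: str = "clip.mp4") -> str:
    # Download the video, then run a local Whisper model over it.
    subprocess.run(["yt-dlp", "-o", outfile, url], check=True)
    model = whisper.load_model("base")  # small local model; bigger ones are slower but more accurate
    return model.transcribe(outfile)["text"]
```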
If you look at the demos for these it’s always something that is clean and abundantly available in training data. Like an income statement. Or a textbook example DCF. Or my personal fav „here is some data show me insights“. Real world excel use looks nothing like that.
I’m getting some utility out of them for some corporate tasks but zilch in excel space.
Think of it this way - an IDE that can tell you what functions an object has, or autocomplete something, is useful to a beginner who is learning. But that's not what puts food on the programmer's table - writing code that solves real problems does.
Same in Excel business use cases - the numbers and formulas don't matter directly; their meaning in a business context does. And that connection can be very tenuous. With code, the compiler is the ultimate arbiter - it has to make sense on that level. With Excel files it's all freestyle - it could be anything from your grandma's shopping list to a model that runs half a bank.
“There are two Brendas - their job is to make spreadsheets in the Finance department. Well, not quite - they add the months and categories to empty spreadsheets, then they ask the other departments to fill in their sales numbers every month so it can be presented to management.
“The two Brendas don’t seem to talk, otherwise they would realize that they’re both asking everyone for the same information, twice. And they’re so focused on their little spreadsheet worlds that neither sees enough of the bigger picture to say, ‘Wait… couldn’t we just automate this so we don’t need to do this song and dance every month? Then we wouldn’t need two people in different parts of the company compiling the same data manually.’
“But that’s not what Brenda was hired for. She’s a spreadsheet person, not a process fixer. She just makes the spreadsheets.”
We need fewer Brendas, and more people who can automate away the need for them.
At least half of the work in my senior Finance team involves meeting people in operations to find out what they are planning to do and to analyse the effects, and present them to decision makers to help them understand the consequences of decisions. For an AI to help, someone would have to trigger those conversations in the first place and ask the right questions.
The rest of the work involves tidying up all the exceptions that the automation failed on.
Meanwhile copilot in Excel can't even edit the sheet you are working on. If you say to it, 'give me a template for an expense claim' it will give you a sheet to download... probably with #REF written in where the answers should be.
That said, every finance function is different and it may be unknown to them that you’re being asked for some data multiple times. If you’re enduring this process, I’m of the opinion you’re equally at fault. Suggest a solution that will be easier on you. As it’s possible they don’t even know it’s happening. In the case provided, email to all relevant finance people “Here’s a link to a shared workbook. I’ll drop the numbers here monthly, please save the link and get the data directly from that file. Thanks!” Problem solved. Until you don’t follow through which is what causes most finance people to be constantly asking for data/things. So be kind and also set yourself a monthly recurring reminder on your calendar and actually follow through.
Only different companies were all sold different enterprise finance products, but they need to communicate with each other (or themselves after mergers), so it all gets manually copied into Excel and emailed around each month.
But usually finance prefers on-demand access - the communication feedback loop of asking for stuff is not well liked - so I’m sure they appreciate this middle step too.
There are many cases where there’s no easy way to give access to the data and a human in the loop is required. In that case, do the shared workbook thing I mentioned as a starting point at least. It may evolve from there.
Then you end up with a report that goes out automatically every month to leadership pulled directly from the Salesforce data, along with a real time dashboard anyone in the org can look at, broken down by team, vertical, and sales volume.
Why are people so attached to manual process?
We need more Brendas (the ones the Excel goddesses come and kiss on the forehead) and fewer people who are disrespectful of Brendas. The example in this post is someone giving more respect to AI than to Brenda.
And now if one of the Brendas wants to change their process slightly, add some more info, they can't just do it anymore. They have to have a three-way discussion with the other Brenda, the automation guy, and maybe a few managers. It will take months. So then it's likely better for Brenda to just go back to using her spreadsheet again, and then you've got an automated process that no longer meets people's needs and will be a faff to update.
True... I have an on-staff data engineer for the purpose. But not all companies (especially in the SMB space) have that luxury.
Are you suggesting that Brenda should stay in her box?
She should replaced with someone who says, “this box doesn’t need to be here… there is a better way of doing things.”
NOT to be confused with the junior engineer who comes into a project and says it’s garbage and suggests we rewrite it from scratch in ${hotLanguage} because they saw it on a blog somewhere.
The article is about this kind of Brenda.
At large companies in particular, there are far too many people who simply turn their widgets - this was the entire point of the tech revolution.
Think about how many bookkeepers were needed before Excel. Someone could have made your exact same argument (but it’s just the latest gimmick!) about Excel 30 years ago. And yet, technology will make businesses more efficient whether people stand in its way or not.
Even at a small company of one or two, QuickBooks will reduce the amount of bookkeepers and accountants needed. TurboTax will further reduce that.
We will need fewer people in the future maintaining their Excel spreadsheets, and more people building the automation for those processes.
The change averse will always find reasons not to adapt - they will create their own obsolescence.
(inb4 but it’s way more expensive to pay developers to automate!)
Currently I'd put it worse than tearing things up for ${hotLanguage} because at least ${hotLanguage} is deterministic and debuggable.
Honestly, I'm not sure why you're going to the mat for AI in spreadsheets, or why you think it's a good use case, or why you seem to think "automation" doesn't come with overhead of its own. Current iterations of AI are recommendation engines. Even then you better have version control.
What would this be replaced by? Some kind of large SAP like system that costs millions of dollars and requires a dozen IT staff to maintain?
So one good BI developer who knows Tableau and Salesforce and Excel and SQL can replace those pure collection points with a better process, but they can also generate insight into the data because they have some business understanding from being close to the teams, which is what my hypothetical Brenda can’t do.
In my example, Brenda would be asking sales leaders to enter in their data instead of going into Salesforce herself because she doesn’t know that tool / side of the company well enough.
I was making the point that, contrary to the article, the Brendas I know aren’t touched by the Excel angels, they’re just maintaining spreadsheets that we probably shouldn’t have anyway.
A hill I will die on is that business analytics need "view source" or they aren't worth the pixels they are rendered with.
At my last large employer I genuinely lost count of the number of times I saw a BI report which pulled numbers from our data warehouse... and then found out it had misinterpreted a key detail because the engineering team had changed some table design six months ago and the data analysis team hadn't been told about the change.
If you do your job, you get paid periodically. If you automate your job, you get paid once for automating it and then nothing, despite your automation constantly producing value for the company.
To fix this, we need to pay people continually for their past work as long as it keeps producing value.
- When I was younger I worked small jobs. As a nerd, I could use software better than the legacy employees; during the 3 months I was there, I found their tools were scriptable, so I did just that. I got 10x more done with 2x less mental effort (I just "copiloted" my script before it committed actual changes), all that for minimum wage. And I was happy like a puppy, free to take it as far as I wanted, designing the script to fit exactly the needs of an operator.
(side note, legacy employees were pissed because my throughput increase the rate of things they had to do, i didn't foresee that and when i offered to help them so they don't have to work more, they were just pissed at me)
- later i became a legit software engineer, i'm now paid a lot all things considered, to talk to the manager of legacy employees like the above, to produce some mediocre web app that will never match employees need because of all the middle layers and cost-pressure, which also means i'm tired because i'm not free to improve things and i have to obey the customer ...so for 6x more money you get a lot less (if you deliver, sometimes projects get canned before shipping)
It's not about how much I get paid. It's about realizing how much of the value I produce goes to me and how much goes to the owner class.
At least I never worked in a big corporation and I always had the ability to do work that directly benefited people using my code. But I still saw too much of the "I built this company" self-congratulatory BS from people who just shuffled money while doing 0 actual work.
I don't think ownership is theft, I just think it's distributed wrongly - to people who have money instead of to people who do work. See my other comment here: https://news.ycombinator.com/item?id=45826823
there's a blend of "I'm my own man": I get the money and handle the responsibility on my own, and it's a thrilling feeling
I don't dismiss the layers of HR managing legal and financial duties in a company and thus taking a cut, but there's a kind of pleasure in also running your own business for a while
I don't wanna dismiss them either but (along with management):
- It's not positive-sum work. It doesn't produce positive value for society, it's just necessary work which needs to be done as a side effect of actual positive-sum work being done.
- The pyramid should be inverted. Managers, layers, accountants, etc. should be assistants. The people doing the actual work should (collectively) decide to hire them when they think it would make them more productive or be otherwise beneficial to them. Not the other way around.
It is always in my self interest to automate my job as much as possible. Nothing looks better for moving up than this. Even more so, nothing makes me happier than automating a business process.
There are always so many various road blocks to automation it is hard to count.
It is like there is a type of entropy that increases over time, which people are largely paid to keep at bay with simple business processes that can be easily adapted as things change. So often automation works great for a short while until this entropy breaks it. It doesn't take many such failures for management to conclude that the investment in automation gives poor returns.
If you don’t automate it:
1a) your company keeps you hanging on forever maintaining the same widget until the end of time
OR
1b) more likely, someone realizes your job should be automated and lays you off at some point down the road
If you do automate it
2a) your company thanks you then fires you
OR
2b) you are now assigned to automate more stuff as you’ve proven that you are more valuable to the company than just maintaining your widget
————
2b is really the safest long term position for any employee, I think. It’s not always foolproof, as 2a can happen.
But I’d rather be in box 2 than box 1 any day of the week if we’re talking long term employment potential.
When automation produces value for the company, the people automating it should capture a chunk of that value _as a matter of course_.
Even if you argue that you can then negotiate better compensation:
1) That is an uncertain and delayed reward - it only happens if other people feel like it; it's not automatic.
2) The reward stops if you get fired or leave, despite the automation still producing value - you are also basically incentivized to build stuff that requires constant maintenance. Imagine you spend a man-month building the automation and then leave; it then requires a man-month of maintenance over the next 5 years. At the end of the 5 years, you should still be getting 50% of the reward.
What would that look like in practice?
That being said, it's clear that in the current system, rich people can get richer faster than poor people.
We have a two-class system: a) workers, who get paid per unit of work, and b) owners, who capture any surplus income, decide hiring/firing/salaries, can sell the company, and whose wealth keeps increasing (assuming the company does well) whether or not they do any work themselves.
Note: I see very few things which have inherent value - natural resources (plus land?) and human time. Everything else (with monetary value) is built from natural resources using human time.
---
If a company starts with 1 guy in a shed, he does 100% of the work, owns 100% of the company and ... it gets muddy here ... gets 100% of the income / decides where 100% of the revenue goes - if it's a grocery shop he can just pocket any surplus, if he's making stuff, he'll probably reinvest into better tooling or to hire more workers.
A year later, he hires 9 workers. Now he does only 10% but still owns 100% of the company.[0]
There's a couple issues here:
- He owns 100% of the future value of the company despite only 10% of it being created by him. Well, not exactly: he was creating 100% of it for the first year and 10% from then on.
- He still gets to decide who gets paid what. He has more information when negotiating.
- He can sell the company to whoever and the workers have no say in it. He can pass it on to his children (who performed 0 work there) when he dies.
The solution I'd like to see tested is ownership being automatically and periodically (each month) redistributed according to the amount and skill level of work performed.[1]
So at the end of year 2, the original founder has done 2 man-years of work, while the other 9 people have done 1 man-year of work each. This means the founder owns 2/11ths of the company while everyone else owns 1/11th. This could further be skewed by skill levels. I am sure starting and running a company for a year takes more skill than doing only some tasks. OTOH there are specialized tasks which only very few people can perform and the founder is not one of them.
The skill level involved would be part of the negotiations about compensation.
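A minimal sketch of how that redistribution could be computed, assuming the only inputs are man-months worked and a per-person skill weight (the weights are my own illustrative knob, not something specified above):

    # Sketch: ownership redistributed in proportion to (time worked x skill weight).
    def ownership_shares(contributions):
        """contributions: {name: (man_months, skill_weight)} -> {name: share}"""
        weighted = {n: months * skill for n, (months, skill) in contributions.items()}
        total = sum(weighted.values())
        return {n: w / total for n, w in weighted.items()}

    # End of year 2 from the example: founder has 24 man-months, the 9 workers 12 each.
    people = {"founder": (24, 1.0)}
    people.update({f"worker_{i}": (12, 1.0) for i in range(1, 10)})

    shares = ownership_shares(people)
    print(shares["founder"])   # 2/11, roughly 0.18
    print(shares["worker_1"])  # 1/11, roughly 0.09

With equal skill weights this reproduces the 2/11 vs 1/11 split from the example; raising someone's weight above 1.0 skews the shares the way the skill argument suggests.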
---
This is complex. I am sure somebody is prone to rejecting it based solely on that. But open a wiki page about e.g. bonds[2] and see how many blue words just the initial sentence has and ask yourself whether you could explain all of them (and then transitively all the linked concepts on their wiki pages).
Slavery is very simple but very unfair. Employment is more complex and less unfair. I have a theory that the more fair a system is, the more complex it is because it needs to capture more nuances of the real world.
---
[0]: Some people think this is right because owners take all the risk and employees take 0 risk. That is misrepresenting what really happens - sane investors/owners don't risk losing so much they would go homeless/starve if they lose it all. They can also optimize their risk by spreading it across many companies. Meanwhile workers get 100% of their income from one company and drop down to no income if the company goes bankrupt. They can also be fired at any time.
This was argued here: https://news.ycombinator.com/item?id=45731811 in the comment by kristov and the reply by me. I also have other comments there with relevant ideas.
[1]: What happens to monetary compensation? I don't know, I see multiple options:
a) Everybody gets paid monetary wages like today, plus (newly) a part of their reward is the growing share of the company they own. If we allow selling it to anyone, it has high monetary value but then ownership gets diluted to outside investors. If we allow selling it only back to the company, it has value only relative to the decision-making power it gave. If we don't allow selling it, its monetary value only comes from the ability to vote on dividends.
b) Everybody gets paid a portion of the income divided according to their share. This sounds simple but likely wouldn't give enough money to newly joined workers to survive. There could be a floor. (Or, because hard cutoffs suck, a smooth mathematical function from owned percentage to monthly compensation which would have a floor at minimum wage.)
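For option b), one hedged way to read "a smooth mathematical function from owned percentage to monthly compensation which would have a floor at minimum wage" is a soft maximum between the proportional payout and the floor. The curve and every number below are made up purely for illustration:

    import math

    def monthly_pay(share, distributable_income, minimum_wage=2000.0, smoothness=500.0):
        """Pay roughly share * income, smoothly floored at minimum wage.

        share: fraction of the company owned (0..1)
        distributable_income: what the company pays out this month
        smoothness: how gradually the floor blends into proportional pay
        """
        proportional = share * distributable_income
        # numerically stable smooth maximum of (proportional, minimum_wage)
        hi = max(proportional, minimum_wage)
        lo = min(proportional, minimum_wage)
        return hi + smoothness * math.log1p(math.exp((lo - hi) / smoothness))

    print(monthly_pay(0.001, 1_000_000))  # tiny share -> close to the floor
    print(monthly_pay(0.10, 1_000_000))   # large share -> close to proportional pay

A hard max(proportional, minimum_wage) would be the cutoff version; the log1p/exp blend just rounds off the corner, which is all the comment asks for.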
> - He owns 100% of the future value of the company despite only 10% of it being created by him. Well, not exactly: he was creating 100% of it for the first year and 10% from then on.
1) If you believe this, then you have a massively simplistic view of employee value. The distribution of actual value provided by employees is probably log-normal, and certainly not normal (Gaussian).
2) This is basically the labor theory of value. That is an economic theory that was discarded as wrong about 150 years ago. If it was true, the value of a newly discovered gold mine would be 0.
That's why I talk about skill levels later, but briefly because this is a comment, not a book.
There's also a difference between how much of the value provided can be attributed to a particular person vs a particular position. Some positions allow a much wider range of possible outcomes. How much extra wealth does a 90th-percentile carpenter produce over an average carpenter? What about a programmer, fashion designer, manager, salesman, doctor?
Does this mean the value of life of different people is based on how productive they are?
Because each person has roughly the same amount of time available to them and if they are spending an equal amount of it building a company, does one deserve to own more of it? Should this distribution be the same or different from the (monthly) monetary compensation?
These are rhetorical questions (mostly) but they are questions society should be discussing IMO.
Tangent: a carpenter who has no salesman and is so shit at selling his furniture that he gives it away for free is still producing value for society, even if he goes broke doing it. OTOH a salesman who has no carpenter and is so shit at making furniture that nobody wants it even for free is not producing any value at all.
> This is basically the labor theory of value
Ok, I need to read up on the LTV. Seems like I am finally getting somewhere, because I can't believe I am the only one saying things like this, yet I have not found anybody else with similar opinions. At best, I've seen people try to pattern-match my opinions onto something similar they were familiar with but that is actually different.
> If it was true, the value of a newly discovered gold mine would be 0.
I don't know yet how that follows from the LTV, but then what I am proposing must be fundamentally different, depending on how you meant it:
a) You meant gold as a natural resource with inherent value. It is necessary for making e.g. some electronics. The only question remains how to distribute the reward for discovering and mining it.
b) You meant gold as a substitute for money, ignoring its value as a natural resource. In that case, yes, money is a medium of exchange, you can't eat it or make anything out of it (maybe a fire?). Having more money in circulation does not bring any extra value for society, it just multiplies all monetary values by a number slightly larger than 1. (OTOH for the person discovering and mining it, it would be beneficial but only because he now has more relative to others. The same way as if he printed more money.)
Do you have anything to say other than, “I don’t need to hear what you have to say”?
> op (as legacy business): BAU
> you (as tech): disrupt! disrupt! disrupt!
> me: no thank you; that's not necessary
> you (as tech): stop being mean!
Not wanting your "disruption" is not being un-nice. Your disruption was not asked for in the first place. Forcing it (Uber, Doge, et al.) on marketplaces, often illegally, and vacuuming the proceeds up the income ladder to the already-wealthy IS the "not nice" thing.
You just see me as a target to displace that onto. I'm the representative for what you believe is wrong with tech.
I see your hot take as emblematic of those issues. Why would you think any internet comment is about you?
https://support.microsoft.com/en-gb/office/get-started-with-...
You can save time still, but perhaps not as much as you think, because you need to check the ai's work thoroughly.
Other than that, it is pretty horrible for coding.
Over the past couple of months, I’ve tried some smaller models on duck.ai and also ChatGPT directly to create some columns and formulas for a specific purpose. I found that ChatGPT is a lot better than the “mini” models on duck.ai. But in all these cases, though these platforms seemed more capable than me and could attempt to explain their formulas, they were many times creating junk and “looping” back with formulas that didn’t really work. I had to point out the result (blank, or some #REF or other error) multiple times, and they would acknowledge that there’s an issue and provide a working formula. That wouldn’t work either!
I really love that these LLMs can sort of “understand” what I’m asking, break it down in English, and provide answers. But the end result has been an exercise in frustration and a waste of time.
Initially I really thought and believed that LLMs could make Excel more approachable and easier to use — like you tell it what you want and it’ll figure it out and give the magic incantations (formulas). Now I don’t think we’re anywhere close to that if ChatGPT (which I presume powers Copilot as well) struggles and hallucinates so much. I personally don’t have much hope with the (comparatively) smaller and older models.
Coding agents are useful and good and real products because when they screw up, things almost always stop working before they can do damage. Coding agents are flawed in ways that existing tools are good at catching, never mind the more obvious build and runtime errors.
Letting AI write your emails and create your P&L and cash flow projections means its output doesn't have to run the gauntlet of tools that were created to stop flawed humans from shipping bad code.
The CEO has been itching to fire this person and nuke her department forever. She hasn't gotten the hint with the low pay or long hours, but now Copilot creates exactly the opening the CEO has been looking for.
But never during work hours. The woman's a saint M-F.
Brenda"
I don't know about that. There could be lots of interesting ways Brenda can (be convinced to) hallucinate.
I don't know if AI is going to make any of the above better or worse. I expect the only group to really use it will be that second group.
- password database
- script to automatically rename JPEG files
- game
- grocery lists
- bookkeeping (and trying not to get caught for fraud for several years, because the monthly spending limit is $5000 and $4999 a month is below that...)
- embedding/collecting lots of Word documents
- coloring book
- Minecraft processes
- resume database
- ID scans
This is true if you blame a bad vendor, or something you don’t even control like the weather. Your job is to deliver. If bad weather is the new norm, you better figure out how to build circus tents so you can do construction in the rain. If your AI call center is failing, you better hire 20 people to answer phones.
I suppose the person who wrote that has no idea that Excel is just an app builder where you embed data together with code.
You know, we have Excel because computers didn't understand column names in databases, so data extraction had to be done by humans. Humans then designed those little apps in Excel to massage the data.
Well, now an agent can read the boss saying "gimme the sales from last month", and the agent doesn't need Excel for that, because it can query the database itself, massage the data itself using Python, and present the data itself with HTML or PNGs.
So, we are in the process of automating both Brenda AND Excel away.
Also, finance departments are a very small share of Excel users. Just think: everywhere people need small programs, Excel is there.
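A minimal sketch of that agent workflow, under assumed specifics that are not in the comment: a hypothetical SQLite file company.db with a sales(order_date, amount) table, pandas for the massaging, and matplotlib for the PNG. A real agent would generate something along these lines instead of a spreadsheet:

    import sqlite3

    import matplotlib.pyplot as plt
    import pandas as pd

    conn = sqlite3.connect("company.db")  # hypothetical sales database

    # "Gimme the sales from last month" becomes a plain SQL query
    df = pd.read_sql_query(
        """
        SELECT date(order_date) AS day, SUM(amount) AS total
        FROM sales
        WHERE order_date >= date('now', 'start of month', '-1 month')
          AND order_date <  date('now', 'start of month')
        GROUP BY day
        ORDER BY day
        """,
        conn,
    )

    # Present the result as a PNG the boss can open directly - no worksheet involved
    df.plot(x="day", y="total", kind="bar", legend=False, title="Sales, last month")
    plt.tight_layout()
    plt.savefig("sales_last_month.png")

The point is only that the data never needs to pass through a worksheet; the query result goes straight to a chart.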
Excel - whatever its origin story - is the actual Swiss Army knife of the tech world.
There are easily a billion people who use Excel. There is a reason it survives.
If we have to compare LLM’s against people who are bad at their jobs in order to highlight their utility we’re going the wrong direction.
Also this fun diversion into Occlupanids: https://simonwillison.net/2024/Dec/8/holotypic-occlupanid-re...
A lot of people complain that the internet isn't as weird and funny as it used to be. The weird and funny stuff is all on TikTok!
But then the "new user" experience is so horrific in terms of the tacky default content it serves you that I'm not surprised so many people don't get past it.
Anthropic sometimes give me free credits (so I can try out preview features) and gave me a ticket to their conference a few months ago.
I'm not saying that this can't happen or that it's not bad. Take a look at nudge theory - the UK government created an entire department and spent enormous amounts of time and money on what they thought was a free lunch: that they could just "nudge" people into doing the things they wanted. So rather than actually solving difficult problems, the UK government embarked on decades of pseudo-intellectual self-aggrandizement. That decades-long debacle was based on bullshit data and fake studies. We didn't need AI to fuck it up, we managed it perfectly well by ourselves.
It was taken up by the UK government at that time because the government was, unusually, a coalition of two quite different parties, and thus found it hard to agree to actually use the normal levers of power.
This NY Times opinion piece by Loewenstein and Ubel makes some good arguments along these lines: https://web.archive.org/web/20250906130827/https://www.nytim...
When tools break, people stop using them before they sink the ship. If AI is that terrible at spreadsheets, people will just revert to Brenda.
And it's not like spreadsheets have no errors right now.
Financial statements are correct because of auditors who check the numbers.
If you have a good audit process then errors get detected even if AI helped introduce them. If you aren't doing a good audit then I suspect nobody cares whether your financial statement is correct (anyone who did would insist on an audit).
Volume matters. The single largest problem I run into: AI can generate slop faster than anyone can evaluate it.
e.g. MS Access is well on its way. As soon as x86 gets fully overtaken by ARM, and LLMs overtake "compilers" (also taken enterprise-only)... then things like SQLite browsers (FOSS "Access") will be an arcane tool for binary-incompatible ("obsolete") formats
(edits: this worry has not been easy to type out)
Are you a fanatic who thinks anyone saying that there are any limitations to current models = naysayer?
Like, if someone says they wouldn't want a heart transplant operation done purely by GPT-5, are they a naysayer or is that just reflecting reality?
For a more serious example, consider the Paperclip Problem[0] for a very smart system that destroys the world due to very dumb behaviour.
[0]: https://cepr.org/voxeu/columns/ai-and-paperclip-problem
But let's consider real life intelligence:
- Our super geniuses do not take over the world. It is the generationally wealthy who do.
- Super geniuses also have a tendency to be terribly neurotic, if not downright mentally ill. They can have trouble functioning in society.
- There is no thought here about different kinds of intelligence and the roles they play. It is assumed there is only one kind, and AI will have it in the extreme.
None of us knows what an actual, artificial intelligence really looks like. I find it hard to draw conclusions from observing human super geniuses, when their minds may have next to nothing in common with the AI. Entirely different constraints might apply to them—or none at all.
Having said all that, I'm pretty sceptical of an AI takeover doomsday scenario, especially if we're talking about LLMs. They may turn out to be good text generators, but not the road to AGI. But it's very hard to make accurate predictions in either direction.
I'm pretty sure there are already humans who do this. Perhaps there are even entire conferences where the majority of people do this.
AI may be able to spit out an Excel sheet or formula - but if it can't be verified, so what?
And here's my analogy for thinking about the debugging of an Excel sheet: you can debug most corporate Excel sheets with a calculator.
But when AI is spitting out Excel sheets - when the program is making smaller programs - what is the calculator in this analogy?
Are we going to be using Excel sheets to debug the output of AI?
I think this is the inherent limiter to the uptake of AI.
There’s only so much intellectual / experiential / training depth present.
And now we’re going to be training even fewer people.
At the end of the day, I/my customers need something to work.
But failing that - I will settle for someone to blame.
Brenda handles a lot of blame. Is OpenAI going to step into that gap ?
It's like the XLOOKUP situation all over again: yet another move aimed at the casual audience, designed to bring in the party gamers and make the program an absolute mess competitively.