I think AI is just allowing everyone to speed-run the innovator's dilemma. Anyone can create a small version of anything, while big orgs will struggle to move quickly as before.
The interesting bit is going to be whether we see AI being used in maturing those small systems into big complex ones that account for the edge cases, meet all the requirements, scale as needed, etc. That's hard for humans to do, and particularly while still moving. I've not see any of this from AI yet outside of either a) very directed small changes to large complex systems, or b) plugins/extensions/etc along a well define set of rails.
But last week I had two days where I had no real work to do, so I created cli tools to help with organisation, and cleaning up, I think AI boosted my productivity at least 200%, if not 500.
That’s what I’ve been doing lately, and it really helps get a clean architecture at the end.
On the other you have a non-technical executive who's got his head round Claude Code and can run e.g. Python locally.
I helped one recently almost one-shot converting a 30 sheet mind numbingly complicated Excel financial model to Python with Claude Code.
Once the model is in Python, you effectively have a data science team in your pocket with Claude Code. You can easily run Monte Carlo simulations, pull external data sources as inputs, build web dashboards and have Claude Code work with you to really integrate weaknesses in your model (or business). It's a pretty magical experience watching someone realise they have so much power at their fingertips, without having to grind away for hours/days in Excel.
almost makes me physically sick.I've a reasonably intense math background corrupted by application to geophysics and implementing real world numerical applications.
To be fair, this statement alone:
* 30 sheet mind numbingly complicated Excel financial model
makes my skin crawl and invokes a flight reflex.
Still, I'll concede that a Claude Code conversion to Python of a 30 sheet Excel financial model is unlikely to be significantly worse than the original.
If a data science team modeled something incorrectly in their simulation, who's gonna catch it? Usually nobody. At least not until it's too late. Will you say "this doesn't look plausible" about the output? Or maybe you'll be too worried about getting chided for "not being data driven" enough.
If an exec tells an intern or temp to vibecode that thing instead, then you definitely won't have any checkpoints in the process to make sure the human-language prompt describing process was properly turned into the right simulation. But unlike in coding, you don't have a user-facing product that someone can click around in, or send requests to, and verify. Is there a test suite for the giant excel doc? I'm assuming no, maybe I'm wrong.
It feels like it's going to be very hard for anyone working in areas with less black-and-white verifiability or correctness like that sort of financial modeling.
Any and I mean any statistic someone throws at me I will try and dig in. And if I'm able to, I will usually find that something is very wrong somewhere. As in, the underlying data is usually just wrong, invalidating the whole thing or the data is reasonably sound but the person doing the analysis is making incorrect assumptions about parts of the data and then drawing incorrect conclusions.
Can't tell you how many times I've seen product managers making decisions based on a few hundred analytics events, trying to glean insight where there is none.
There are often more errors. Sometimes the actual results are wildly different in reality to what a model expects .. but the data treatment has been bug hunted until it does what was expected .. and then attention fades away.
The Excel sheet will have been tuned over the years by people who knew exactly what it was doing and fixed countless bugs along the way.
The Claude Code copy will be a simulacrum that may behave the same way with some inputs, but is likely to get many of edge cases wrong, and, when you're talking about 30 sheets of Excel, there will be many, many of these sharp edges.
IMHO, earned through years of bleeding eyeballs, the first will be riddled with subtle edge cases curiously patched and fettled such that it'll limp through to the desired goal .. mostly.
The automated AI assisted transcoding will be ... interesting.
I'm sure Claude Code will happily one-shot that conversion. It's also virtually guaranteed to have messed up vital parts of the original logic in the process.
Anyway, please try it if you find it unbelievable. I didn't expect it to work FWIW like it did. Opus 4.5 is pretty amazing at long running tasks like this.
Maybe you did one or the other , but “nearly one-shotted” doesn’t tend to mean that.
Claude Code more than occasionally likes to make weird assumptions, and it’s well known that it hallucinates quite a bit more near the context length, and that compaction only partially helps this issue.
I have no idea why it had so much trouble with this generally easy task. Bizarre.
"1 or 2 plan mode prompts" to fully describe a 30-sheet complicated doc suggests a massively higher level of granularity than Opus initial plans on existing codebases give me or a less-than-expected level of Excel craziness.
And the tooling harnesses have been telling the models to add testing to things they make for months now, so why's that impressive or suprising?
I was impressed because the prompt didn't ask it to do that. It doesn't normally add tests for me without asking, YMMV.
Did it build a test suite for the Excel side? A fuzzer or such?
It's the cross-concern interactions that still get me.
80% of what I think about these days when writing software is how to test more exhaustively without build times being absolute shit (and not necessarily actually being exhaustive anyway).
When shit hits the fan and execs need answers yesterday, will they jump to using the LLM to probabilistically make modifications to the system, or will they admit it was a mistake and pull Excel back up to deterministically make modifications the way they know how?
the largest independent derivatives broker in australia collapsed after it was discovered the board were using astrology and magicians to gamble with all the clients money
https://www.abc.net.au/news/2016-09-16/stockbroker-used-psyc...
It used to be that we'd fix the copy-paste bugs in the excel sheet when we converted it to a proper model, good to know that we'll now preserve them forever.
It is a beautiful experience to realize wtf you don’t know and how far over their skis so many will get trusting AI. The idea of deploying a rust project at my level of ability with an AI at the helm is is terrifying.
In my experience a lot of Excel models aren’t really tested, just checked a bit and them deemed correct.
“The real leaps are being made organically by employees, not from a top down [desktop PC] strategy. Where I see the real productivity gains are small teams deciding to try and build a [Lotus 123] assisted workflow for a process, and as they are the ones that know that process inside out they can get very good results - unlike a [mainframe] software engineering team who have absolutely zero experience doing the process that they are helping automate.”
The embedded “power users” show the way, then the CIO-friendly packaged software follows much later.
Microsoft has spent 30 years designing the most contrived XML-based format for Excel/Word/Powerpoint documents, so that it cannot be parsed except by very complicated bespoke applications with hundreds of developers involved.
Now, it's impossible to export any of those documents into plain text that an LLM can understand, and Microsoft Copilot literally doesn't work no matter how much money they throw at it. My company is now migrating Word documents to Markdown because they're seeing how powerful AI is.
This is karmic justice imo.
I even tried telling Copilot to convert each sheet to a CSV on one attempt THEN do calculations. It just ignored it and failed miserably, ironically outputting me a list of files that it should have made, along with the broken python script. I found this very amusing.
It seems way too soon to really narrow down any kind of trends after a few months. Most people aren't breathlessly following the next twitter trend, give it at least a year. Nobody is really going to be left behind if they pick up agents now instead of 3 months ago.
It took a lot of convincing, but I finally got her to start using ChatGPT to help her write SQL and walk her through setting up some SaaS accounting software formulas.
It worked so well now she's trying to find more applications at work. Claude code is too scary for her though. That will need to be in some Web UI before she feels comfortable giving it a try.
One tidbit I’d disagree with is that only those using the bleeding edge AI tools are reaping the benefits. There seem to be a lot of highly specialized tools and a lot of specific configurations (and mystical incantations) to get them to work, and those are constantly changing and being updated. The bleeding edge is a dangerous place to be if you value your time (and sanity).
Personally, as someone working on moderate-to-highly complex software (live inference of industrial IoT data), I can’t really open a merge / pull request for my colleagues to review unless I 100% understand what I’ve pushed, and can explain to them as well.
My killer app for AI would just be a CLI that gets me to a commit based on moderately technical input:
“Add this configuration variable for this entry point; split this class into two classes, one for each of the responsibilities that are currently crammed together; update the unit tests to reflect these changes, including splitting the tests for the old class into two different test classes; etc”
But, all the hype of the bleeding edge is around abstracting away the entire coding process until you don’t even understand what code is being generated? Hard to see it as anything but a pipe dream. AI is useful, but it’s not a panacea - you can’t fire it and replace it when it fucks up.
Granted I'm way behind the curve, but is this not how actual engineers (and not influencers) are using it? I heavily micro-manage the implementation because my manager still expects me to know the code
Seems like Nadella is having his Baller moment
Slightly overstated. Tiny teams aren't outcompeting because of AI, they're outcompeting because they aren't bogged down by decades of technical debt and bureaucracy. At Amazon, it will take you months of design, approvals, and implementation to ship a small feature. A one-man startup can just ship it. There is still a real question that has to be answered: how do you safely let your company ship AI-generated code at scale without causing catastrophic failures? Nobody has solved this yet.
I guess it's like asking for people's vim configs, but hey, there are at least a few popular posts mainly around git/vim/terminal configs.
I think it sums up how thoroughly they've been disrupted, at least for coding AIs (independent of like-for-like quality concerns rightly mentioned elsewhere in this thread re: Excel/Python).
I understand ChatGPT can do like a million other things, but so can Claude. Microsoft deliberately using competitors internally is the thing that their customers should pay attention to. Time to transform "Nobody gets fired for buying Microsoft" into "Nobody gets fired for buying what Microsoft buy", for those inclined.
I’m happily vibe coding at work but yeah article is right. MS has enterprise market share by default not by merit. Stunning contrast between what’s possible and what’s happening in big corp
And if the copilot button does nothing but open a chat window without any real integration with the app, what the hell is the point of that when there's already a copilot button in the windows taskbar?
I can select exactly where I want changes and have targeted element removal in Photoshop. If I submit the image and try to describe my desired changes textually, I get less easily-controllable output. (And I might still get scrambled text, for instance, in parts of the image that it didn't even need to touch.)
I think this sort of task-specific specialization will have a long future, hard to imagine pure-text once again being the dominant information transfer method for 90% of the things we do with computers after 40 years of building specialized non-text interfaces.
I was a bit surprised by how it still resulted in gibberish text on posters in the background in an unaffected part of the image that at first glance didn't change at all. So even just the "masking" ability of like "anything outside of this range should not be touched" of a GUI would be a godsend.
Ive been trying to create a quick and dirty marketing promo via an LLM to visualise how a product will fit into the world of people - it is incredibly painful to 'hope and pray' that by refining the prompt via text you can make slight adjustments come through.
The models are good enough if you are half-decent at prompting and have some patience. But given the amount invested, I would argue they are pretty disappointing. Ive had to chunk the marketing promo into almost a frame-by-frame play to make it somewhat work.