Let's be honest, Generative AI isn't going all that well (garymarcus.substack.com)

232 points by 7777777phil 25 days ago | 43 comments

mattmaroon 25 days ago
Meanwhile, my cofounder is rewriting code we spent millions of salary on in the past by himself in a few weeks.
I myself am saving a small fortune on design and photography and getting better results while doing it.
If this is not all that well I can’t wait until we get to mediocre!
- merlincorey 25 days ago
  > Meanwhile, my cofounder is rewriting code we spent millions of salary on in the past by himself in a few weeks.
  Code is not an asset it's a liability, and code that no one has reviewed is even more of a liability.
  However, in the end, execution is all that matters so if you and your cofounder are able to execute successfully with mountains of generated code then it doesn't matter what assets and liabilities you hold in the short term.
  The long term is a lot harder to predict in any case.
  - _vertigo 25 days ago
    > Code is not an asset it's a liability, and code that no one has reviewed is even more of a liability.
    Code that solves problems and makes you money is by definition an asset. Whether or not the code in question does those things remains to be seen, but code is not strictly a liability or else no one would write it.
    merlincorey 25 days ago
    "Code is a liability. What the code does for you is an asset." as quoted from https://wiki.c2.com/?SoftwareAsLiability (last edited December 17, 2013).
    This discussion and distinction used to be well known, but I'm happy to help some people become "one of today's lucky 10,000" as quoted from https://xkcd.com/1053/ because it is indeed much more interesting than the alternative approach.
    jvanderbot 24 days ago
    Code requires maintenance, which grows with codebase size, minus some decay over time. (LLMs do not change this, and might actually be more sensitive to it.) So increasing code size, especially with new code, implies future costs, which meets the definition of a liability on a kinda-sorta per-LOC basis.
    It's not right but it's not wrong either. It at least was a useful way to think about code, and we'll see if that applies in LLM era.
    sswatson 24 days ago
    It’s well known and also wrong.
    Delta’s airplanes also require a great deal of maintenance, and I’m sure they strive to have no more than are necessary for their objectives. But if you talk to one of Delta’s accountants, they will be happy to disabuse you of the notion that the planes are entered in the books as a liability.
    jvanderbot 24 days ago
    Whoa whoa whoa let's not bring the accountants in!
    Code isn't a liability b/c it costs money (though it does). Code is a liability like an unsafe / unproven bridge is a liability. It works fine until it doesn't - and at that point you're in trouble. Just b/c you can build lots of bridges now, doesn't mean each new bridge isn't also a risk. But if you gotta get somewhere now, conjuring bridges might be the way to go. Doesn't make each bridge not a liability (risky thing to rely on) or an asset (thing you can sell, use to build value)
    dpark 24 days ago
    Even proven code is a liability. The point of it being a liability is that it costs time and effort to maintain and update.
    The same with the bridge. Even the best built and most useful bridge requires maintenance. Assuming changing traffic patterns, it might equally require upgrades and changes.
    The problem with this whole “code is a liability” thing is that it’s vacuous. Your house is a liability. The bridge that gets you to work is a liability. Everything that requires any sort of maintenance or effort or upkeep or other future cost is in a sense a liability. This isn’t some deep insight though. This is like saying your bones could break so they are a liability. OK, but their value drastically outweighs any liability they impose.
    twisterius 23 days ago
    [dead]
    hshdhdhj4444 24 days ago
    If Delta was going bankrupt it would likely be able to sell individual planes for the depreciated book value or close to it.
    If a software company is going bankrupt, it’s very unlikely they will be able to sell code for individual apps and services they may have written for much at all, even if they might be able to sell the whole company for something.
    dpark 24 days ago
    The other half of the quote about liability is that the capabilities of the code are an asset. I don’t know if your bankrupt company would be able to sell their code, but they sure as hell couldn’t sell their capabilities without the code.
    _heimdall 24 days ago
    You're hinting at the underlying problem with the quote. "Asset" in the quote reads, at least to me, in the financial or accounting meaning of the term. "Liability" reads, again to me, in the sense of potential risk rather than the financial meaning. It's apples and oranges.
    Ygg2 24 days ago
    Liability is also an economic term. As in, "The bank's assets (debt) are my liability, and my assets (house) are the bank's liability."
    I don't think it's a wrong quote. Code's behavior is the asset, and code's source is the liability. You want to achieve maximum functionality for minimal source code investment.
    _heimdall 24 days ago
    Sorry, my point wasn't that liability doesn't have a meaning in finance. My read of the quote is that it uses liability in the sense of risk not debt on a balance sheet.
    I could always be wrong though, that was just my interpretation of it. I don't get how code could be a liability in the financial sense, but I do get how every line of code risks bugs and other issues.
    Ygg2 24 days ago
    Sure, but all code is a potential future debt.
    You wrote a music player that only allows one artist from a list of all artists? Tech debt.
    You wrote optimized assembly for x86_64? It's the year 2060, and we only support NGPU_ARM_N_LEG.
    The moment your expectations change (which is all the time), your code needs to be changed, and effort isn't free.
    _heimdall 22 days ago
    Tech debt is not part of a financial account or disclosure though. Yes those are forms of debt, no they aren't financial debts or financial liabilities.
    Ygg2 20 days ago
    Sure it isn't. It is still a financial liability. You need to sacrifice time clearing it to get to work that actually pays the bills.
    foobarchu 24 days ago
    If we're bringing in other industries, you'd be wise to consider banking. Savings accounts are something most people would consider an asset, because it's money the bank has on hand and can use for loan purposes.
    But it's the opposite, deposits are liabilities because they need interest paid out and can be withdrawn at any time.
    Just because the company has a thing that could be assigned value doesn't make it automatically an asset.
    OneMorePerson 24 days ago
    It's possible for something to be both an asset and a potential liability, it isn't strictly one or the other.
    kortilla 24 days ago
    Delta leases a big portion of its fleet, which makes your example pretty bad.
    simonsmithies 24 days ago
    Not a terrible example. The planes Delta owns are Delta’s assets; the planes the leasing company owns are the leasing company’s assets. The point is, the code and the planes are assets despite the maintenance required to keep them in revenue-generating state.
    tom_m 23 days ago
    Not a very valuable one. Never has been. That's the funny part. So many people want software but then don't know what to do once they have it.
  - wouldbecouldbe 25 days ago
    Developers that can’t see the change are blind.
    Just this week, Sunday to Tuesday, I added a fully functional subscription model to an existing platform, built out bulk async elasticjs indexing for a huge database, and migrated a very large Wordpress website to NextJS. That took 2.5 days; it would have cost me at least a month two years ago.
    fxtentacle 25 days ago
    To me, this sounds like:
    AI is helping me solve all the issues that using AI has caused.
    Wordpress has a pretty good export and Markdown is widely supported. If you estimate 1 month of work to get that into NextJS, then maybe the latter is not a suitable choice.
    serf 24 days ago
    it's wild that somehow with regards to AI conversations lately someone can say "I saved 3 months doing X" and someone can willfully and thoughtfully reply "No you didn't , you're wrong." without hesitation.
    I feel bad for AI opponents mostly because it seems like the drive to be against the thing is stronger than the drive towards fact or even kindness.
    My .02c: I am saving months of effort using AI tools to fix old (PRE-AI, PREHISTORIC!) codebases that have literally zero AI technical debt associated with them.
    I'm not going to bother with the charts & stats, you'll just have to trust me and my opinion like humans must do in lots of cases. I have lots of sharp knives in my kitchen, too -- but I don't want to have to go slice my hands on every one to prove to strangers that they are indeed sharp -- you'll just have to take my word.
    jbgt 24 days ago
    Slice THEIR hands. They might say yours are rigged.
    I'm a non-dev and the things I'm building blow me away. I think many of the people criticizing are perhaps more on the execution side and have a legitimate craft they are protecting.
    If you're more on the managerial side, and I'd say a trusting manager not a show me your work kind, then you're more likely to be open and results oriented.
    array_key_first 24 days ago
    From a developer POV, or at least my developer POV, less code is always better. The best code is no code at all.
    I think getting results can be very easy, at first. But I force myself to not just spit out code, because I've been burned so, so, so many times by that.
    As software grows, the complexity explodes. It's not linear like the growth of the software itself, it feels exponential. Adding one feature takes 100x the time it should because everything is just squished together and barely working. Poorly designed systems eventually bring velocity to a halt, and you can eventually reach a point where even the most trivial of changes are close to impossible.
    That being said, there is value in throwaway code. After all, what is an Excel workbook if not throwaway code? But never let the throwaway become a product, or grow too big. Otherwise, you become a prisoner. That cheeky little Excel workbook can turn into a full-blown backend application sitting on a share drive, and it WILL take you a decade to migrate off of it.
    wouldbecouldbe 24 days ago
    Yeah, AI is perfect at refactoring and cleaning things up; you just have to instruct it. I've improved my code significantly by asking it to clean up a messy application and refactor functions into pure ones that I can reuse and test. Without creating new bugs.
    xmcp123 23 days ago
    Holy hell, AI is not at all perfect at refactoring. Absolutely terrified on your behalf if you believe this to be the case.
    mycall 24 days ago
    You can use AI to simplify software stacks too; only your imagination limits you. How do you see things working with far fewer abstraction layers?
    I remember coding BASIC with POKE/PEEK assembly inside it, same with Turbo Pascal with assembly (C/C++ has similar extern abilities). Perhaps you want no more web or UI (TUI?). Once you imagine what you are looking for, you can label it and go from there.
    rerdavies 24 days ago
    I am a (very) senior dev with decades of experience. And I, too, am blown away by the massive productivity gains I get from the use of coding AIs.
    Part of the craft of being a good developer is keeping up with current technology. I can't help thinking that those who oppose AI are not protecting legitimate craft, but are covering up their own laziness when it comes to keeping up. It seems utterly inconceivable to me that anyone who has kept up would oppose this technology.
    There is a huge difference between vibe coding and responsible professional use of AI coding assistants (the principal one, of course, being that AI-generated code DOES get reviewed by a human).
    But that being said, I am enormously supportive of vibe coding by amateur developers. Vibe coding is empowering technology that puts programming power into the hands of amateur developers, allowing them to solve the problems that they face in their day-to-day work. Something that we've been working toward for decades! Will it be professional-quality code? No. Of course not. Will it do what it needs to do? Invariably, yes.
    xmcp123 23 days ago
    I think the issue is that most vibe coders believe it is professional quality code, or is sufficient moving forward.
    It produces code (in the hands of an amateur) that is good enough for a demo or at best an MVP, but it’s not at all a stable foundation.
    mattmaroon 24 days ago
    It is wild. I must admit I have a bit of Gell-Mann amnesia when it comes to HN comments. I often check them to see what people think about an article, but then every time the article touches on something I know deeply, I realize it’s all just know-it-all puffery. Then I forget and check it when it’s on the many things I do not know much about.
    My cofounder is extremely technically competent, but all these people are like good luck with your spaghetti vibe code. It’s humorous.
    immibis 24 days ago
    Just look at the METR study. All predictions were massive time savings but all observations were massive time losses. That's why we don't believe you when you say you saved time.
    solumunus 23 days ago
    You should know better than to form an opinion from one study. I could show you endless examples of a study concluding untrue things, endless…
    I’ve been full time (almost solo) building an ERP system for years and my development velocity has gone roughly 2x. The new features are of equal quality, everything is code reviewed, everything is done in my style, adhering to my architectural patterns. Not to mention I’ve managed to build a mobile app alongside my normal full time work, something I wouldn’t have even had the time to attempt to learn about without the use of agents.
    So do you think I’m lying or do you just think my eyes are deceiving me somehow?
    Dylan16807 23 days ago
    I think any measurement of development velocity is shaky, especially when measured between two different workflows, and especially when measured by the person doing the development.
    Such an estimate is far less reliable than your eyes are.
    So if people want to do more and better studies, that sounds great. But I have a good supply of salt for self-estimates. I'm listening to your input, but it's much easier for your self-assessment to have issues than you're implying.
    fuy 23 days ago
    Not saying you're wrong, but solo developers building (relatively) greenfield projects are the best bet for increased AI productivity.
    Solo dev projects are usually reasonably sized (< million LOC), style is more uniform, there's fewer silos etc. etc.
    Good studies look at a broader picture.
    solumunus 23 days ago
    It’s a very good point. I have full control and everything is incredibly uniform, and more recently designed with agents in mind. This must make things significantly easier for the LLM.
    wouldbecouldbe 24 days ago
    You are assuming a lot of things.
    The work was moving the many landing pages & content elements to NextJS, so we can test, iterate and develop faster. While having a more stable system. This was a 10 year old website, with a very large custom WordPress codebase and many plugins.
    The content is still in WordPress backend & will be migrated in the second phase.
    6510 24 days ago
    There is much going on in that exchange.
    I don't even know what a Wordpress site is anymore.
    > then maybe the latter is not a suitable choice.
    But now it only takes days which makes it suitable?
    There is also the paradoxical question of whether it is worth the time of someone who knows what they are doing. How would you even tell?
    tengbretson 24 days ago
    To me, this sounds like:
    If AI was good at a certain task then it was a bad task in the first place.
    Which is just run of the mill dogmatic thinking.
  - Zababa 24 days ago
    >Code is not an asset it's a liability
    This would imply companies could delete all their code and do better, which doesn't seem true?
    thehappypm 23 days ago
    A more accurate description of code is that it’s a depreciating asset, perhaps, or an asset that requires maintenance cost. Neither of which is a liability
- adrian_b 24 days ago
  All the productivity enhancement provided by LLMs for programming is caused by circumventing the copyright restrictions of the programs on which they have been trained.
  You and anyone else could have avoided spending millions for programmer salaries, had you been allowed to reuse freely any of the many existing proprietary or open-source programs that solved the same or very similar problems.
  I would have no problem with everyone being able to reuse any program, without restrictions, but with these AI programming tools the rich are now permitted to ignore copyrights, while the poor remain constrained by them, as before.
  The copyright for programs has caused a huge multiplication of the programming effort for many decades, with everyone rewriting again and again similar programs, in order for their employing company to own the "IP". Now LLMs are exposing what would have happened in an alternative timeline.
  The LLMs have the additional advantage of fast and easy searching through a huge database of programs, but this advantage would not have been enough for a significant productivity increase over a competent programmer that would have searched the same database by traditional means, to find reusable code.
  - gallerdude 24 days ago
    > the rich are now permitted to ignore copyrights, while the poor remain constrained by them, as before.
    Claude Code is $20 a month, and I get a lot of usage out of it. I don't see how cutting edge AI tools are only for the rich. The name OpenAI is often mocked, but they did succeed at bringing the cutting edge of AI to everyone, time and time again.
    amanaplanacanal 24 days ago
    Oh they will totally rent you their privilege, further enriching themselves. Of course!
    mattmaroon 24 days ago
    My cofounder said his plan is $100 a month and if it were $1,000 he’d still pay it.
    So much of programming is tedium.
    eleumik 22 days ago
    Cofounder of what?
  - marssaxman 24 days ago
    Intellectual property law is a net loss to humanity, so by my reckoning, anything which lets us all work around that overhead gets some extra points on the credit side of the ledger.
    LexiMax 24 days ago
    I agree in spirit, but in actual fact this subversion of intellectual property is disproportionately beneficial to those who can afford to steal from others and those who can afford to enforce their copyright, while disproportionately disadvantageous to those who can't afford to fend off a copyright lawsuit or can't afford to sue to enforce their copyright.
    The GP can free-ride uncredited on the collective work of open source at their leisure, but I'm sure Disney would string me up by my earlobes if I released a copywashed version of Toy Story 6.
  - immibis 24 days ago
    Then it really proves how much the economy would be booming if we abolished copyright, doesn't it? China ignores copyright too, and look at them surpassing us in all aspects of technology, while Western economies choose to sabotage themselves to keep money flowing upwards to old guys.
    jvanderbot 24 days ago
    Well no, because copyright != cannot use.
    "Available for use" and "Automatically rewritten to work in your codebase fairly well" is very different, so copyright is probably not the blocker technically
    mattmaroon 24 days ago
    Yeah, I love the idea that all software could just be cobbled together from other software, but none of it does anything new.
    jimbokun 24 days ago
    China is not surpassing the US in all aspects of technology.
    There is still much for them to steal.
    immibis 23 days ago
    Can you name one area of technology where the USA has better technology than China? I'll wait.
    hattmall 17 days ago
    UV Lithography, Space Exploration, self driving cars, weather prediction, industrial safety, circular knitting, medical imaging, robotics, bioengineering, agricultural yield management
    jimbokun 22 days ago
    Software
    mattmaroon 24 days ago
    The theory behind copyright is that the enshrined monopoly guarantees profits and thus encourages R&D.
    China steals our R&D (both copyrighted and non) and gets a lot of theirs from state funding.
    I don’t think I’d take China’s success as proof that the copyright system doesn’t work.
    LexiMax 24 days ago
    It is proof that intellectual property is a transient and fickle thing, easily subverted when there is no legal framework to protect it.
    immibis 24 days ago
    Sounds like you're saying state funded R&D works much better than copyright funded R&D
- nonethewiser 25 days ago
  > Meanwhile, my cofounder is rewriting code we spent millions of salary on in the past by himself in a few weeks.
  Why?
  I'm not even casting shade - I think AI is quite amazing for coding and can increase productivity and quality a lot.
  But I'm curious why he's doing this.
  - mattmaroon 24 days ago
    The codebase is old and really hard to work on. It’s a game that existed pre-iPhone and still has decent revenue but could use some updating. We intentionally shrank our company down to auto-pilot mode and frankly don’t even have a working development environment anymore.
    It was basically cost prohibitive to change anything significant until Claude became able to do most of the work for us. My cofounder (also CTO of another startup in the interim) found himself with a lot of time on his hands unexpectedly and thought it would be a neat experiment and has been wowed by the results.
    Much in the same way people on HN debate when we will have self driving cars while millions of people actually have their Teslas self-driving every day (it reminds me of when I got to bet that Joe Biden would win the election after he already did) those who think AI coding is years away are missing what’s happening now. It’s a powerful force magnifier in the hands of a skilled programmer and it’ll only get better.
    idiotsecant 24 days ago
    I agree that code is being written in exactly the same sense that Teslas are driving themselves.
    mattmaroon 24 days ago
    Yes, in both cases it takes someone to steer. It’s not a complete solution. Someone who can’t drive can’t just ride a Tesla around town and someone who can’t program can’t vibe code anything complex.
    But if it can do 90% of the work for you, it is a serious force magnifier.
    Dylan16807 23 days ago
    > But if it can do 90% of the work for you, it is a serious force magnifier.
    Well, we could characterize a Tesla as doing 90% of the work but it's not at all a force multiplier. Your "10%" supervisory contribution takes just as long as doing 100%.
    Bridged7756 24 days ago
    You didn't mention it was rewriting the codebase from scratch. That's the consensus, that AI is only good at scaffolding.
    Oh it can't do 90% of the work for you. It CAN type 90% of the work for you, but someone still has to read the code and know what the best course of action is, supposedly...... I suppose some people never learned to use their IDEs or to touch type so as to find LLMs such a crazy productivity boost.
    ggfdh 24 days ago
    Do you have tests at least? Seems reckless to yolo the codebase if you don’t or can’t test easily.
    mattmaroon 24 days ago
    He’s simply building an entirely new app. The old codebase plods on untouched.
    kunley 24 days ago
    Millions? Of people in self driving teslas?
    The actual number of such vehicles produced is two orders of magnitude less.
    mattmaroon 23 days ago
    No it isn’t. They sold 1.6 million vehicles last year and every one of them can self drive. At least once a year they give out free access to it and I don’t know a Tesla owner who hasn’t tried it.
    You’re right that it doesn’t happen every day but that doesn’t change the point, you all are debating whether something can happen after it already happened.
    kunley 18 days ago
    What has already happened is that you have unfortunately manipulated the facts. How long were you planning to not reveal what you really mean: that some drivers have an opportunity to use self-driving once a year and that they allegedly tried it, although that's really just opinion based on your own preference? That's much different from claiming millions of such drivers roaming everywhere at any moment in time.
    Plus, the idea that every single Tesla has this option is just your rough estimate, right?
    Dylan16807 22 days ago
    > you all are debating whether something can happen after it already happened
    My goalposts have always been on levels 3 and 4.
    Elon Musk has been promising level 4, sometimes level 5, this entire time.
    It has not already happened. The cars are level 2.
    nonethewiser 24 days ago
    Sounds like a good reason to rewrite. And sounds like a rewrite just would not happen by any other means. Thanks for sharing the details.
    wolvoleo 24 days ago
    When I say I want a self driving car I mean one that actually drives itself so I don't have to be involved other than setting the destination.
    What Tesla is selling now is the worst of both worlds. You still have to pay attention but it's way more boring so it's really hard to do so. Well until it suddenly decides to ram a barrier at highway speeds.
    Wake me up when I can have a beer and watch a movie while it's driving.
- nsoonhui 25 days ago
  It's not directly comparable. The first time writing the code is always the hardest because you might have to figure out the requirements along the way. When you have the initial system running for a while, doing a second one is easier because all the requirements kinks are figured out.
  By the way, why does your co-founder have to do the rewrite at all?
  - el_benhameen 25 days ago
    I find the opposite to be true. Once you know the problem you’re trying to solve (which admittedly can be the biggest lift), writing the first cut of the code is fun, and you can design the system and set precedent however you want. Once it’s in the wild, you have to work within the consequences of your initial decisions, including bad ones.
    touristtam 24 days ago
    ... And the undocumented spaghetti code that might come with a codebase that was touched by numerous hands.
  - nonethewiser 25 days ago
    You can compare it - just factor that in. And compare writing it with AI vs. writing it without AI.
    We have no clue the scope of the rewrite but for anything non-trivial, 2 weeks just isn't going to be possible without AI. To the point of you probably not doing it at all.
    I have no idea why they are rewriting the code. That's another matter.
- cadamsdotcom 24 days ago
  G’day Matt, from another person with a cofounder, both of us getting insane value out of AI and astounded at the attitudes around HN.
  You sound like complete clones of us :-)
  We’ve been at it since July and have built what used to take 3-5 people that long.
  To the haters: I use TDD and review every line of code, I’m not an animal.
  There’s just 2 of us but some days it feels like we command an army.
  - ta9000 21 days ago
    As a software engineer this scares me from an employment perspective but I also use Claude Code to produce 90% of the code I commit now (after reviewing and revision of course) so there’s that… ;)
- sidtrey 24 days ago
  Senior developer here, your co-founder is making a huge mistake. Their lack of knowledge about the codebase will be your undoing. PS. I work in GenAI.
- aprdm 25 days ago
  lol same. I just wrote a bunch of diagrams with mermaid that would legit take me a week, and also did a mock of a UI for a frontend engineer that would take me (or some designers) another week to do. All of that in between meetings...
  Waiting for it to actually go well to see what else I can do !
  - nonethewiser 25 days ago
    The more I have this experience and read people maligning AI for coding, the more I think the junior developers are actually not the ones in danger.
    daxfohl 24 days ago
    Oh I've thought this for years. As an L7, basically my primary role is to serve as someone to bounce ideas off of, and to make recommendations based on experience. A chatbot, with its virtually infinite supply of experience, could ostensibly replace my role way sooner than it could a solid junior/mid-level coder. The main thing it needs is a consistent vision and direction that aligns with the needs of nearby teams, which frankly sounds not all that hard to write in code (I've been considering doing this).
    Probably the biggest gap would be the ability to ignite, drive, and launch new initiatives. How does an AI agent "lead" an engineering team? That's not something you can code up in an agent runtime. It'd require a whole culture change that I have a hard time seeing in reality. But of course if there comes a point where AI takes all the junior and mid-level coding jobs, then at that point there's no culture to change, so staff/principal jobs would be just as at risk.
    nonethewiser 24 days ago
    I don't think it's a replacement for that, but it definitely provides some of it, whereas senior+ level people were required before.
    I think that's sort of a theme with LLMs. It's not that it's better than "the real thing" (in this case a senior+ software engineer) or that it's without flaws... but it's a fucking service. Having just a semblance of an advanced software engineer in your pocket is a game changer. It doesn't need to be perfect or better than the real thing to fundamentally change things.
    TACIXAT 24 days ago
    I have the complete opposite impression w.r.t. architecture decisions. The LLMs can cargo cult an existing design, but they do not think through design consequences well at all. I use them as a rubber duck non-stop, but I think I respect less than one out of every six of their suggestions.
    daxfohl 24 days ago
    They've gotten pretty good IME so long as you guide it to think out of the box, give it the right level of background info, have it provide alternatives instead of recommendations, and do your best not to bias it in any particular direction.
    That said, the thing it really struggles with is when the best approach is "do nothing". Which, given that a huge chunk of principal level work is in deciding what NOT to do, it may be a while before LLMs can viably take that role. A principal LLM based on current tech would approve every idea that comes across it, and moreover sell each of them as "the exact best thing needed by the organization right now!"
    XenophileJKO 24 days ago
    Knowing when to nudge it out of a rut (or say skip it) is probably the biggest current skill. This is why experienced people get generally much better results.
    code_martial 24 days ago
    I’m not sure. I keep asking the LLMs whether I should rewrite project X in language Y and it just asks back, “what’s your problem?” And most of the times it shoots my problems down showing exactly why rewriting won’t fix that particular problem. Heck, it even quoted Joel Spolsky once!
    Of course, I could just _tell_ it to rewrite, but that’s different.
    daxfohl 24 days ago
    Damn, I'm sure even I'd have caved at some point. Did you get to VHDL? Your project etched in pure silicon? Yes! Must!
    Oh no, my job is safe no longer!
  - wombat-man24 days ago
    I have been able to prototype way faster. I can explain how I want a prototype reworked and it's often successful. Doesn't always work, but super useful more often than not.
  - windowpains24 days ago
    That line on the chart labeled “profit” is really going to go up now!
- reaperducer24 days ago
  I myself am saving a small fortune on design and photography and getting better results while doing it.
  Yay! Let's put all the artists out of business and funnel all the money to the tech industry. That's how to build a vibrant society. Yay!
- segfaultex25 days ago
  Sounds like an argument for better hiring practices and planning.
  Producing a lot of code isn’t proof of anything.
  - sheeh25 days ago
    Yep. Let’s see the projects and more importantly the incremental returns…
- RamblingCTO24 days ago
  In this thread: people throwing shade on tech that works, comparing it to a perfect world and making weird assumptions like no tests, no E2E or manual testing just to make a case. Hot take: most SWEs produce shit code, be it by constraints of any kind or their own abilities. LLMs do the same but cost less and can move faster. If you know how to use it, code will be fine. Code is a commodity and a lot of people will be blindsided by that in the future. If your value proposition is translating requirements into code, I feel sorry for you. The output quality of the LLM depends on the abilities of the operator. And most SWEs lack the system thinking to be good here, in my experience.
  As a fractional CTO and in my decade of being co-founder/CTO I saw a lot of people and codebases and most of it is just bad. You need to compare real life codebases and outputs of developers, not what people wished it would be like. And the reality is that most of it sucks and most SWEs are bad at their jobs.
- ehnto24 days ago
  How come you need to rewrite millions of dollars in code?
- fzeroracer25 days ago
  > Meanwhile, my cofounder is rewriting code we spent millions of salary on in the past by himself in a few weeks.
  This is one of those statements that would horrify any halfway competent engineer. A cowboy coder going in, seeing a bunch of code and going 'I should rewrite this' is one of the biggest liabilities to any stable system.
  - hactually25 days ago
    I assume this is because they're already insanely profitable after hitting PMF and are now trying to bring down infra costs?
    Right? RIGHT?!
  - mattmaroon24 days ago
    My cofounder is an all the way competent engineer. Making this many assumptions would horrify someone halfway competent with logic though.
    phito24 days ago
    It's crazy how some people here will just make all the assumptions possible in order to refuse to believe you. Anyone who's used a good model with open code or equivalent will know that it's plausible. Refactoring is really cheap now when paired with someone competent.
    I'm doing the same as your co-founder currently. In a few days, I've rewritten old code that took previous employees months to do. Their implementation sucked and barely worked, the new one is so much better and has tests to prove it.
    mattmaroon24 days ago
    HN comments are wild.
  - habinero25 days ago
    Every professional SWE is going to stare off into the middle distance, as they flashback to some PM or VP deciding to show everyone they still got it.
    The "how hard could it be" fallacy claims another!
    bonesss24 days ago
    LLMs do the jobs of developers, thereby eating up countless jobs.
    LLMs do the jobs of developers without telling semi-technical arrogant MBA holders “no, you’re dumb”, thereby creating all the same jobs as before but also a butt-ton more juggling expensive cleanup mixed with ego-massaging.
    We’re talking a 2-10x improvement in ‘how hard could it be?’ iterations. Consultant candy.
    sheeh25 days ago
    As someone who is more involved in shaping the product direction rather than engineering what composes the product - I will readily admit many product people are utterly, utterly clueless.
    Most people have no clue about the craftsmanship, work, etc. it takes to create a great product. LLMs are not going to change this; in fact, they serve as a distraction.
    I’m not a SWE so I gain nothing by being bearish on the contributions of LLMs to the real economy ;)
    habinero24 days ago
    Oh, it wasn't a bash on product people, I'm sorry if it came off that way.
    It's a reference to a trope where the VP of Eng or CTO (who was an engineer decades ago) gets it in their head that they want to code again and writes something absolute dogshit terrible because their skills have degraded. Unfortunately they are your boss's boss's boss and can make you deal with it anyways.
    I've actually seen it IRL once, to his credit the dude finally realized the engineer smiles were pained grimaces and it got quietly dropped lol.
    UncleMeat24 days ago
    This has become my new hell.
    PM has an idea. PM vibe codes a demo of this idea. PM shows it to the VP. VP gets excited and says "when can we have this." I look at the idea and estimate it'll take two people six months. VP and PM say "what the heck, but AI built the demo in a weekend, you should be able to do this with one engineer in a month." I get one day closer to quitting.
    iwontberude25 days ago
    Definitely been in that room multiple times.
- 1vuio0pswjnm724 days ago
  When I read the blog post, the impression I get is that the author is referring to the proposed "business" of licensing or selling "generative AI" (i.e., making money for the licensor or seller), not whether generative AI is saving money for any particular user.
  The author's second reference, an article from The Atlantic describing the copyright liability issues with "generative AI", has been submitted to HN four times in the last week:
  AI Memorization Research (theatlantic.com): 2 points by tagyro, 5 hours ago, no comments
  AI's Memorization Crisis (theatlantic.com): 2 points by twalichiewicz, 1 day ago, 1 comment
  AI's Memorization Crisis (theatlantic.com): 3 points by palad1n, 4 days ago, 1 comment
  AI's Memorization Crisis (theatlantic.com): 4 points by casparvitch, 4 days ago, no comments
- thefz24 days ago
  > Meanwhile, my cofounder is rewriting code we spent millions of salary on in the past by himself in a few weeks.
  If the LLM generating the code introduced a bug, who will be fixing it? The founder that does not know how to code or the LLM that made the mistake first?
- gloosx24 days ago
  >rewriting code
  Key thing here. The code was already written, so rewriting it isn't exactly adding a lot of quantifiable value. If millions weren't spent in the first place, there would be no code to rewrite.
- rf1524 days ago
  no need to wait, by using AI you already are mediocre at best (because you forego skill and quality for speed)
  - xnx24 days ago
    Is this also true of carpenters who use circular saws and airguns instead of hand saws and hammers?
    rf1524 days ago
    these are very different tools tbh
- jbbryant23 days ago
  Exactly. The venture studio that partnered with our startup collapsed. The code their team spent two years writing was terrible and didn't fully function. My CTO and I are rewriting 60% of the code with AI! Now everything works with bugs.....
- bwestergard25 days ago
  Out of curiosity, what is your product?
- alfonsodev24 days ago
  >I myself am saving a small fortune on design and photography and getting better results while doing it.
  Is this because you are improving your already existing design and photography skills and business?
  Or have you bootstrapped from scratch with AI?
  Do you mind sharing or giving a hint ?
  Thanks!
- mschuster9125 days ago
  The problem is... you're going to deprive yourself of the talent chain in the long run, and so is everyone else who is switching over to AI, both generative like ChatGPT and transformative like the various translation, speech recognition/transcription or data wrangling models.
  For now, it works out for companies - but fast forward, say, ten years into the future. There won't be new intermediates or seniors any more to replace the ones that age out or quit the industry entirely in frustration of them not being there for actual creativity but to clean up AI slop, simply because there won't have been a pipeline of trainees and juniors for a decade.
  But by the time that, plus the demographic collapse, shows its effects, the people who currently call the shots will be retired, having long since made their money. And my generation will be left with collapse everywhere and will have to find ways to somehow keep stuff running.
  Hell, it's already hard to get qualified human support these days. Large corporations effectively rule with impunity, and the only recourse consumers have is to either shell out immense sums of money for lawyers and court fees or turn to consumer protection/regulatory authorities that are being gutted as we speak, both in money and legal protections, or being swamped with AI slop like "legal assistance" AI hallucinating case law.
  - saxenaabhi24 days ago
    > There won't be new intermediates or seniors any more to replace the ones that age out or quit the industry entirely in frustration of them not being there for actual creativity but to clean up AI slop, simply because there won't have been a pipeline of trainees and juniors for a decade.
    There are plenty of self-taught developers who didn't need any "traineeship". That proportion will increase even more with AI/LLMs and the fact that there are no more jobs for youngsters. And actually, from looking at the purely toxic comments on this thread, I would say it's a good thing for youngsters not to be exposed to such "seniors".
    Credentialism is dead. "Either ship or shutup" should be the mantra of this age.
    Bridged775624 days ago
    More like "Either slop or shut up". Classic startup culture, fuck processes and doing things right, it's all about larping and lying to investors. Damn right, your value as an engineer is all about how much slop you can churn out, I'd love (not) to be in a team filled with people like you.
- kermatt24 days ago
  Is the cofounder "rewriting" that code providing zero of the existing code as context? Doing it in a completely green field fashion?
  Or is any of the existing platform used as an input for the rewrite?
- vlod24 days ago
  >my cofounder is rewriting code we spent millions of salary on in the past by himself in a few weeks.
  I was expecting a language reference (we all know which one), to get more speed, safety and dare I say it "web scale" (insert meme). :)
  - oenton24 days ago
    > and dare I say it "web scale"
    Obligatory reference https://www.youtube.com/watch?v=b2F-DItXtZs
- sjw98724 days ago
  Good luck with fixing that future mess. This is such an incredibly short sighted approach to running a company and software dev that I think your cofounder is likely going to torpedo your company.
- mawadev24 days ago
  Doesn't this imply that you were not getting the level of efficiency out of your investment? It would be a little odd to say this publicly as this says more about you and your company. The question would be what your code does and if it is profitable.
- venndeezl25 days ago
  I suspect he means as a trillion dollar corporation led endeavor.
  I trained a small neural net on pics of a cat I had in the 00s (RIP George, you were a good cat).
  Mounted a webcam I had gotten for free from somewhere, above the cat door, in the exterior of the house.
  If the neural net recognized my cat, it switched off an electromagnet holding the pet door locked. Worked perfectly until I moved out of the rental.
  Neural nets are, end of the day, pretty cool. It's the data center business that's the problem. Just more landlords, wannabe oligarchs, claiming ownership over anything they can get the politicians to give them.
- blks24 days ago
  On design and photography? So you’re filling your product with slop images and graphics? Users won’t like it
- Voultapher24 days ago
  > I myself am saving a small fortune on design and photography and getting better results while doing it.
  Tell me you have bland taste without telling me you have bland taste. But if your customers eat it up and your slop manages to stand out in a sea of slop, who am I to dislike slop.
tombert25 days ago
I find it a bit odd that people are acting like this stuff is an abject failure because it's not perfect yet.
Generative AI, as we know it, has only existed ~5-6 years, and it has improved substantially, and is likely to keep improving.
Yes, people have probably been deploying it in spots where it's not quite ready, but it's myopic to act like it's "not going all that well" when it's pretty clear that it actually is going pretty well; we just need to work out the kinks. New technology is always buggy for a while, and eventually it becomes boring.
- maccard25 days ago
  > Generative AI, as we know it, has only existed ~5-6 years, and it has improved substantially, and is likely to keep improving.
  Every two or three months we're hearing there's a new model that just blows the last one out of the water for coding. Meanwhile, here I am with Opus and Sonnet for $20/mo, and they're regularly failing at basic tasks, with Antigravity getting stuck in loops and burning credits. We're talking "copy basic examples and don't hallucinate APIs" here, not deep, complicated system design topics.
  It can one shot a web frontend, just like v0 could in 2023. But that's still about all I've seen it work on.
  - Aurornis25 days ago
    You’re doing exactly the thing that the parent commenter pointed out: Complaining that they’re not perfect yet as if that’s damning evidence of failure.
    We all know LLMs get stuck. We know they hallucinate. We know they get things wrong. We know they get stuck in loops.
    There are two types of people: The first group learns to work within these limits and adapt to using them where they’re helpful while writing the code when they’re not.
    The second group gets frustrated every time it doesn’t one-shot their prompt and declares it all a big farce. Meanwhile the rest of us are out here having fun with these tools, however limited they are.
    maccard25 days ago
    Someone else said this perfectly farther down:
    > The whole discourse around LLMs is so utterly exhausting. If I say I don't like them for almost any reason, I'm a luddite. If I complain about their shortcomings, I'm just using it wrong. If I try and use it the "right" way and it still gets extremely basic things wrong, then my expectations are too high.
    As I’ve said, I use LLMs, and I use tools that are assisted by LLMs. They help. But they don’t work anywhere near as reliably as people talk about them working. And that hasn’t changed in the 18 months since I first prompted v0 to make me a website.
    vips7L24 days ago
    Rather be a Luddite than contribute to these soul suckers like OpenAI and help them lay off workers.
    j-bos24 days ago
    All tech work has been in service of laying off workers. Phone operator, bank teller, longshoreman (outside the US) all used to be serviceable careers to earn a lifetime.
    Gud24 days ago
    How are they “soul suckers”?
    Using LLMs has made it fun for me to make software again.
    kakacik24 days ago
    Shallow learning, overall laziness imprinted on the character over time. For kids and juniors starting out in the field, they are much worse. None of the stuff I've learned over past 20 years was handed over to me in this easy fashion.
    Overconfident and over-positive shallow posts just hurt the overall discussion. Also some layer of arrogance - a typical 'if you struggle to get any significant value out of this new toy you must be doing something horribly wrong, look at us all being 100x productive!' which is never ever followed by some detailed explanation of their stack and other details.
    Clearly the tools have serious issues, since most users struggle to get any sustained, reliable added value, and everybody keeps hoping things will improve later because it is able to write lengthy prose on various topics or fill out government documents.
    orangecat24 days ago
    > None of the stuff I've learned over past 20 years was handed over to me in this easy fashion.
    Yeah, kids these days just include stdio.h and start printing stuff, no understanding of register allocation or hardware addressing modes. 20 years from now nobody will know how to write an operating system.
    > Also some layer of arrogance
    As compared to "if you claim AI is useful for you, you're either delusional or a shill"? The difference is that the pro-AI side can accept that in any specific case it may not work well, while detractors have to make the increasingly untenable argument that it's never useful.
  - tombert25 days ago
    Sure, but think about what it's replacing.
    If you hired a human, it would cost you thousands a week. Humans will also fail at basic tasks, get stuck in useless loops, and you still have to pay them for all that time.
    For that matter, even if I'm not hiring anyone, I will still get stuck on projects and burn through the finite number of hours I have on this planet trying to figure stuff out and being wrong for a lot of it.
    It's not perfect yet, but these coding models, in my mind, have gotten pretty good if you're specific about the requirements, and even if it misfires fairly often, they can still be useful, even if they're not perfect.
    I've made this analogy before, but to me they're like really eager-to-please interns; not necessarily perfect, and there's even a fairly high risk you'll have to redo a lot of their work, but they can still be useful.
    falloutx25 days ago
    I am an AI skeptic, but I would agree this looks impressive from certain angles, especially if you're an early startup (maybe) or you are very high up the chain and just want to focus on cutting costs. On the other hand, if you are about to be unemployed, this is less impressive. Can it replace a human? I would say no, it's still a long way off, but a good salesman can convince executives that it can, and that's all that matters.
    xp8425 days ago
    > On the other hand, if you are about to be unemployed, this is less impressive
    > salesman can convince executives that it does
    I tend to think that reality will temper this trend as the results develop. Replacing 10 engineers with one engineer using Cursor will result in a vast velocity hit. Replacing 5 engineers with 5 "agents" assigned to autonomously implement features will result in a mess eventually. (With current technology -- I have no idea what even 2027 AI will do). At that point those unemployed engineers will find their phones ringing off the hook to come and clean up the mess.
    Not that unlike what happens in many situations where they fire teams and offshore the whole thing to a team of average developers 180 degrees of longitude away who don't have any domain knowledge of the business or connections to the stakeholders. The pendulum swings back in the other direction.
    tombert25 days ago
    I just think Jevons paradox [1]/Gustafson's Law [2] kind of applies here.
    Maybe I shouldn't have used the word "replaced", as I don't really think it's actually going to "replace" people long term. I think it's likely to just lead to higher output as these get better and better.
    [1] https://en.wikipedia.org/wiki/Jevons_paradox
    [2] https://en.wikipedia.org/wiki/Gustafson%27s_law
    falloutx25 days ago
    Not you specifically, but the word "replaced" is being used all the time. Even senior engineers say they are using it as a junior engineer, while we could easily hire junior engineers (but execs don't want to). Jevons paradox won't work in software because users' wallets and time are limited, and if software becomes too easy to build, it becomes harder to sell. Normal people can have 5 subscriptions, maybe 10, but they won't be going to 50 or 100. I would say we have already exhausted users, with all the bad practices.
    zombot23 days ago
    Making humans look ridiculously and unrealistically bad still doesn't invalidate criticism of AI, its overhyped marketing and all the astroturfing.
    maccard25 days ago
    You’ve missed my point here - I agree that gen AI has changed everything and is useful, _but_ I disagree that it’s improved substantially - which is what the comment I replied to claimed.
    Anecdotally I’ve seen no difference in model changes in the last year, but going from LLM to Claude code (where we told the LLMs they can use tools on our machines) was a game changer. The improvement there was the agent loop and the support for tools.
    In 2023 I asked v0.dev to one shot me a website for a business I was working on and it did it in about 3 minutes. I feel like we’re still stuck there with the models.
    BeetleB24 days ago
    I've been coding with LLMs for less than a year. As I mentioned to someone in email a few days ago: In the first half, when an LLM solved a problem differently from me, I would probe why and more often than not overrule and instruct it to do it my way.
    Now it's reversed. More often than not its method is better than mine (e.g. leveraging a better function/library than I would have).
    In general, it's writing idiomatic code much more often. It's been many months since I had to correct it and tell it to be idiomatic.
    Macha24 days ago
    My experience in 2024 AI tools like copilot was if the code compiled first time it was an above average result and I’d need a lot of manual tweaking.
    There were definitely languages where it worked better (JS), but if I told people here I had to spend a lot of time tweaking after it, at least half of them assumed I was being really anal about spacing or variable names, which was simply not the case.
    It’s still the case for cheaper models (GPT-mini remains a waste of my time), but there are mid-level models like Minimax M2 that can produce working code, and stuff like Sonnet can produce usable code.
    I’m not sure the delta is enough for me that I’d pay for these tools on my own though…
    tombert25 days ago
    In my experience it has gotten considerably better. When I get it to generate C, it often gets the pointer logic correct, which wasn't the case three years ago. Three years ago, ChatGPT would struggle with even fairly straightforward LaTeX, but now I can pretty easily get it to generate pretty elaborate LaTeX and I have even had good success generating LuaTeX. I've been able to fairly successfully have it generate TLA+ spec from existing code now, which didn't work even a year ago when I tried it.
    Of course, sample size of one, so if you haven't gotten those results then fair enough, but I've at least observed it getting a lot better.
    johnnienaked24 days ago
    Ya but what do you do when there are no humans left?
    cudgy24 days ago
    Prompt for a human?
  - elzbardico25 days ago
    There’s a subtle point, a moment when you HAVE to take the wheel from the AI. All the issues I see come from people insisting on using it far beyond the point where it stops being useful.
    It is a helper, a partner, but it is still not ready to go the last mile.
    xp8425 days ago
    It's funny how many people don't get that. It's like adding a pretty great senior or staff level engineer to sit on-call next to every developer and assist them, for basically free (I've never used any of the expensive stuff yet. Just things like Copilot, Grok Code in JetBrains, just asking Gemini to write bits of code for me).
    If you hired a staff engineer to sit next to me, and I just had him/her write 100% of the code and never tried to understand it, that would be an unwise decision on my part and I'd have little room to complain about the times he made mistakes.
    maccard25 days ago
    As someone else said in this thread:
    > The whole discourse around LLMs is so utterly exhausting. If I say I don't like them for almost any reason, I'm a luddite. If I complain about their shortcomings, I'm just using it wrong. If I try and use it the "right" way and it still gets extremely basic things wrong, then my expectations are too high.
    I’m perfectly happy to write code, to use these tools. I do use them, and sometimes they work (well). Other times they have catastrophic failures. But apparently it’s my failure for not understanding the tool or expecting too much of the tool, while others are screaming from the rooftops about how this new model changes everything (which happens every 3 months at this point)
    elzbardico24 days ago
    There's no silver bullet. I’m not a researcher, but I’ve done my best to understand how these systems work—through books, video courses, and even taking underpaid hourly work at a company that creates datasets for RLHF. I spent my days fixing bugs step-by-step, writing notes like, “Hmm… this version of the library doesn’t support protocol Y version 4423123423. We need to update it, then refactor the code so we instantiate ‘blah’ and pass it to ‘foo’ before we can connect.”
    That experience gave me a deep appreciation for how incredible LLMs are and the amazing software they can power, but it also completely demystified them. So by all means, let’s use them. But let’s also understand there are no miracles here. Go back to Shannon’s papers from the ’60s, and you'll understand that what seem to you like "emerging behaviors" are quite explainable from an information theory background. Learn how these models are built. Keep up with the latest research papers. If you do, you’ll recognize their limitations before those limitations catch you by surprise.
    There is no silver bullet. And if you think you’ve found one, you’re in for a world of pain. Worse still, you’ll never realize the full potential of these tools, because you won’t understand their constraints, their limits, or their pitfalls.
    maccard24 days ago
    > There is no silver bullet. And if you think you’ve found one, you’re in for a world of pain. Worse still, you’ll never realize the full potential of these tools, because you won’t understand their constraints, their limits, or their pitfalls.
    See my previous comment (quoted below).
    > If I complain about their shortcomings, I'm just using it wrong. If I try and use it the "right" way and it still gets extremely basic things wrong, then my expectations are too high.
    Regarding "there are no miracles here"
    Here are a few comments from this thread alone,
    - https://news.ycombinator.com/item?id=46609559
    - https://news.ycombinator.com/item?id=46610260
    - https://news.ycombinator.com/item?id=46609800
    - https://news.ycombinator.com/item?id=46611708
    Here are a few from some older threads:
    - https://news.ycombinator.com/item?id=46519851
    - https://news.ycombinator.com/item?id=46485304
    There is a very vocal group who are telling us that there _is_ a silver bullet.
  - BeetleB25 days ago
    > We're talking "copy basic examples and don't hallucinate APIs" here, not deep complicated system design topics.
    If your metric is an LLM that can copy/paste without alterations, and never hallucinate APIs, then yeah, you'll always be disappointed with them.
    The rest of us learn how to be productive with them despite these problems.
    drewbug0125 days ago
    > If your metric is an LLM that can copy/paste without alterations, and never hallucinate APIs, then yeah, you'll always be disappointed with them.
    I struggle to take comments like this seriously - yes, it is very reasonable to expect these magical tools to copy and paste something without alterations. How on earth is that an unreasonable ask?
    The whole discourse around LLMs is so utterly exhausting. If I say I don't like them for almost any reason, I'm a luddite. If I complain about their shortcomings, I'm just using it wrong. If I try and use it the "right" way and it still gets extremely basic things wrong, then my expectations are too high.
    What, precisely, are they good for?
    tombert25 days ago
    I think what they're best at right now is the initial scaffolding work of projects. A lot of the annoying bootstrap shit that I hate doing is actually generally handled really well by Codex.
    I agree that there's definitely some overhype to them right now. At least for the stuff I've done they have gotten considerably better though, to a point where the code it generates is often usable, if sub-optimal.
    For example, about three years ago, I was trying to get ChatGPT to write me a C program to do a fairly basic ZeroMQ program. It generated something that looked correct, but it would crash pretty much immediately, because it kept trying to use a pointer after free.
    I tried the same thing again with Codex about a week ago, and it worked out of the box, and I was even able to get it to do more stuff.
    smithkl4225 days ago
    I think it USED to be true that you couldn't really use an LLM on a large, existing codebase. Our codebase is about 2 million LOC, and a year ago you couldn't use an LLM on it for anything but occasional small tasks. Now, probably 90% of the code I commit each week was written by Claude (and reviewed by me and other humans - and also by Copilot and ZeroPath).
    ubercow1325 days ago
    It seems like just such a weird and rigid way to evaluate it? I am a somewhat reasonable human coder, but I can't copy and paste a bunch of code without alterations from memory either. Can someone still find a use for me?
    BeetleB25 days ago
    For a long time, I've wanted to write a blog post on why programmers don't understand the utility of LLMs[1], whereas non-programmers easily see it. But I struggle to articulate it well.
    The gist is this: Programmers view computers as deterministic. They can't tolerate a tool that behaves differently from run to run. They have a very binary view of the world: If it can't satisfy this "basic" requirement, it's crap.
    Programmers have made their career (and possibly life) being experts at solving problems that greatly benefit from determinism. A problem that doesn't - well either that needs to be solved by sophisticated machine learning, or by a human. They're trained on essentially ignoring those problems - it's not their expertise.
    And so they get really thrown off when people use computers in a nondeterministic way to solve a deterministic problem.
    For everyone else, the world, and its solutions, are mostly non-deterministic. When they solve a problem, or when they pay people to solve a problem, the guarantees are much lower. They don't expect perfection every time.
    When a normal human asks a programmer to make a change, they understand that communication is lossy, and even if it isn't, programmers make mistakes.
    Using a tool like an LLM is like any other tool. Or like asking any other human to do something.
    For programmers, it's a cardinal sin if the tool is unpredictable. So they dismiss it. For everyone else, it's just another tool. They embrace it.
    [1] This, of course, is changing as they become better at coding.
    maccard25 days ago
    I’m perfectly happy for my tooling to not be deterministic. I’m not happy for it to make up solutions that don’t exist, and get stuck in loops because of that.
    I use LLMs; I code with a mix of Antigravity and Claude Code depending on the task, but I feel like I’m living in a different reality when the code I get out of these tools _regularly just doesn’t work, at all_. And to the parent's point, I’m doing something wrong for noticing that?
    BeetleB25 days ago
    If it were terrible, you wouldn't use them, right? Isn't the fact that you continue to use AI coding tools a sign that you find them a net positive? Or is it being imposed on you?
    > And to the parents point, I’m doing something wrong for noticing that?
    There's nothing wrong with pointing out your experience. What the OP was implying was that he expects them to be able to copy/paste reliably almost 100% of the time, and not hallucinate. I was merely pointing out that he'll never get that with LLMs, and that their inability to do so isn't a barrier to getting productive use out of them.
    maccard24 days ago
    I was the person who said it can't copy from examples without making up APIs, but:
    > he'll never get that with LLMs, and that their inability to do so isn't a barrier to getting productive use out of them.
    This is _exactly_ what the comment thread we're in said, and I agree with him:
    > The whole discourse around LLMs is so utterly exhausting. If I say I don't like them for almost any reason, I'm a luddite. If I complain about their shortcomings, I'm just using it wrong. If I try and use it the "right" way and it still gets extremely basic things wrong, then my expectations are too high.
    > If it were terrible, you wouldn't use them, right? Isn't the fact that you continue to use AI coding tools a sign that you find them a net positive? Or is it being imposed on you?
    You're putting words in my mouth here. I'm not saying that they're terrible; I'm saying they're way, way, way overhyped and their abilities are overblown (look at this post and the replies of people saying they're writing 90% of code with Claude and using AI tools to review it), but when we challenge that, we're wrong.
    BeetleB24 days ago
    Apologies. I confused you with drewbug up in the thread.
    prewett24 days ago
    My problem isn't lack of determinism, it's that its solutions frequently have basic errors that prevent them from working. I asked ChatGPT for a program to remove the background of an image. The resulting image was blue. When I pointed this out to ChatGPT, it identified this as a common error in RGB ordering in OpenCV and told me the code to change. The whole process did not take very long, but this is not a cycle I want to be part of. (That, and a basic usage of OpenCV that does not work for the complex background I wanted to remove does not help me much.)
    Then there are the cases where I just cannot get it to do what I ask. Ask Gemini to remove the background of an image and you get a JPEG with a baked-in checkerboard background, even when you tell it to produce an RGBA PNG. Again, I don't have any use for that.
    But it does know a lot of things, and sometimes it informs me of solutions I was not aware of. The code isn't great, but if I were non-technical (or not very good), this would be fantastic and better than I could do.
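    The blue image described above is consistent with a channel-order mix-up: OpenCV's `cv2.imread` returns pixels in BGR order, while most other image and display code assumes RGB. The sketch below deliberately avoids an OpenCV dependency; it assumes pixels are plain 3-tuples, and `swap_red_blue` and `convert_image` are hypothetical helpers for illustration only. It shows why reading BGR data as RGB turns a warm pixel blue:

```python
# Minimal sketch, assuming pixels are plain (c0, c1, c2) tuples.
# In real OpenCV code, cv2.cvtColor(img, cv2.COLOR_BGR2RGB) does this for a whole array.

def swap_red_blue(pixel):
    """Reverse a 3-channel pixel's channel order (BGR <-> RGB)."""
    c0, c1, c2 = pixel
    return (c2, c1, c0)

def convert_image(pixels):
    """Apply the channel swap to every pixel in a list of rows."""
    return [[swap_red_blue(p) for p in row] for row in pixels]

# A warm orange pixel as OpenCV stores it: (B, G, R).
orange_bgr = (0, 128, 255)

# Mistake: handing BGR data to code that expects RGB.
# Read as RGB, (0, 128, 255) is a bright blue.
misread_as_rgb = orange_bgr

# Fix: swap channels before treating the data as RGB.
correct_rgb = swap_red_blue(orange_bgr)
print(misread_as_rgb, correct_rgb)  # (0, 128, 255) (255, 128, 0)
```

    Because the swap is its own inverse, applying it twice returns the original pixel, which is why the fix is usually a one-line change.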
    habinero24 days ago
    > And so they get really thrown off when people use computers in a nondeterministic way to solve a deterministic problem
    Ah, no. This is wildly off the mark, but I think a lot of people don't understand what SWEs actually do.
    We don't get paid to write code. We get paid to solve problems. We're knowledge workers like lawyers or doctors or other engineers, meaning we're the ones making the judgement calls and making the technical decisions.
    In my current job, I tell my boss what I'm going to be working on, not the other way around. That's not always true, but it's mostly true for most SWEs.
    The flip side of that is I'm also held responsible. If I write ass code and deploy it to prod, it's my ass that's gonna get paged for it. If I take prod down and cause a major incident, the blame comes to me. It's not hard to come up with scenarios where your bad choices end up costing the company enormous sums of money. Millions of dollars for large companies. Fines.
    So no, it has nothing to do with non-determinism lol. We deal with that all the time. (Machine learning is decades old, after all.)
    It's evaluating things, weighing the benefits against the risks and failure modes, and making a judgement call that it's ass.
    blibble25 days ago
    > What, precisely, are they good for?
    scamming people
    viking12324 days ago
    Also good for manufacturing consent on Reddit and other places. Intelligence services are busy with a certain country now, with bots using LLMs to pump out insane amounts of content to mold the information atmosphere.
    falloutx25 days ago
    It's strong enough to replace humans at their jobs and weak enough that it can't do basic things. It's a paradox. Just learn to be productive with them. Pay $200/month and work around its little quirks. /s
  - nonethewiser24 days ago
    >Every 2/3 months we're hearing there's a new model that just blows the last one out of the water for coding
    I haven't heard that at all. I hear about models that come out and are a bit better. And other people saying they suck.
    >Meanwhile, here I am with Opus and Sonnet for $20/mo and it's regularly failing at basic tasks, antigravity getting stuck in loops and burning credits.
    Is it bringing you any value? I find it speeds things up a LOT.
  - user3428324 days ago
    I have a hard time believing that this v0, from 2023, achieved comparable results to Gemini 3 in Web design.
    Gemini now often produces output that looks significantly better than what I could produce manually, and I'm an expert in web, although my expertise is more in tooling and package management.
  - golly_ned24 days ago
    Frankly I think the 'latest' generation of models from a lot of providers, which switch between 'fast' and 'thinking' modes, are really just the 'latest' because they encourage users to use cheaper inference by default. In ChatGPT I still trust o3 the most. It gives me fewer flat-out wrong or nonsensical responses.
    I suspect that once these models hit 'good enough' for ~90% of users and use cases, the providers started optimizing for cost instead of quality, but still benchmark and advertise for quality.
- barbazoo25 days ago
  We implement pretty cool workflows at work using "GenAI" and the users of our software are really appreciative. It's like saying a hammer sucks because it breaks most things you hit with it.
- nonethewiser24 days ago
  >Generative AI, as we know it, has only existed ~5-6 years
  Probably less than that, practically speaking. ChatGPT's initial release date was November 2022. It's closer to 3 years, in terms of any significant number of people using them.
- onlyrealcuzzo25 days ago
  > Generative AI, as we know it, has only existed ~5-6 years, and it has improved substantially, and is likely to keep improving.
  I think the big problem is that the pace of improvement was UNBELIEVABLE for about 4 years, and it appears to have plateaued to almost nothing.
  ChatGPT has barely improved in, what, 6 months or so.
  They are driving costs down incredibly, which is not nothing.
  But, here's the thing, they're not cutting costs because they have to. Google has deep enough pockets.
  They're cutting costs because - at least with the current known paradigm - the cost is not worth it to make material improvements.
  So unless there's a paradigm shift, we're not seeing MASSIVE improvements in output like we did in the previous years.
  You could see costs go down to 1/100th over 3 years, seriously.
  But they need to make money, so it's possible none of that will be passed on.
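  To put the "1/100th over 3 years" figure in per-year terms: a total cost factor compounds multiplicatively, so the implied yearly factor is its cube root. This is a minimal arithmetic sketch; `implied_annual_factor` is a hypothetical helper, and the 1/100 and 3-year inputs are just the speculative numbers from the comment above, not measured data.

```python
# Minimal sketch: the per-year multiplicative change that compounds to a
# given total change over a number of years (a geometric mean).

def implied_annual_factor(total_factor, years):
    """Return r such that r ** years == total_factor."""
    return total_factor ** (1 / years)

# Speculative scenario from the comment: costs fall to 1/100 over 3 years.
per_year = implied_annual_factor(1 / 100, 3)  # fraction of cost left each year
print(f"{per_year:.3f} of the previous year's cost, i.e. a {1 - per_year:.1%} cut per year")
```

  Under that assumption, each year's cost would be roughly a fifth of the previous year's, a cut of about 78.5% per year.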
  - tombert25 days ago
    I think that even if it never improves, its current state is already pretty useful. I do think it's going to improve though I don't think AGI is going to happen any time soon.
    I have no idea what this is called, but it feels like a lot of people assume that progress on things will continue at a linear pace forever, when I think that progress is generally closer to a "staircase" shape. A new invention or discovery will lead to a lot of really cool new inventions and discoveries in a very short period of time; eventually people will exhaust the low-to-middle-hanging fruit, and progress kind of levels out.
    I suspect it will be the same way with AI; I don't know if we've reached the top of our current plateau, but if not I think we're getting fairly close.
    jamesfinlayson24 days ago
    Yes, I've read about something like this before, like the jump from living in 1800 to 1900: you go from no electricity at home to having electricity at home, for example. The jump from 1900 to 2000 is much less groundbreaking for the electricity example: you have more appliances and more reliable electricity, but it's nothing like the jump from candle to light bulb.
    pigpop24 days ago
    Maybe you meant 1900s to 2000s but if you meant the year 1900 to the year 2000 then that century of difference saw a lot more innovation than just the "candle to lightbulb" change of 1800 to 1900.
    I'll interpret it as meaning 1800s to 1900s to 2000s. I'd argue that we haven't yet seen the same step change as 1800s to 1900s this century because we're only just beginning the ramp up on the new technology that will drive progress this century similar to how in 1926 they were still ramping up on the use of electricity and internal combustion engines.
    Let's take electricity as the primary example though since it's the one you mentioned and it's probably more similar to our current situation with AI. The similarities include the need for central generating stations to supply raw power to end users as well as the need for products designed to make use of that power and provide some utility to the consumer. Efficiency of generation is also a primary concern for both as it's a major cost driver. Both of those required significant investment and effort to solve in the early days of electrification.
    We're now solving similar problems with AI, instead of power plants we're building datacenters, instead of lightbulbs and washing machines we're developing chat bot integrations and agents, instead of improving dynamos we're improving GPUs and TPUs. I fully expect we'll follow a similar curve for deployment as we find new uses, improve existing ones and integrate this new power source into an increasing number of domains.
    We do have one major advantage though, we've already built The Grid for distribution which saves a massive amount of effort.
    This article is a good read on the permeation of electricity through the economy
    https://www.construction-physics.com/p/the-birth-of-the-grid
    onlyrealcuzzo24 days ago
    Arguably the jump around the space age is a bigger jump than everything else between ~1900 and now - whenever you want to define that small period.
    We may be in a similar step-jump period now, where over the next 10-15 years we'll see some pretty big advancements in robotics due to AI, and then all of the low-hanging fruit will be picked until there's some other MAJOR breakthrough.
  - sheeh25 days ago
    They are focused on reducing costs in order to survive. Pure and simple.
    Alphabet / Google doesn’t have that issue. OAI and other money losing firms do.
- bloppe24 days ago
  I don't think LLMs are an abject failure, but I find it equally odd that so many people think that transformer-based LLMs can be incrementally improved to perfection. It seems pretty obvious to me now that we're not gonna RLHF our way out of hallucinations. We'll probably need a few more fundamental architecture breakthroughs to do that.
- 1970-01-0125 days ago
  >and is likely to keep improving.
  I'm not trying to be pedantic, but how did you arrive at 'keep improving' as a conclusion? Nobody is really sure how this stuff actually works. That's why AI safety was such a big deal a few years ago.
  - tombert25 days ago
    Totally reasonable question, and I only am making an assumption based on observed progress. AI generated code, at least in my personal experience, has gotten a lot better, and while I don't think that will go to infinity, I do think that there's still more room for improvement that could happen.
    I will acknowledge that I don't have any evidence for this claim, so maybe the word "likely" was unwise, as that suggests probability. Feel free to replace "is likely to" with "it feels like it will".
- jbs78925 days ago
  Because the likes of Altman have set short term expectations unrealistically high.
  - tombert25 days ago
    I mean that's every tech company.
    I made a joke once after the first time I watched one of those Apple announcement shows in 2018, where I said "it's kind of sad, because there won't be any problems for us to solve because the iPhone XS Max is going to solve all of them".
    The US economy is pretty much a big vibes-based Ponzi scheme now, so I don't think we can single-out AI, I think we have to blame the fact that the CEOs running these things face no negative consequences for lying or embellishing and they do get rewarded for it because it will often bump the stock price.
    Is Tesla really worth more than every other car company combined in any kind of objective sense? I don't think so, I think people really like it when Elon lies to them about stuff that will come out "next year", and they feel no need to punish him economically.
    Terr_24 days ago
    "Ponzi" requires records fraud and is popularly misused, sort of like if people started describing every software bug as "a stack overflow."
    I'd rather characterize it as extremes of Greater Fool Theory.
    https://en.wikipedia.org/wiki/Greater_fool_theory
    tombert24 days ago
    I would argue it’s fraud-adjacent. These tech CEOs know that they’re not going to be able to keep the promises that they’re making. It’s dishonest at the very least, if it doesn’t legally constitute “fraud”.
  - hamdingers25 days ago
    I maintain that most anti-AI sentiment is actually anti-lying-tech-CEO sentiment misattributed.
    The technology is neat, the people selling it are ghouls.
    acdha25 days ago
    Exactly: the technology is useful, but the executive class is hyping it as close to AGI because their buddies are slavering for layoffs. If that “when do you get fired?” tone wasn’t behind the conversation, I think a lot of people would be interested in applying LLMs to the smaller subset of things they actually perform well at.
    wolvoleo24 days ago
    For me it's mostly about the subset of things that LLMs suck at but that still get rammed in everywhere because someone wants to make a quick buck.
    I know it's good tech for some stuff, just not for everything. It's the same with previous bubbles. VR is really great for some things but we were never going to work with a headset on 8 hours a day. Bitcoin is pretty cool but we were never going to do our shopping list on Blockchain. I'm just so sick of hypes.
    But I do think it's good tech. Just like I enjoy VR daily, I do have my local LLM servers (I'm pretty anti-cloud, so I avoid it unless I really need the power).
    It's not really about the societal impacts for me, at least not yet, it's just not good enough for that yet. I do worry about that longer-term but not with the current generation of AI. At my work we've done extensive benchmarking (especially among enthusiastic early adopters) and while it can save a couple hours a week we're nowhere near the point where it can displace FTEs.
    acdha24 days ago
    Yeah, I think those are coming from the same place: so many companies are trying to wedge LLMs into everything, especially contexts where you really need actual reasoning to accomplish a task, and it’s just such a “magic VC money fairy, pick us!” play that it distracts from the underlying tech opening up some text processing capabilities we would’ve thought were amazing a few years ago.
    tombert25 days ago
    Maybe CEOs should face consequences for going on the stage and outwardly lying. Instead they're rewarded by a bump in stock price because people appear to have amnesia.
    sroerick24 days ago
    This is how I felt about Bitcoin.
    viking12324 days ago
    I hate the Anthropic guy so much. When I see his face, it just brings back all the nonsense lies and "predictions" he makes. Altman is kind of the same, but for some reason Dario kind of takes the cake.
- johnnienaked24 days ago
  You're saying the same thing cryptobros say about bitcoin right now, and that's 17 years later.
  It's a business, but it won't be the thing the first movers thought it was.
  - tombert24 days ago
    It’s different in that Bitcoin was never useful in any capacity when it was new. AI is at least useful right now and it’s improved considerably in the last few years.
    johnnienaked24 days ago
    It was useful for doing illegal shit
1a527dd525 days ago
A year ago I would have agreed wholeheartedly and I was a self confessed skeptic.
Then Gemini got good (around 2.5?), like I-turned-my-head good. I started to use it every week-ish, not to write code. But more like a tool (as you would a calculator).
More recently Opus 4.5 was released and now I'm using it every day to assist in code. It is regularly helping me take tasks that would have taken 6-12 hours down to 15-30 minutes with some minor prompting and hand holding.
I've not yet reached the point where I feel comfortable letting it loose to do the entire PR for me. But it's getting there.
- kstrauser25 days ago
  > I was a self confessed skeptic.
  I think that's the key. Healthy skepticism is always appropriate. It's the outright cynicism that gets me. "AI will never be able to [...]", when I've been sitting here at work doing 2/3rds of those supposedly impossible things. Flawlessly? No, of course not! But I don't do those things flawlessly on the first pass, either.
  Skepticism is good. I have no time or patience for cynics who dismiss the whole technology as impossible.
  - sublinear24 days ago
    I think the concern expressed as "impossible" is whether it can ever do those things "flawlessly" because that's what we actually need from its output. Otherwise a more experienced human is forced to do double work figuring out where it's wrong and then fixing it.
    This is not a lofty goal. It's what we always expect from a competent human regardless of the number of passes it takes them. This is not what we get from LLMs in the same amount of time it takes a human to do the work unassisted. If it's impossible, then there is no amount of time that would ever get this result from this type of AI. This matters because it means the human is forced to still be in the loop, not saving time, and forced to work harder than just not using it.
    I don't mean "flawless" in the sense that there cannot be improvements. I mean that the result should be what was expected for all possible inputs, and when inspected for bugs there are reasonable and subtle technical misunderstandings at the root of them (true bugs that are possibly undocumented or undefined behavior) and not a mess of additional linguistic ones or misuse. This is the stronger definition of what people mean by "hallucination", and it is absolutely not fixed and there has been no progress made on it either. No amount of prompting or prayer can work around it.
    This game of AI whack-a-mole really is a waste of time in so many cases. I would not bet on statistical models being anything more than what they are.
- spaceywilly25 days ago
  I would strongly recommend this podcast episode with Andrej Karpathy. I will poorly summarize it by saying his main point is that AI will spread like any other technology. It’s not going to be a sudden flash and everything is done by AI. It will be a slow rollout where each year it automates more and more manual work, until one day we realize it’s everywhere and has become indispensable.
  It sounds like what you are seeing lines up with his predictions. Each model generation is able to take on a little more of the responsibilities of a software engineer, but it’s not as if we suddenly don’t need the engineer anymore.
  https://www.dwarkesh.com/p/andrej-karpathy
  - daxfohl24 days ago
    Though I think it's a very steep sigmoid that we're still far on the bottom half of.
    For math it just did its first "almost independent" Erdos problem. In a couple months it'll probably do another, then maybe one each month for a while, then one morning we'll wake up and find that, whoom, it solved 20 overnight and is spitting them out by the hour.
    For software it's been "curiosity ... curiosity ... curiosity ... occasionally useful assistant ... slightly more capable assistant" up to now, and it'll probably continue like that for a while. The inflection point will be when OpenAI/Anthropic/Google releases an e2e platform meant to be driven primarily by the product team, with engineering just being co-drivers. It probably starts out buggy and needing a lot of hand-holding (and grumbling) from engineering, but slowly but surely becomes more independently capable. Then at some point, product will become more confident in that platform than their own engineering team, and begin pushing out features based on that alone. Once that process starts (probably first at OpenAI/Anthropic/Google themselves, but spreading like wildfire across the industry), then it's just a matter of time until leadership declares that all feature development goes through that platform, and retains only as many engineers as is required to support the platform itself.
    nullpoint42024 days ago
    And then what? Am I supposed to be excited about this future?
    suddenlybananas24 days ago
    You have to remember that half these people think they are building god.
    daxfohl24 days ago
    Hard to say. In business we'll still have to make hard decisions about unique situations, coordinate and align across teams and customers, deal with real world constraints and complex problems that aren't suitable to feed to an LLM and let it decide. In particular, deciding whether or not to trust an LLM with a task will itself always be a human decision. I think there will always be a place for analytical thinking in business even if LLMs do most of the actual engineering. If nothing else, the speed at which they work will require an increase in human analytical effort, to maximize their efficacy while maintaining safety and control.
    In the academic world, and math in particular, I'm not sure. In a way, you could say it doesn't change anything because proofs already "exist" long before we discover them, so AI just streamlines that discovery. Many mathematicians say that asking the right questions is more important than finding the answers. In which case, maybe math turns into something more akin to philosophy or even creative writing, and equivalently follows the direction that we set for AI in those fields. Which is, perhaps less than one would think: while AI can write a novel and it could even be pretty good, part of the value of a novel is the implicit bond between the author and the audience. "Meaning" has less value coming from a machine. And so maybe math continues that way, computers solving the problems but humans determining the meaning.
    Or maybe it all turns to shit and the sheer ubiquity of "masterpieces" of STEM/art everything renders all human endeavor pointless. Then the only thing that's left worth doing is for the greedy, the narcissists, and the power hungry to take the world back to the middle ages, where knowledge and the search for meaning take a back seat to tribalism and warmongering until the datacenters' power needs destroy the planet.
    I'm hoping for something more like the former, but, it's anybody's guess.
    user3428324 days ago
    If machines taking over labor and allowing humans to live a life of plenty instead of slaving away in jobs isn't exciting, then I don't know what is.
    I guess cynics will yap about capitalism and how this supposedly benefits only the rich. That seems very unimaginative to me.
    sensanaty24 days ago
    > That seems very unimaginative to me.
    Does it? How exactly is the common Joe going to benefit from this world where the robots are doing the job he was doing before, as well as everyone else's job (aka, no more jobs for anyone)? Where exactly is the money going to come from to make sure Joe can still buy food? Why on earth would the people in power (aka the psychotic CxOs) care to expend any resources for Joe, once they control the robots that can do everything Joe could? What mechanisms exist for everyone here to prosper, rather than a select few who already own more wealth and power than the majority of the planet combined?
    I think believing in this post-scarcity utopian fairy tale is a lot less imaginative and grounded than the opposite scenario, one where the common man gets crushed ruthlessly.
    We don't even have to step into any kind of fantasy world to see this is the path we're heading down, in our current timeline as we speak, CEOs are foaming at the mouth to replace as many people as they can with AI. This entire massive AI/LLM bubble we find ourselves in is predicated on the idea that companies can finally get rid of their biggest cost centers, their human workers and their pesky desires like breaks and vacations and worker's rights. And yet, there's still somehow people out there that will readily lap up the bullshit notion that this tech is going to somehow be used as a force of good? That I find completely baffling.
    joquarky24 days ago
    Many people seem to have this ideal that UBI is inevitable and will solve a bunch of these sort of problems.
    But I don't see how UBI can avoid the same complexities as our tax systems, where it will be used to try to influence behaviors, growing cruft along the way just like taxes.
    user3428324 days ago
    To me it's completely baffling how people imagine that with human labor largely going obsolete, we will just stick with capitalism and all workers go hungry in some dystopian fantasy.
    Many cynics seem to believe rich people are demons with zero consideration for their fellow humans.
    Rich and powerful persons are still people just like you, and they have an interest in keeping the general population happy. Not to mention that we have democratic mechanisms that give power to the masses.
    We will obviously transition to a system where most of us can live a comfortable life without working a full time job, and it's going to be great.
    sensanaty24 days ago
    > Many cynics seem to believe rich people are demons with zero consideration for their fellow humans.
    Do they have considerations for their fellow humans? I certainly haven't observed that they give a shit about anyone or anything that isn't their bottom line. What exactly has Zuckerberg contributed to this world and to his fellow man, other than a mass data harvesting operation that has enabled real life genocides?
    "They 'trust me'. Dumb fucks." - Zuckerberg, talking about Facebook users.
    What has Bezos done for the average Amazon warehouse worker, other than stick them in grueling conditions where they even have their toilet breaks timed, just to squeeze every single inch of life out of his workers he can? What have the people working for Big Oil done that is beneficial to humanity, other than suppressing climate change research and funding lobbying groups to hide the fact that they knew about climate change since the 70s? What have the tobacco execs done for humanity, other than bribing doctors to falsify medical research indicating that tobacco isn't harmful? I could go on and on about all the evils brought on to the world by psychotic executives and their sycophantic legions sucking the teat hoping for a handout, but we'd be here all day.
    Sure, there's a few philanthropists out there bobbing around in the ocean of soulless psychopaths that are doing some good things, but they're very much the exception.
    > Not to mention that we have democratic mechanisms that give power to the masses.
    Even (especially?) just looking solely from a US POV, these democratic mechanisms are quickly and actively being eroded by these "considerate" billionaires like Thiel (who is quite openly & proudly naming his companies using literally evil things from Tolkien's works). They're talking about taking over Greenland to distract from them all being ousted as pedophiles for fuck's sake, what "democratic mechanisms"?
    > We will obviously transition to a system where most of us can live a comfortable life without working a full time job, and it's going to be great.
    I again don't see how this is "obvious", and you haven't outlined anything about how this utopia is supposed to work other than extremely vague statements. How is this utopian state more obvious than the one we are currently freefalling into, a dystopian police state where your every breath is being tracked in some database that is then shared with anyone with 3 pennies to pay to access the data?
    daxfohl24 days ago
    Even in the utopia scenario, that experiment has been taken to its natural conclusion on rats back in the 70s and the results were...interesting, to say the least. (google "Universe 25"). I feel like in many ways, a devolution to feudalism and tribal warfare would be preferable.
    joquarky24 days ago
    They care about their fellow humans about as much as corporate farms care about their livestock.
  - sheeh25 days ago
    AI first of all is not a technology.
    Can people get their words straight before typing?
    shawabawa325 days ago
    Is LLM a technology? Are you complaining about the use of AI to mean LLM? Because I think that ship has sailed
- cameronh9024 days ago
  I'm now putting more queries into LLMs than I am into Google Search.
  I'm not sure how much of that is because Google Search has worsened versus LLMs having improved, but it's still a substantial shift in my day-to-day life.
  Something like finding the most appropriate sensor ICs to use for a particular use case requires so much less effort than it used to. I might have spent an entire day digging through data sheets before, and now I'll find what I need in a few minutes. It feels at least as revolutionary as when search replaced manually paging through web directories.
- dmux24 days ago
  I feel like I'm living in a totally different world or I'm being gaslit by LLMs when I read stuff like this and other similar comments in this thread. Do you mind mentioning _what_ language / tech stack you're in? At my current job, we have a large Ruby on Rails codebase and just this week Gemini 2.5 and 3 struggled to even identify what classes inherited from another class.
dreadsword25 days ago
This feels like a pretty low-effort post that plays heavily to superficial readers' cognitive biases.
I work commercializing AI in some very specific use cases where it is extremely valuable. Where people are being led astray is layering generalizations: general use cases (copilots) deployed across general populations and generally not doing very well. But that's PMF stuff, not a failure of the underlying tech.
- kokanee25 days ago
  I think both sides of this debate are conflating the tech and the market. First of all, there were forms of "AI" before modern Gen AI (machine learning, NLP, computer vision, predictive algorithms, etc) that were and are very valuable for specific use cases. Not much has changed there AFAICT, so it's fair that the broader conversation about Gen AI is focused on general use cases deployed across general populations. After all, Microsoft thinks it's a copilot company, so it's fair to talk about how copilots are doing.
  On the pro-AI side, people are conflating technology success with product success. Look at crypto -- the technology supports decentralization, anonymity, and use as a currency; but in the marketplace it is centralized, subject to KYC, and used for speculation instead of transactions. The potential of the tech does not always align with the way the world decides to use it.
  On the other side of the aisle, people are conflating the problematic socio-economics of AI with the state of the technology. I think you're correct to call it a failure of PMF, and that's a problem worth writing articles about. It just shouldn't be so hard to talk about the success of the technology and its failure in the marketplace in the same breath.
- Aurornis25 days ago
  > This feels like a pretty low effort post that plays heavily to superficial reader's cognitive biases.
  I haven’t followed this author but the few times he’s come up his writings have been exactly this.
gejose25 days ago
I believe Gary Marcus is quite well known for terrible AI predictions. He's not in any way an expert in the field. Some of his predictions from 2022 [1]
> In 2029, AI will not be able to watch a movie and tell you accurately what is going on (what I called the comprehension challenge in The New Yorker, in 2014). Who are the characters? What are their conflicts and motivations? etc.
> In 2029, AI will not be able to read a novel and reliably answer questions about plot, character, conflicts, motivations, etc. Key will be going beyond the literal text, as Davis and I explain in Rebooting AI.
> In 2029, AI will not be able to work as a competent cook in an arbitrary kitchen (extending Steve Wozniak’s cup of coffee benchmark).
> In 2029, AI will not be able to reliably construct bug-free code of more than 10,000 lines from natural language specification or by interactions with a non-expert user. [Gluing together code from existing libraries doesn’t count.]
> In 2029, AI will not be able to take arbitrary proofs from the mathematical literature written in natural language and convert them into a symbolic form suitable for symbolic verification.
Many of these have already been achieved, and it's only early 2026.
[1]https://garymarcus.substack.com/p/dear-elon-musk-here-are-fi...
- merlincorey25 days ago
  Which ones are you claiming have already been achieved?
  My understanding of the current scorecard is that he's still technically correct, though I agree with you there is velocity heading towards some of these things being proven wrong by 2029.
  For example, in the recent thread about LLMs and solving an Erdos problem I remember reading in the comments that it was confirmed there were multiple LLMs involved as well as an expert mathematician who was deciding what context to shuttle between them and helping formulate things.
  Similarly, I've not yet heard of any non-expert Software Engineers creating 10,000+ lines of non-glue code that is bug-free. Even expert engineers at Cloudflare failed to create a bug-free OAuth library with Claude at the helm, because some things are just extremely difficult to create without bugs even with experts in the loop.
  - bspammer25 days ago
    The bug-free code one feels unfalsifiable to me. How do you prove that 10,000 lines of code are bug-free? And then there are a million caveats about what a bug actually is and how we define one.
    The second claim about novels seems obviously achieved to me. I just pasted a random obscure novel from Project Gutenberg into a file and asked Claude questions about the characters, and then asked about the motivations of a random side character. It gave a good answer; I'd recommend trying it yourself.
    verse25 days ago
    I agree with you but I'd point out that unless you've read the book it's difficult to know if the answer you got was accurate or it just kinda made it up. In my experience it makes stuff up.
    Like, it behaves as if any answer is better than no answer.
    evrydayhustling25 days ago
    So do humans asked to answer tests. The appropriate thing is to compare to human performance at the same task.
    At most of these comprehension tasks, AI is already superhuman (in part because Gary picked scaled tasks that humans are surprisingly bad at).
    rafaelmn24 days ago
    You can't really compare to human performance because the failure modes and performance characteristics are so different.
    In some instances you'll get results that are shockingly good (and in no time), in others you'll have a grueling experience going in circles over fundamental reasoning, where you'd probably fire any person on the spot for having that kind of a discussion chain.
    And there's no learning between sessions or subject area mastery - results on the same topic can vary within the same session (with relevant context included).
    So if something is superhuman and subhuman a large percentage of time but there's no good way of telling which you'll get or how - the result isn't the average if you're trying to use the tool.
    retrac25 days ago
    I've done similar loose experiments using a 19th century French novel I have read (L'Agonie by Jean Lombard), and which is quite obscure. There is no English translation of the novel published.
    Using Gemini Notebooks I've found it passable at summarizing chapters, listing characters, major themes and so on, and it can do this in English.
    "What is the symbolism of the Black Stone?"
    > In the sources, the Black Stone (referred to as the Pierre-Noire) serves as the central religious icon of the Syrian Sun cult brought to Rome by the Emperor Elagabalus. Its symbolism is multifaceted, ranging from fertility and virility to a mystical pursuit of universal unity.
    > It represents the perennity of the Sun, which fertilizes the world, causing "germs to rise" and spreading them through the atmosphere. It is viewed as the "definitive form" of divinity, intended to absorb and replace the "transitory forms" of all other Roman, Greek, Egyptian, and Persian gods, including the Christian "Kreistos".
    > Virility and Phallic Symbolism
    > The sources explicitly characterize the Black Stone as a gigantic sacred phallus. It is described as:
    > • An icon of virility and the "organ of generation"
    > • A "unisexual icon" that materializes the generative force of nature
    > • A representation of "virility in activity," which is why it is often paraded and elevated in an "orgasmic" or "colossal adoration"
    > The Androgyne and Universal Unity
    > Beyond simple fertility, the philosopher Atillius explains a more complex, "mad" metaphysical project associated with the stone. It symbolizes "Life One" (Vie Une) and the return to a unisexual state of perfection.
    > • The Androgyne: Atillius believes that by pursuing "the male sex by the male sex," the cult "inutilizes" the female sex to eventually create the Androgyne—a self-sufficient being containing both sexes
    > • Unity: The stone signifies the fusion of all generative forces into a single Unity, reversing the "separation of the sexes" which is viewed as a state of unhappiness and impotence.
    > • Marriage of Moon and Sun: The ritual marriage of the goddess Astaroth (representing the Moon and the female principle) to the Black Stone (representing the Sun and the male principle) symbolizes the merging of the Orient and Occident into this unified life principle
    > Destruction of the Symbol
    > The Black Stone ultimately becomes a symbol of Oriental pollution and decadence to the Roman populace. During the final rebellion against Elagabalus, the stone is torn from its temple on the Palatine, defiled with filth, and broken into pieces to ensure that its "signification of Life" would never again dominate Rome.
    This is all accurate to the book, even teasing out a couple themes that were only subconsciously present to me.
    The NotebookLM version gives citations with links to the original text to support all these assertions, which largely are coherent with that purpose.
    The input is raw images of a book scan! Imperfect as it is it still blows my mind. Not that long ago any kind of semantic search or analysis was a very hard AI problem.
    daveguy25 days ago
    "quite obscure" doesn't mean there is nothing in the internet that directly addresses the question.
    Here is an English analysis of the text that easily showed up in an internet search:
    https://www.cantab.net/users/leonardo/Downloads/Varian%20Sym...
    This source includes analysis of "the Black Stone."
    retrac25 days ago
    Not quite the same analysis. The human is better, no surprise. But the NotebookLM output links back to the original book in a very useful way. If you think about it as fuzzy semantic search it's amazing. If you want an essay or even just creativity, yes it's lacking.
    daveguy25 days ago
    It doesn't have to be the same analysis to put it in a partially overlapping vector space. Not saying it wasn't a useful perspective shuffling in the vector space, but it definitely wasn't original.
    LLMs haven't solved any of the 2029 predictions as they were posited. But I expect some will be reached by 2029. The AI hype acts like all this is easy. Not by 2029 doesn't mean impossible or even most of the way there.
    Workaccount224 days ago
    LLMs will never achieve anything as long as any victory can be hand waved away with "in the training set". Somehow these models have condensed the entire internet down to a few TBs, yet people aren't backing up their terabytes of personal data down to a couple MB using this same tech...wonder why
    daveguy24 days ago
    It wasn't a hand wave. I gave an exact source, which OP admitted was better.
    They certainly haven't "condensed the entire internet into a few TBs". People aren't backing up their personal data to a few MB because your assumption is false.
    Maybe when people stop hand waving abilities that aren't there we will better understand their use as a tool and not magic.
    suddenlybananas24 days ago
    Surely there is analysis available online in French though?
  - stingrae25 days ago
    1 and 2 have been achieved.
    4 is close; the interface needs some work to allow nontechnical people to use it. (Claude Code)
    fxtentacle25 days ago
    I strongly disagree. I’ve yet to find an AI that can reliably summarise emails, let alone understand nuance or sarcasm. And I just asked ChatGPT 5.2 to describe an Instagram image. It didn’t even get the easily OCR-able text correct. Plus it completely failed to mention anything sports- or stadium-related, even though it was looking at a cliché baseball photo taken by a fan inside the stadium.
    protocolture25 days ago
    I have had ChatGPT read text in an image, give me a 100% accurate result, and then claim not to have the ability and to have guessed the previous result when I ask it to do it again.
    pixl9724 days ago
    >let alone understand nuance or sarcasm
    I'm still trying to find humans that do this reliably too.
    To add on, 5.2 seems to be kind of lazy when reading text in images by default. Feeding it an image it may give the first word or so. But coming back with a prompt 'read all the text in the image' makes it do a better job.
    With one in particular that I tested, I thought it was hallucinating some of the words, but there was a picture within the picture, with small words it saw that I had missed the first time.
    I think a lot of AI capabilities are kind of munged to end users because they limit how much GPU is used.
    falloutx25 days ago
    I dispute 1 & 2 more than 4.
    1) Is it actually watching a movie frame by frame or just searching about it and then giving you the answer?
    2) Again, can it handle very long novels? Context windows are limited and it can easily miss something. Where is the proof for this?
    4 is probably solved
    4) This is more on the predictor, because this is easy to game: you can create 10k lines of gibberish code with an LLM today without issues. Even a non-technical user can do it.
    CjHuber25 days ago
    I think all of those are terrible indicators, 1 and 2 for example only measure how well LLMs can handle long context sizes.
    If a movie or novel is famous the training data is already full of commentary and interpretations of them.
    If it's something not in the training data, well, I don't know many movies or books that use only motifs no other piece of content before them used, so interpreting based on what is similar in the training data still produces good results.
    EDIT: With 1 I meant using a transcript of the Audio Description of the movie. If he really meant watching a movie, I'd say that's even sillier, because of course we could get another agent to first generate the Audio Description, which is definitely possible currently.
    zdragnar25 days ago
    Just yesterday I saw an article about a police station's AI body cam summarizer mistakenly claiming that a police officer turned into a frog during a call. What actually happened was that the cartoon "The Princess and the Frog" was playing in the background.
    Sure, another model might have gotten it right, but I think the prediction was made less in the sense of "this will happen at least once" and more of "this will not be an uncommon capability".
    When the quality is this low (or variable depending on model) I'm not too sure I'd qualify it as a larger issue than mere context size.
    CjHuber25 days ago
    My point was not that those video-to-text models are good as they are used, for example, in that case; I was referring more generally to that list of indicators. Surely when analysing a movie it is alright if some things are misunderstood, especially as the amount of misunderstanding can be decreased a lot. That AI body camera is surely optimized for speed and inference cost. But if you give an agent 10 images at 1s intervals along with the transcript of that period and the full prior transcript, and give it reasoning capabilities, it would take almost endlessly to process the movie, but the result will surely be much better than the body camera's. After all, the indicator talks about "AI" in general, so it makes little sense to measure it with a model optimized for something other than capability.
- zozbot23425 days ago
  > In 2029, AI will not be able to read a novel and reliably answer questions about plot, character, conflicts, motivations, etc. Key will be going beyond the literal text, as Davis and I explain in Rebooting AI.
  Can AI actually do this? This looks like a nice benchmark for complex language processing, since a complete novel takes up a whole lot of context (consider War and Peace or The Count of Monte Cristo). Of course the movie variety is even more challenging since it involves especially complex multi-modal input. You could easily extend it to making sense of a whole TV series.
  - idreyn25 days ago
    Yes. I am a novelist and I noticed a step change in what was possible here around Claude Sonnet 3.7 in terms of being able to analyze my own unpublished work for theme, implicit motivations, subtext, etc -- without having any pre-digested analysis of the work in its training data.
    alextingle24 days ago
    How do you get a novel sized file into Claude? I've tried, and it always complains it's too long.
    idreyn24 days ago
    My word count has hovered around 100k for most of my three years of writing and revising. This does sometimes run up against limits on Claude (or recently, with Opus 4.5, compaction) but in the past the whole thing has fit just fine as a plain text file.
  - colechristensen25 days ago
    >Can AI actually do this? This looks like a nice benchmark for complex language processing, since a complete novel takes up a whole lot of context (consider War and Peace or The Count of Monte Cristo)
    Yes, you just break the book down by chapters or whatever conveniently fits in the context window to produce summaries such that all of the chapter summaries can fit in one context window.
    You could also do something with a multi-pass strategy where you come up with a collection of ideas on the first pass and then look back with search to refine and prove/disprove them.
    Of course, for novels that existed before training, an LLM will already contain trained information about them, so having it "read" classic works like The Count of Monte Cristo and answer questions about them would be a bit of an unfair pass of the test, because models can be expected to have been trained on large volumes of existing text analysis of those books.
    >reliably answer questions about plot, character, conflicts, motivations
    LLMs can already do this automatically with my code in a sizable project (you know what I mean), it seems pretty simple to get them to do it with a book.
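A minimal Python sketch of the chunk-then-combine approach described above: summarize each chunk that fits the context window, then summarize the joined summaries, repeating if they still don't fit. The `summarize` function is a hypothetical stand-in for an LLM call (it just keeps the first few words so the sketch runs without any API), and word counts are a crude assumption standing in for tokens.

```python
from typing import Callable, List


def summarize(text: str, max_words: int) -> str:
    """Hypothetical stand-in for an LLM summarization call.

    A real pipeline would prompt a model; this toy version keeps the first
    `max_words` words so the sketch runs without any API.
    """
    return " ".join(text.split()[:max_words])


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most `max_words` words
    (a crude proxy for a model's context window)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_book(
    text: str,
    window_words: int,
    summary_words: int,
    summarize_fn: Callable[[str, int], str] = summarize,
) -> str:
    if summary_words >= window_words:
        raise ValueError("summaries must be shorter than the context window")
    # Map step: summarize each window-sized chunk independently.
    chunk_summaries = [summarize_fn(c, summary_words) for c in chunk_words(text, window_words)]
    combined = "\n".join(chunk_summaries)
    # Reduce step: if the joined summaries still don't fit, repeat on them.
    if len(combined.split()) > window_words:
        return summarize_book(combined, window_words, summary_words, summarize_fn)
    # Everything fits in one window: produce the final summary.
    return summarize_fn(combined, summary_words)
```

All of the quality lives in whatever real model call replaces `summarize`; with one, a depth limit on the recursion would also be prudent, since a model's output length isn't guaranteed to shrink. The structure alone does nothing about details whose importance only shows up in later chunks.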
    littlestymaar24 days ago
    > Yes, you just break the book down by chapters or whatever conveniently fits in the context window to produce summaries such that all of the chapter summaries can fit in one context window.
    I did that a few months ago, and in fact doing just this will miss cross-chapter information (say something is mentioned in chapter 1 that doesn't appear to be important but turns out to be crucial later on, like "Chekhov's gun").
    Maybe doing that iteratively several times would solve the problem; I ran out of time and didn't try, but the straightforward workflow you're describing doesn't work, so I think it's fair to say this challenge isn't solved. (It works better with non-fiction though, because the prose is usually drier and straight to the point.)
    blharr24 days ago
    in that case, why not summarize the previous chapters and then include that as context to the next chapter?
    littlestymaar24 days ago
    That's what I did, but the thing is the LLM has no way to know what details are important in the first chapter before seeing their importance in the later chapters, and so these details usually get discarded by the summarization process.
    colechristensen24 days ago
    Which is why you go back and re-summarize a second time given the context of the important details found out in the first pass.
    littlestymaar24 days ago
    Maybe it would work, as I said, but that wouldn't fit the original challenge description anymore IMO.
    colechristensen23 days ago
    Models can do this all by themselves after a single human interaction, I think that fits
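The second-pass idea from this exchange can be sketched the same way, again in Python with hypothetical stand-ins for the model. Pass one summarizes each chapter blind; a step then picks out details the overall summary suggests matter; pass two re-summarizes every chapter with those details in hand, so an early "Chekhov's gun" line can survive. `summarize_chapter` and `find_key_terms` are toy heuristics assumed for illustration, not real model calls.

```python
import re
from typing import List, Set

# Tiny stopword list for the toy key-term heuristic (an assumption, not tuned).
STOPWORDS = {"a", "an", "the", "and", "at", "of", "to", "in", "on", "is",
             "was", "her", "his", "she", "he", "they"}


def sentences(text: str) -> List[str]:
    # Naive sentence split on ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z']+", text.lower()))


def summarize_chapter(chapter: str, key_terms: Set[str]) -> str:
    """Hypothetical stand-in for an LLM call: keep the opening sentence plus
    any sentence that mentions a term already known to matter."""
    sents = sentences(chapter)
    kept = [s for i, s in enumerate(sents) if i == 0 or words(s) & key_terms]
    return " ".join(kept)


def find_key_terms(summaries: List[str]) -> Set[str]:
    """Hypothetical stand-in for asking the model which details turned out
    to matter. This stub treats every non-stopword that survived into any
    first-pass summary as important."""
    terms: Set[str] = set()
    for s in summaries:
        terms |= words(s) - STOPWORDS
    return terms


def two_pass_summaries(chapters: List[str]) -> List[str]:
    # Pass 1: summarize each chapter with no knowledge of what comes later.
    first_pass = [summarize_chapter(ch, set()) for ch in chapters]
    # Collect details that the whole first-pass summary flags as important.
    key_terms = find_key_terms(first_pass)
    # Pass 2: re-summarize every chapter, now told which details to keep.
    return [summarize_chapter(ch, key_terms) for ch in chapters]
```

For example, with chapters "Anna arrives at the house. A gun hangs above the fireplace. She unpacks.", "Anna meets her brother. They argue about money." and "The gun fires. Anna runs.", the first pass drops the gun from chapter 1, while the second pass keeps it because "gun" shows up in chapter 3's summary.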
  - the-grump25 days ago
    Yes they can. The size of many codebases is much larger and LLMs can handle those.
    Consider also that they can generate summaries and tackle the novel piecemeal, just like a human would.
    Re: movies. Get YouTube premium and ask YouTube to summarize a 2hr video for you.
    falloutx25 days ago
    A novel is different from a codebase. In code you can have a relationship between files, and most files can be ignored depending on what you're doing. But a novel is a sequential thing; in most cases A leads to B, B leads to C, and so on.
    > Re: movies. Get YouTube premium and ask YouTube to summarize a 2hr video for you.
    This is different from watching a movie. Can it tell what suit the actor was wearing? Can it tell what the actor's face looked like? Summarising and watching are two different things.
    pigpop24 days ago
    Yes, it is possible to do those things and there are benchmarks for testing multimodal models on their ability to do so. Context length is the major limitation but longer videos can be processed in small chunks whose descriptions can be composed into larger scenes.
    https://github.com/JUNJIE99/MLVU
    https://huggingface.co/datasets/OpenGVLab/MVBench
    Ovis and Qwen3-VL are examples of models that can work with multiple frames from a video at once to produce both visual and temporal understanding
    https://huggingface.co/AIDC-AI/Ovis2.5-9B
    https://github.com/QwenLM/Qwen3-VL
    cmcaleer25 days ago
    You’re moving the goalposts. Gary Marcus’ proposal was being able to ask: Who are the characters? What are their conflicts and motivations? etc.
    Which is a relatively trivial task for a current LLM.
    daveguy25 days ago
    The Gary Marcus proposal you refer to was about a novel, and not a codebase. I think GP's point is that motivations require analysis outside of the given (or derived) context window, which LLMs are essentially incapable of doing.
  - postalrat25 days ago
    No human reads a novel and evaluates it as a whole. It's a story and the reader's perception changes over the course of reading the book. Current AI can certainly do that.
    jhanschoo24 days ago
    > It's a story and the reader's perception changes over the course of reading the book.
    You're referring to casual reading, but writers and people who have an interest and motivation to read deeply review, analyze, and summarize books under lenses and reflect on them; for technique as much as themes, messages, how well they capture a milieu, etc. So that's quite a bit more than "no human"!
- ls61225 days ago
  I'm pretty sure it can do all of those except for the one which requires a physical body (in the kitchen) and the one that humans can't do reliably either (construct 10000 loc bug-free).
- colechristensen25 days ago
  Besides being a cook, which is more of a robotics problem, all of the rest are accomplished to the point where it's arguable how reliably LLMs can perform these tasks, with the argument being between the enthusiast and naysayer camps.
  The keyword being "reliably" and what your threshold is for that. And what "bug free" means. Groups of expert humans struggle to write 10k lines of "bug free" code in the absolutist sense of perfection, even code with formal proofs can have "bugs" if you consider the specification not matching the actual needs of reality.
  All but the robotics one are demonstrable in 2026 at least.
- staticman223 days ago
  I don't understand how this claim can even be tested:
  > In 2029, AI will not be able to read a novel and reliably answer questions about plot, character, conflicts, motivations, etc. Key will be going beyond the literal text, as Davis and I explain in Rebooting AI.
  Once you are "going beyond the literal text" the standard is usefulness of your insight about the novel, not whether your insight is "right" or "wrong".
- thethirdone25 days ago
  Which ones of those have been achieved in your opinion?
  I think the arbitrary proofs from mathematical literature is probably the most solved one. Research into IMO problems, and Lean formalization work have been pretty successful.
  Then, probably reading a novel and answering questions is the next most successful.
  Reliably constructing 10k bug free lines is probably the least successful. AI tends to produce more bugs than human programmers and I have yet to meet a programmer who can reliably produce less than 1 bug per 10k lines.
  - zozbot23425 days ago
    Formalizing an arbitrary proof is incredibly hard. For one thing, you need to make sure that you've got at least a correct formal statement for all the prereqs you're relying on, or the whole thing becomes pointless. Many areas of math outside of the very "cleanest" fields (meaning e.g. algebra, logic, combinatorics etc.) have not seen much success in formalizing existing theory developments.
  - kleene_op25 days ago
    > Reliably constructing 10k bug free lines is probably the least successful.
    You really need to try Claude Code, because it absolutely does that.
    thethirdone25 days ago
    I have seen many people try to use Claude Code and get LOTS of bugs. Show me any > 10k project you have made with it and I will put the effort in to find one bug free of charge.
- dyauspitr24 days ago
  In my opinion, contrary to other comments here I think AI can do all of the above already except being a kitchen cook.
  Just earlier today I asked it to give me a summary of a show I was watching until a particular episode in a particular season without spoiling the rest of it and it did a great job.
  - suddenlybananas24 days ago
    You know that almost every show has summaries of episodes available online?
    joquarky24 days ago
    How do you find them?
- raincole24 days ago
  > Many of these have already been achieved, and it's only early 2026.
  I'm quite sure people who made those (now laughable) predictions will tell you none of these has been achieved, because AI isn't doing this "reliably" or "bug-free."
  Defending your predictions is like running an insurance company. You always win.
- jgalt21225 days ago
  This comment or something very close always appears alongside a Gary Marcus post.
  - raincole24 days ago
    And why not? Is there any reason for this comment to not appear?
    If Bill Gates made a prediction about computing, no matter what the prediction said, you could bet that the 640K memory quote would be mentioned in the comment section (even though he didn't actually say that).
    jgalt21224 days ago
    because
    - it's tiresome
    - and the only thing less useful than making predictions is making predictions about predictions.
  - GorbachevyChase24 days ago
    I think it’s for good reason. I’m a bit at a loss as to why every time this guy rages into the ether of his blog it’s considered newsworthy. Celebrity driven tech news is just so tiresome. Marcus was surpassed by others in the field and now he’s basically a professional heckler on a university payroll. I wish people could just be happy for the success of others instead of fuming about how so and so is a billionaire and they are not.
  - margalabargala25 days ago
    Which is fortunate, considering how asinine it is in 2026 to expect that none of the items listed will be accomplished in the next 3.9 years.
daedrdev25 days ago
This post is literally just 4 screenshots of articles, not even its own commentary or discussion.
- laughingcurve25 days ago
  Don’t be too harsh, it’s the most effort Gary has put into his criticism in a while </s>
  I appreciate good critique but this is not it
saberience25 days ago
Gary Marcus (probably): "Hey this LLM isn't smarter than Einstein yet, it's not going all that well"
The goalposts keep getting pushed further and further every month. How many math and coding Olympiads and other benchmarks will LLMs need to dominate before people will actually admit that in some domains it's really quite good.
Sure, if you're a Nobel Prize winner or PhD then LLMs aren't as good as you yet, but for 99% of the people in the world, LLMs are better than you at Math, Science, Coding, and every language except probably your native language, and it's probably better than you at that too...
didibus25 days ago
Ignoring the actual poor quality of this write-up, to be honest, I think we don't know how well GenAI is going. I feel we've not been able to properly measure or assess its actual impact yet.
Even as I use it, and I use it everyday, I can't really assess its true impact. Am I more productive or less overall? I'm not too sure. Do I do higher quality work or lower quality work overall? I'm not too sure.
All I know is that it's pretty cool, and using it is super easy. I probably use it too much, in a way that actually slows things down sometimes, when I use it for trivial things for example.
At least when it comes to productivity/quality I feel we don't really know yet.
But there are definite cool use-cases for it, I mean, I can edit photos/videos in ways I simply could not before, or generate a logo for a birthday party, I couldn't do that before. I can make a tune that I like, even if it's not the best song in the world, but it can have the lyrics I want. I can have it extract whatever from a PDF. I can have it tell me what to watch out for in a gigantic lease agreement I would not have bothered reading otherwise.
I can have it fix my tests, or write my tests; not sure if it saves me time, but I hate doing that, so it definitely makes it more fun, and I can kind of just watch videos at the same time, which I couldn't before. Coding quality-of-life improvements are there too, like generating a sample JSON out of a JSONSchema, and so on. If I want, I can write a method using English prompts instead of the code itself; it might or might not truly be faster, not sure, but sometimes it's less mentally taxing, and depending on my mood, it can be more fun or less fun, etc.
All those are pretty awesome wins and a sign that for sure those things will remain and I will happily pay for them. So maybe it depends on what you expected.
- sheeh25 days ago
  And what do you think investors in OAI et al are expecting?
  - didibus23 days ago
    I think there is hype around it becoming revolutionary and so on, but I also think investors would get a decent ROI even if it just ends up that 50+ million users pay $20 a month, on top of some enterprise contracts, API fees and so on. Or the inevitable ad-supported access.
    In my opinion, it's already useful enough, given the use-cases I described, to reach that level.
nojvek24 days ago
Gary Marcus again. The chief doomer of AI where goal posts keep on moving.
Almost everyone around me, even the primary school kids use ChatGPT/Perplexity/Gemini/Claude in some form on almost a daily basis. The daily engagement is v strong.
The models keep improving every year. Nano Banana gets text spot on; human anatomy of fingers and toes is spot on. Deep Research mode is mind-boggling. All the major vendors have some form of voice interaction, and it feels pretty good. I use Perplexity's talk feature while driving to learn in depth about a topic of interest.
The trend is strong, betting against the trend isn't wise.
I can paste entire books and ask questions about certain pieces. The context windows nowadays are wild.
Price per token keeps on dropping, more capability keeps on coming online.
Gary offers no solutions, just complaints.
thechao25 days ago
You're absolutely right!
The irony of a five sentence article making giant claims isn't lost on me. Don't get me wrong: I'm amenable to the idea; but, y'know, my kids wrote longer essays in 4th grade.
mythrwy25 days ago
It's going well for coding. I just knocked out a mapping project that would have been a week+ of work (with docs and stackoverflow opened in the background) in a few hours.
And yes, I do understand the code and what is happening and did have to make a couple of adjustments manually.
I don't know that reducing coding work justifies the current valuations, but I wouldn't say it's "not going all that well".
emp1734425 days ago
Guessing this isn’t going to be popular here, but he’s right. AI has some use cases, but isn’t the world-changing paradigm shift it’s marketed as. It’s becoming clear the tech is ultimately just a tool, not a precursor to AGI.
- teej25 days ago
  Is that the claim the OP is making?
- avaer25 days ago
  If AGI is ever going to happen, then it's definitionally a precursor to it.
  So I'm not really sure how to parse your statement.
  - alex_young25 days ago
    I’m not sure I follow. What if LLMs are helpful but not useful to AGI, but some other technology is? Seems likely.
    avaer25 days ago
    The comment wasn't referencing LLMs, but generative AI.
    Even then, given the deep impact of LLMs and how many people are using them already, it's a stretch to say LLMs will have no effect on the development of AGI.
    I think it's pretty obvious that AGI requires something more than LLMs, but I think it's equally obvious LLMs will have been involved in its development somewhere, even if just a stepping stone. So, a "precursor".
- sajithdilshan25 days ago
  not YET.
sghiassy25 days ago
LLMs help me read code 10x faster - I’ll take the win and say thanks
smashed25 days ago
Should have used an LLM to proofread.. LLMs can still cannot be trusted?
- warkdarrior25 days ago
  How dare you accuse Gary-Marcus-5.2-2025-12-11 of being an LLM??
pj453324 days ago
All I know is that I have built more in the past 10 months than I ever have. How do you quantify for the skeptics the mental shift that happens when you know you can just build stuff now?
COULD I do this stuff before? Sure. But I wouldn’t have. Life gets in the way. Now, the bar is low so why not build stuff? Some of it ships, some of it is just experimentation. It’s all building.
Trying to quantify that shift is impossible. It’s not a multiplier to productivity you measure by commits. It’s a builder mind shift.
- sjw98724 days ago
  "I have built more in the past 10 months than I ever have."
  Correction. The genAI has built it.
  I haven't got any skin on either side here, but doesn't the fact the genAI can build it imply that what you are doing is heavily trodden ground, that there will be less and less need for developers like you, and will gradually lead to many developers (like you) being cut out of the market entirely.
  For personal stuff it's wonderful. For work, it seems like a double-edged sword that will eventually cut the devs that use it (and those that don't). Even if the business owners aren't completely daft and keep a (vastly diminished) workforce of dev/AI consultants on board, that could easily exclude you or me.
  It's going well if all the jobs it eradicates can be replaced with just as many jobs (they can't), or if the powers that be catch on, realise there aren't that many jobs left for humans to do, and institute some form of basic income system (they won't).
  - pj453324 days ago
    "The genAI has built it" -- this is the core point. If I did nothing except complain about AI for the past 10 months, would these projects exist? No they would not. So. I. Built. It.
    If you actually use these tools, really use them, you realize that it's an augmentation, not a replacement, simply because the training data is what has already come before (for now!). The LLMs need help, direction, focus... and those are learned skills dependent on the tooling. Not to mention ideas.
    And sure, I imagine the software development workforce will change quite a bit, probably this year, no doubt about that.
    But the need for builders will not change. I imagine the 'builder' role will be filled by traditional software developers, designers, salespeople, writers, the C-suite... whatever.
    So I think you are right. "That could easily exclude you or me". 100% correct. The required skill set to be a builder is changing on a weekly basis. The only way to keep up is to keep building with these tools. Trying things. Experimenting. Otherwise, yes, you will probably be replaced. By a builder.
  - joquarky24 days ago
    > For work, it seems like a double edged sword that will eventually cut the devs
    Developers have been putting non-developers out of a job for decades.
anarticle25 days ago
Download the models you can find now, and keep them forever. The guardrails will only get worse, or models will be banned entirely. Whether it's because it "hurts people's health" or some other moral panic, it will kill this tech off.
gpt-oss isn't bad, but even models you cannot run are worth getting since you may be able to run them in the future.
I'm hedging against models being so nerfed they are useless. (This is unlikely, but drives are cheap and data is expensive.)
- joquarky24 days ago
  Guardrails are just adjustable parameters. The trick is finding the right ones and turning them off.
  I forget the name, but there is at least one project dedicated to this.
  - anarticle22 days ago
    Here's the thing: this is like 0-days; the bar gets higher and higher on served models from OpenAI/Microsoft/Google. Sure, some have API access and you can allow spicier content, but a better answer is: "tl;dr: not reading that."
    I'm tired of having to step around megacorp's opinions on how I should be running my software and what questions to ask. gpt5.2 is HELLA paternalistic when you ask for deep dives on technical questions. Try asking something near the state of the art and see how far you get; most decisions come with huge warnings about how "this idea won't work", etc.
    This deviates from my expectation, which is a fun word calculator to help me with back of the napkin calculations quickly so I can do other things. If that's not the core use case, fine. But that is a flag to me that says this will continue to be a moving target, which is the OPPOSITE of what a tool is.
    There are some projects to reverse alignment and other things, and those people are doing god's work for sure. For me, I want to be sure I have the ability to run these things on my own because I don't trust that this won't mutate into something not useful in my way on the long term.
starchild300119 days ago
How to read this brilliant blog post by Gary Marcus?
- Replace all occurrences of "LLM" with "human";
- Replace all occurrences of "Scaling" with "additional education".
Voila! You get an article that actually makes sense, plus you'll get a better sense of where the technology is: these models are behaving much like humans across many tasks. They aren't perfect. But they are getting better every day, and are quite useful.
Thank you AI developers and researchers for making progress every day! No "thank you" to people like Gary Marcus, who'd be called a "perma bear" in financial parlance.
kennyadam24 days ago
I’ve been using Claude Code, Gemini 3 Pro, and Nano Banana Pro to plan, code, and create custom UI elements for dozens of time-saving applications. For years, I searched high and low for existing solutions, but all I found were either overpriced cloud offerings bloated with endless features I didn’t need, which just complicated the UI, or abandoned GitHub repos consisting of an initial commit and a roadmap that has been waiting eight years for its first update, with whatever code was present half-baked and out of date. The reality is that my requirements are so specific to my workflow that, until these latest models came along, building exactly what I needed in a matter of hours for $20 a month was inconceivable. Now I provide a description of the functionality I need plus some sketches of the UI I made on my iPad with an Apple Pencil, and after a bit of back and forth to get everything dialled in, I’ve created a bit of software that will save me dozens if not hundreds of hours of previously tedious manual work.
jaffee25 days ago
What a joke this guy is. I can sit down and crank out a real, complex feature in a couple hours that would have previously taken days and ship it to the users of our AI platform who can then respond to RFQs in minutes where they would have previously spent hours matching descriptions to part numbers manually.
...and yet we still see these articles claiming LLMs are dying/overhyped/major issues/whatever.
Cool man, I'll just be over here building my AI based business with AI and solving real problems in the very real manufacturing sector.
- joquarky24 days ago
  I'm in a similar situation.
  Sometimes, I suspect that half the naysayers are just trolling for workable ideas; i.e., they want proof of someone's success so they can copy it.
moonshotideas24 days ago
How long do you think it will be until the “AI isn’t doing anything” people go away? 1 month? 6 months? I’d say 1 year at the most. Anyone who has used Claude Code since Dec 1st knows this in their bones, so I’d just let these people shout from the top of the hill until they run out of steam…
Right around then, we can send a bunch of reconnaissance teams out to the abandoned Japanese islands to rescue them from the war that’s been over for 10 years - hopefully they can rejoin society, merge back with reality and get on with their lives
- joquarky24 days ago
  I think the "AI isn't doing anything" crowd have some kind of vocabulary/language barriers/deficiencies that prevent them from refining their prompting methods into something that works for them.
  I find that the more precise I am in my prompts, the more precise the response. But that requires that I use vocabulary that I wouldn't use in a human conversation.
  - moonshotideas15 days ago
    Same here. I created a prompt enhancer GPT and a prose enhancer GPT, and I tend to chain all my prompts through them. Then I use an extension to remove markdown and replace Unicode, and then add tabs and proper formatting to produce a final version of all my prompts. This tends to result in prompts that perform 20-25% better for all difficult or multi-part tasks.
    Logically this makes sense: the probabilities of the next tokens the model produces follow the pattern it observes in the initial input. If your prose reflects the prose that individuals with higher intelligence tend to use, the model will continue at that level in its response, and vice versa.
    I’m curious to see what happens as the % of training input using synthetic data generated by models tips the scales so that markdown reflects higher intelligence inputs - I wonder when this will occur
m46325 days ago
I see stuff like this and think of these two things:
1) https://en.wikipedia.org/wiki/Gartner_hype_cycle
or
2) "First they ignore you, then they laugh at you, then they fight you, then you win."
or maybe originally:
"First they ignore you. Then they ridicule you. And then they attack you and want to burn you. And then they build monuments to you"
- joquarky24 days ago
  > "First they ignore you, then they laugh at you, then they fight you, then you win."
  The only people who use this quote sincerely have been crackpots, in my experience.
herunan25 days ago
First of all, popping in a few screenshots of articles and papers is not proper analysis.
Second of all, GenAI is going well or not depending on how we frame it.
In terms of saving time, money and effort when coding, writing, analysing, researching, etc., it’s extremely successful.
In terms of leading us to AGI… GenAI alone won’t reach that. Current ROI is plateauing, and we need to start investing more somewhere else.
rpowers25 days ago
I keep reading comments that claim GenAI's positive traits, but this usually amounts to some toy PoC that very eerily mirrors work found in code bootcamps. You want an app that has logins and comments and upvotes? GenAI is going to look amazing setting up a non-relational db to your node backend.
- demorro24 days ago
  Aye. If you've not turned a real profit with your thing, I will default to believing that you don't know what you're talking about and are probably building toys.
  It's nothing to do with AI. I didn't believe "I rewrote my application in three weeks!" claims before AI, and I don't believe them now. Most people are not able to evaluate themselves, I don't see why that would have changed.
siscia24 days ago
I think the wider industry is now living through what coding and software engineering went through a year or so ago.
Yeah, you could ask ChatGPT or Claude to write code, but it wasn't really there.
It takes a while to adopt the model AND the UI. We in software are the first because we are both makers and users.
dkobia24 days ago
Preaching to the wrong choir. The HN community is reaping massive benefits from generative AI.
Jadiiee25 days ago
It's more about how you use it. It should be a source of inspo. Not the end all be all.
fortyseven24 days ago
I've just started ignoring people like this. You think everything's going bad? Okay fine. You go ahead and keep believing that. Maybe you could get it printed on a sandwich board and walk up and down the street with it.
afspear25 days ago
Meanwhile I'm over here reducing my ADO ticket time estimates by 75%.
mrbluecoat25 days ago
> LLMs can still cannot be trusted
But can they write grammatically correct statements?
- efilife24 days ago
  This was the first thing I noticed too. This is the lowest-effort post I have ever seen that high up on Hacker News.
unwise-exe24 days ago
Meanwhile $employer is continuing to migrate individual tasks to in-house AI tooling, and has licensed an off-the-shelf coding agent for all of us developers to put in our IDEs.
robertclaus25 days ago
Odds this was AI generated?
- kingstnap25 days ago
  It's literally just four screenshots paired with this sentence.
  > Trying to orient our economy and geopolitical policy around such shoddy technology — particularly on the unproven hopes that it will dramatically improve– is a mistake.
  The screenshots are screenshots of real articles. The sentence is shorter than a typical prompt.
amw-zero25 days ago
I’m starting to think this take is legitimately insane.
As said in the article, a conservative estimate is that Gen AI can currently do 2.5% of all jobs in the entire economy. A technology that is really only a couple of years old. This is supposed to be _disappointing_? That’s millions of jobs _today_, in a totally nascent form.
I mean I understand skepticism, I’m not exactly in love with AI myself, but the world has literally been transformed.
blindriver25 days ago
This entire take is nonsense.
I just used ChatGPT to diagnose a very serious but ultimately not-dangerous health situation last week and it was perfect. It literally guided me perfectly without making me panic and helped me understand what was going on.
We use ChatGPT at work to do things that we have literally laid people off for, because we don't need them anymore. This includes fixing bugs at a level that is at least E5/senior software engineer. Sometimes it does something really bad, but it definitely saves time and helps avoid adding headcount.
Generative AI is years beyond what I would have expected even 1 year ago. This guy doesn't know what he's talking about, he's just picking and choosing one-off articles that make it seem like it's supporting his points.
tom_m23 days ago
It's going well in terms of being a valuable tool. It's not going well from an economic point of view. There are going to be winners and losers in this bubble. Things will settle and it will be commonplace technology in the future. It's not going anywhere. It's just overhyped right now.
Then you consider the massive spend on data centers, the RAM shortage, etc. The writing is on the wall.
meowface25 days ago
How on Earth do people keep taking Gary Marcus seriously?
- throw31082225 days ago
  He's such a joke that even LLMs make fun of him. The Gemini-generated Hacker News frontpage for December 9 2035 contains an article by Gary Marcus: "AI progress is stalling": https://dosaygo-studio.github.io/hn-front-page-2035/news
- piskov25 days ago
  As if the articles he’s linked were written by him
bawolff25 days ago
Holy moving goal posts batman!
I hate generative AI, but it's inarguable that what we have now would have been considered pure magic 5 years ago.
wewewedxfgdf25 days ago
Haters gonna hate.
joshcsimmons24 days ago
Huh?
Seems like black and white thinking to me. I had it make suggestions for 10 triage issues for my team today and agreed with all of its routings. That’s certainly better than 6 months ago.
billsunshine25 days ago
a historic moron. Marcus will make Krugman's internet==fax machine look like a good prediction
w4yai25 days ago
[flagged]
segfaultex25 days ago
I wholeheartedly agree. Shitty companies steal art and then put out shitty products that shitty people use to spam us with slop.
The same goes for code as well.
I’ve explored Claude Code/Antigravity/etc. and found them mostly useless. I tried a more interactive approach with Copilot/local models, and tried less interactive “agents”/etc. It’s largely all slop.
My coworkers who claim they’re shipping at warp speed using generative AI are almost categorically our worst developers by a mile.
- 478262629228324 days ago
  Ah, Gary Marcus, the 10x ninja whose hand-crafted bespoke code singlehandedly keeps his employer in business.
  - segfaultex24 days ago
    That’s not what I’m suggesting at all.
sublinear24 days ago
All this AI discussion has done is reveal how naive some people are.
You're not losing your job unless you work on trivial codebases. There's a very clear pattern to what those are: startups, greenfield, games, junk apps, mindless busywork that probably has an existing better tool on GitHub, etc. Basically anything that doesn't have any concrete business requirements or legal liability.
This isn't to say those codebases will always be trivial, but good luck cleaning that up or facing the reality of having to rewrite it properly. At least you have AI to help with boilerplate. Maybe you'll learn to read docs along the way.
The people claiming to be significantly more productive are either novice programmers or optimistic for unexplained reasons they're still trying to figure out. When they want to let us know, most people still won't care because it's not even the good kind of unreasonable that brings innovation.
The only real value in modern LLMs is that natural language processing is a lot better than it used to be.
Are we done now?