143 points by swolpers 6 hours ago | 34 comments
  • eithed4 hours ago
    What I find fascinating is that there is so little substance in this article about the quality of the produced code and the medium. Is the code documented and tested? Is it understandable and extendable? Is it secure? What language, framework, database was used? Author mentions judgement and taste - well, is the code tasteful? Will the model rearchitect the entire thing if I ask it to add new functionality, spending another 9.5h in tokens? I assume that the research part is domain knowledge = how different types of travel translate to time, making it presentable; how did the author verify this?

    These questions are not even about AI: if I were to give money to a human agency and were given something they tell me works, I would ask the same questions. If I did not know how to evaluate, I would hire people that do. With LLMs the verification part is what bothers me the most.

    • an0malous2 hours ago
      These posts are never written by software engineers, it’s always some tech exec, retired engineer, or VC. This author is apparently a professor at the Wharton School of Management? None of these people have to ship or maintain real products, they’re just making side projects.

      The only decent software engineering perspective I’ve seen has been from Mitchell Hashimoto.

      • jimbokun2 hours ago
        Well that’s kind of the point.

        They can just summon bespoke software out of the ether that only handles the use cases of themselves and a few of their collaborators.

        Making “side projects” was not possible for non-developers before powerful LLMs. Now it is.

        • an0malousan hour ago
          I don’t think that’s true, I think these authors are making a much stronger claim that AI is proficient or even an expert at software engineering. This author describes how complex and sophisticated their software is, and the only value he’ll concede to “coders” is that there might be a few bugs they’d need to fix.

          Imagine not being an architect and using Claude to put together a building plan, then concluding it’s basically done but we might need a real architect to double check the measurements. It may even be true but I’d be skeptical if it’s always non-architects saying this.

        • shimmanan hour ago
          Making side projects isn't a trillion dollar industry tho, adding to the fact that we are facing another global supply chain crisis due to the Iran War; the US is about to commit the biggest self-own ever in the history of empire.
    • cgearhart3 hours ago
      I’m starting to realize that LLMs are really good at building low-stakes projects. Your questions mostly presume that the stakes are higher. The software will last a long time; the requirements will evolve; we can’t tolerate mistakes; etc.

      The trick to getting good at using LLMs for software is to learn how to make _all_ projects low-stakes.

      • qaq3 hours ago
        You don't need LLM for that. You make _all_ projects low-stakes by working on green field project using (insert buzzword soup of the day) and leaving for a new green field opportunity (that requires experience with buzzword soup of the day) before the project ships.
      • rpdillonan hour ago
        This is really insightful, but I think it also extends to making the project either low stakes or low complexity. I have this lurking feeling that the preferable architecture for software will change as a result of LLMs because they're good at working on low complexity modular components more than they are on high complexity million-line code bases.
      • acedTrex3 hours ago
        > The trick to getting good at using LLMs for software is to learn how to make _all_ projects low-stakes.

        this doesn't really work in the real world. There are many things that actually matter, engineering is fundamentally about handling them.

    • hypfer4 hours ago
      Being the first to release an article gives you great SEO or whatever. Doing the things you've mentioned takes time.
    • coldtea2 hours ago
      >What I find fascinating is that there is so little substance in this article about the quality of the produced code and the medium.

      I clicked one of his examples intrigued "a snake game where the snake is self-aware and crazy things happen;". Played for 1-2 minutes, and it's the classic 1980s snake game. Am I missing something? What is "self-aware" about it? Some funny messages at the bottom of the screen? And what are the "crazy things"?

      • starshadowx2an hour ago
        It sounds like you either didn't play enough or you are missing the new mechanics that get added over time. There's definitely more to it than just regular snake.
      • vunderba2 hours ago
        I had the exact same thought. To me, it feels like they just took the fairly common “sentient video game character” trope and bolted it onto a very conventional snake game.

        I will say, the way the act of eating creates a "bulge distortion" that flows down the length of the snake is a nice touch though.

    • chickensong41 minutes ago
      You probably don't care about the ingredients or engineering of asphalt, only if the road does its job well or is filled with potholes. Outside of the software industry, nobody gives a shit about code or databases.
      • eithed31 minutes ago
        I agree. But if I'm paying for the road (even as a taxpayer) I get angry that after a year it's full of potholes and that there are unnecessary signs warning about penguin crossing, making it cost 2 times more than it should have (and don't get me started on why this road is really a highway leading to my house). I'd want certain qualities. And this article is basically = you will get a road, built quickly

        But yes, you are right - I don't build roads and don't know what the price to build a road is or how to determine the quality of a correctly built one, nor will I ever care or learn.

    • jstummbillig3 hours ago
      Less fascinating when you consider that this is a non-coders perspective.
      • eithed3 hours ago
        Fair enough, but entrepreneurship should, I guess, ask questions about whether a given Next Big Thing has substance behind it or is just snake oil.
        • munk-a3 hours ago
          Ah, but billions of dollars depend on those questions not being asked in a genuine manner. Don't you want a slice of that or are you an... AI skeptic *thunder clashes*.
      • unholiness2 hours ago
        Yeah, this made it basically clickbait for me, in terms of time I wasted with the wrong expectation.

        The lack of downvotes on posts on HN has always felt like more of a bug than a feature to me.

      • nomel3 hours ago
        So, the perspective of the one that gains the most, that will value this the most, and that will pay the most? ;)
    • jimbokun2 hours ago
      Does it matter to the people requesting the software if it acts in the way they expect?
      • eithedan hour ago
        True, but you should say that about everything. Does it matter to you how the car drives, as long as it takes you to your destination? Well, yes, it matters: how it will deal with a crash, whether it's possible to replace a part, and whether anybody can just open it if you leave it outside. I will be amazed if somebody shows me their home-printed car, but if they try to sell it to me like a new one...
    • adamtaylor_133 hours ago
      I'm becoming more convinced these are questions of the Before Times. Yes, yes—heresy, I know.

      Yet, I can't deny the reality that I observe working with LLMs every day. If this truly is a step-function (as some are suggesting), then I have absolutely zero concern for the quality of the code.

    • grafporno3 hours ago
      It's an ad.
  • JumpCrisscross4 hours ago
    Anecdote: I fed Fable some models I’ve been hand verifying (basically, I sketch out a scenario for Opus to model, it builds it, I ask it to show me the math, I correct it, we iterate like this, then I double check its code to make sure the math matches the model logic). Fable found almost every error I found, and then had some interesting suggestions for additional variables.

    It also burned through my usage quota like a late-90s Hummer.

    • matheusmoreira2 hours ago
      > It also burned through my usage quota like a late-90s Hummer.

      Yeah. I have a Max 5x subscription and Fable burned through 16% of my weekly quota in a 40 minute code review session. It didn't even finish the review, it switched back to Opus 4.8 in the critical memory safety parts where I actually needed Fable.

      I feel like I'm going to get priced out of these models soon. I should probably try to get the most out of Fable until June 22nd.
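
      For a rough sense of scale, the burn rate above can be extrapolated. This is only a back-of-the-envelope sketch: it assumes quota is consumed linearly at the rate of that one session, which may not hold, and the function name is made up for illustration.

```python
def weekly_minutes_at_rate(percent_used: float, minutes_elapsed: float) -> float:
    """Extrapolate how many minutes of similar work a full weekly quota covers.

    Assumes a constant (linear) burn rate, which is a simplification.
    """
    if percent_used <= 0:
        raise ValueError("percent_used must be positive")
    return minutes_elapsed * 100.0 / percent_used


# Figures from the comment: 16% of the weekly quota in a 40 minute session.
total = weekly_minutes_at_rate(percent_used=16, minutes_elapsed=40)
print(f"{total:.0f} minutes per week (~{total / 60:.1f} hours)")
```

      Under that assumption the whole weekly quota would cover roughly four hours of this kind of review.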

    • cyanydeez4 hours ago
      now for the best question: whats your ROI here?
      • Ferret74463 hours ago
        Humans are very expensive, so the equation almost always falls against them.

        It's not just salary, but also safety/labor regulation, legal risk, vacations, sick time, personal conflicts, HR, benefits.

        Even when automation is more expensive on paper, it's generally still cheaper

        • gopher_space29 minutes ago
          One of the large (and enjoyable IMHO) challenges in this line of work is developing a de facto understanding of your process and the context it's in service to, and that's only possible if you're actually on your industry equivalent of a "shop floor" for each domain the project touches.

          As far as I can tell this part of the job isn't really on anyone's radar anymore.

        • TheOtherHobbesan hour ago
          Good to know that LLMs will be removing all regulatory and legal risks, as well as creating a consumer economy that no longer employs or pays consumers.

          I can't help thinking there might be some kind of strategic issue here.

          Perhaps someone should ask Mythos about it.

        • warkdarrior3 hours ago
          That's the beauty of these AI advancements. You, a human, will have to compete against a model for the same job.

          If you get $100,000 per year as a SWE, and Anthropic offers a coding model for $100,000 per year (but working 24/7), then you'll have to give up all of those addons that make the fully burdened cost of the employee. Say goodbye to vacation, sick time, benefits, etc.

      • Qhemlomo3 hours ago
        It just got released, it shouldn't matter.

        We know this model will be cheaper and faster with time.

        And we have not even reached the timespan/timeframe where we have ASIC style models.

        OpenAI has to do something which will beat Fable otherwise Anthropic won. China currently overtakes cars, pv, batteries and very soon silicon chip making, it has all the incentive to also take over AI.

        • camillomiller3 hours ago
          LOL magical thinking
          • Qhemlomo3 hours ago
            I'm happy to discuss arguments if you want to add any?
            • throw9394945552 hours ago
              Not OP, but for me, this model will get VERY expensive in 2 weeks. Now it is part of Pro plan, after 22nd it will get excluded and I will pay by token API usage (~10x more expensive).

              I find it good for code reviews.

            • Our_Benefactors3 hours ago
              The only thing they’ve overtaken is arguably batteries, and even that is questionable if the quality is as good as Korean manufacturers. I think it’s more likely that the Chinese chip industry overtaking competitors will remain like nuclear fusion, forever “just 5 years away”
      • PunchyHamster4 hours ago
        It will be great when the price of compute/memory drops to normal level!
        • cyanydeez2 hours ago
          >Sam Altman has signed another Memorandum of Understanding: Buying all SDRAM till the heat death of the universe OR Musk relocates to Mars.
  • ecocentrik3 hours ago
    Reading the first few paragraphs of what he calls "the most sophisticated academic social science paper I have yet seen from an AI" does not impress as much as I hoped.

    "Posterior beliefs about market demand are purely reference-dependent: holding dollars raised constant, they track only performance relative to the founder’s self-chosen goal—jumping half a standard deviation at the threshold, responding steeply for the first ten points past it, and flattening thereafter"

    Humans generally don't verbalize data this way. The summary document is also very fluffy.

  • olafmol3 hours ago
    This little line from the article scares me: "but a software engineer would iron out the remaining potential bugs that I could not find quickly"

    Every sw dev knows this is a very dangerous, and unrealistic, assumption.

  • mohsen14 hours ago
    I have been using it for less than an hour so take this with a grain of salt of being excited for the new tech.

    In a project like mine (https://github.com/tsz-org/tsz) I am constantly frustrated that models were not doing enough research and were not taking into account other situations. Again and again models would produce code that would fix one thing and break 2 other tests that were "unrelated".

    With Fable it seems like tasks are taking much longer (I have not seen a pull request from Fable sessions yet) but reading the transcription of those sessions I can see how it is doing the right thing by not leaving any stone unturned.

    As the article says, it's hard to communicate this "feeling" about models because it is very project specific but I thought I'd share

    • anematode3 hours ago
      Does this not indicate that the project might not be structured in an appropriate way that allows incrementally adding features?
      • layer83 hours ago
        In general, sooner or later you need to restructure one thing or another when requirements are changing. Good code lets you reason about a refactoring, and experience tells you when it is necessary or appropriate. Coding agents aren’t very good at the latter.
      • mohsen13 hours ago
        the setup is solid. there are thousands of tests and CI won't let things merge if tests are failing.

        But overall, this is pretty normal for compilers to have this sort of "unexpected" tests failing due to some work in an area. It happened to me when I was coding everything manually back in the day too

        • anematodean hour ago
          > there are thousands of tests and CI won't let things merge if tests are failing.

          That's not what a clean setup means... I mean good separation of concerns, established invariants, etc.

  • selfawareMammal4 hours ago
    What are people working on that they see such a substantial difference between Mythos and Opus? I'd say I'm working with advanced stuff and more often than not even Deepseek is more than enough. Why is everybody a genius in here?
    • jenniferhooley3 hours ago
      Just depends what you are working on. If you are trying to make a video game that's at a level of a decent indie game (think Hades/Baazar/etc), making UI elements/VFX/complex shaders/etc that are organic/interactive/animated that don't feel like a little dogshit vibeslop web-game, then none of the models are even close to good enough to get it done easily. Huge percentage of problems in top 3% games is really hard for any of the models to do with simple prompting.

      Personally I don't really care, because I like coding and learning myself and DeepSeek Flash is all I really care about. But it's really easy to have a ton of benchmarks where the top models can't get anywhere close - and I like to test them on these problems to see how good they are getting.

      Fable 5 is def a little better than 4.8 btw.

    • jstummbilligan hour ago
      I am sure you would not find it hard to exhaust any model, if you kept upping your ask enough times.

      On the margins, suppose the prompt is literally: "Build a feature complete, high polish Facebook clone". Facebook is complex but likely not super complicated tech, and still I would assume that (after having burned through a substantial amount of tokens) you would find substantial enough differences in the outcomes between different models on that prompt on various fronts.

      The above ask is obviously not useful, but what's preventing you from taking on bigger chunks until you approach the limit? At some point you would hit a boundary, where the diff will be obvious.

    • ianm2184 hours ago
      I’ve been working on implementing some common web infra type projects in Rust lately. Basically trying to use a lot of the great primitives in Rust like rustls (modern OpenSSL) and Tokio (async) to build memory safe or close, nginx drop in replacements.

      A small portion of this effort is having a high quality Lua in Rust repo. I’m using mythos to fix some of the performance issues with my Lua interpreter that gpt 5.5/ opus 4.8 had stone walled on.

      Not sure if Mythos will be able to crack this but it has been running for a couple hours now with some promising results.

      Performance charts linked here if you're curious https://github.com/ianm199/lua-rs

      • mplanchardan hour ago
        What’s wrong with mlua?
        • ianm21815 minutes ago
          Mlua works for many use cases but is a wrapper around the C code, so you need to bundle C as part of the build. So this is worse for cross compilation and makes it so you can't easily use mlua projects in wasm32-unknown-unknown. An example is that it would be hard to run a game in the browser that exposes Lua scripting with mlua.

          The other reason is that because mlua is just a wrapper around the C code, it has unsafe you can't really get around. So for example Lua is used in Redis, which has this critical CVE https://github.com/redis/redis/security/advisories/GHSA-4789... that a memory safe version of Lua wouldn't have to deal with.

          Mlua is still fine or even better for many other cases though!

    • mervz4 hours ago
      We see the same thing when new laptops are announced and every employee all of a sudden needs to upgrade, despite the fact that 90% of people would be able to make do with a Macbook Neo.
      • Our_Benefactors2 hours ago
        > despite the fact that 90% of people would be able to make do with a Macbook Neo.

        Myth. Total myth! I recently had to beg for more RAM after continually hitting swap space which causes tools like dictation to stop working, failure to load certain websites without rebooting, and so on. Devs do in fact need powerful machines and the ~$500-1000 an employer saves upfront in machine costs is dwarfed by productivity losses.

        Giving your engineering employees new machines in a 2-year cycle that are between the middle and high end is one of the cheapest ROI decisions that a tech org can make.

        • oarsinsync2 hours ago
          Surely devs could just uninstall Slack, and get the same combined RAM & productivity boost?
    • mohsen14 hours ago
      I had a few of the benchmarks left alone and was working on tech debt knowing that a new model is going to be released soon. For my project (tsz.dev) Opus 4.8 was running in circles without producing results for a while for those tasks
  • gopalv5 hours ago
    > It worked for nine and a half hours.

    > Again, it wasn’t perfect. As an expert, I was able to spot some errors and omissions (some as a result of the design I had asked for) that I had the AI correct

    That's the bit that stuck out to me - that's longer than I would expect to work on a problem in a day or even expect to go back & fix the output of something that has a core reward loop of hours.

    My customers are currently clamoring to push down my agent response times from 85 seconds down to below the 20s mark.

    At the same time, it is very dissonant to see the industry heading towards hour+ long workflows with an agent.

    • matneyx5 hours ago
      In Claude's defense (and I cannot believe I'm defending it), I know no single dev who could create what it did (Concord), from a 19-page design document, in 9.5 working hours.

      We're gonna go back to the days where our bosses ask why we're just sitting around, but instead of saying "compiling," we'll just say, "waiting for Claude."

      • torginusan hour ago
        I tried to read the 'design doc' - it's slop full of vague platitudes and impressive sounding but impossible to pin down management speak - in short, it's slop, and I still don't really get what it's supposed to do exactly.

        It's some prompt engineered AI harness, that guides the AI to create stats after it researches a subject and ingests the data, but I'm not sure what is it that the tool actually does on top of this.

      • giancarlostoro4 hours ago
        This. I get told things like "you can't build all that on your own?" I've had Claude poop out full feature web apps in under 30 minutes, to a spec. Was it perfect? No, but sometimes even in a simple setup phase you can burn 15 minutes to some obscure setup step that's failing. I cannot just code nonstop at 900WPM or whatever ridiculous speed, and poop out an entire full feature web app, with maybe a few bugs here or there. If you can, come show me, I'll gladly have you race against my Claude prompting capabilities.

        Will Claude's code be perfect in one shot? Probably not, will it get you 80 to 90% of the way there with your chosen design patterns in under a few hours? Absolutely.

        • toss12 hours ago
          >>If you can, come show me, I'll gladly have you race against my Claude prompting capabilities.

          Sounds like we've nearly reached in coding the point where Paul Bunyan [0] has his epic competition with the chainsaw... and loses by 1/4" and history forever changes...

          [0]https://www.britannica.com/topic/Paul-Bunyan

      • neogodless5 hours ago
        For the rare uninitiated:

        https://xkcd.com/303/

      • petesergeant4 hours ago
        Sadly I didn't get very many answers to my Ask HN, "What are you doing during inference?": https://news.ycombinator.com/item?id=47944917
    • giancarlostoro4 hours ago
      > At the same time, it is very dissonant to see the industry heading towards hour+ long workflows with an agent.

      At this point, pay me significantly more, and I'll do it.

      • warkdarrior3 hours ago
        > pay me significantly more

        Ha ha, that's how you negotiate yourself out of a job!

        • giancarlostoro40 minutes ago
          Fire me then, I can bring someone else drastically more value with AI tooling.
    • PeterStuer5 hours ago
      My Opus 4.8 regularly works for 10+minutes on a single non-trivial coding request.
      • ASalazarMX4 hours ago
        Your Opus 4.8? Is it now usual to refer to LLMs like that?
        • wongarsu4 hours ago
          Isn't it common to refer to all software like that? "Let me look at my JIRA", "I can't find anything using my Outlook's search function", "My Powerpoint is acting up today", "My browser just crashed" are all sentences I might say during a normal work day
          • hypfer4 hours ago
            Depends on the demographic I think. And also tells you surprisingly much about how the brain of the person uttering it works.

            There are people that almost feel physical pain if something is unnecessarily incorrect.

            + That if the mental model of something is accurate, it is actually _more_ work to say something that is incorrect than just saying the correct thing.

            • wongarsu4 hours ago
              In my mental model, "my Outlook" is the outlook instance running on my computer, on my data. My outlook crashed today. Yours might not have crashed. Similarly, my Jira contains tickets about my work, your Jira does not contain those same tickets. That might be technically the same instance on the same SaaS server, but the server I'm routed to accessing my data with my credentials turns it into "my Jira". My Jira is slow. Maybe you are lucky and get routed to a faster server, or your company is self-hosting. Then your Jira might be reasonably fast
              • hypfer4 hours ago
                Hmm, good point. "My outlook" might actually be correct. Depending on if it is a webapp or the real one running on your device that is.

                Similar to "My game just crashed".

                Jira otoh is not yours, because it's in the cloud. It might be "my internet connection", "my browser" or "my account" that is having trouble.

                ___

                Hm. "My train got delayed" is interesting in this context. I don't find that offensive. But that also might be because trains don't seek rent the way SaaS does? Not sure.

                I guess trains do not hold me hostage. They might just be a container in which someone does that.

                Jira, cloud LLM inference or similar otoh..

                • ASalazarMX2 hours ago
                  The "my train" convention is an interesting argument. It's not actually yours, you're buying a train-as-a-service single-use license, and there are tiers to that too.

                  I guess the main difference is that TAAS has many different trains where the experience varies wildly, so it helps to be specific on which train you're licensing; but LLMs are the same product for everyone, and you can't stay with say, ChatGPT 1.0, you get the same choices as everyone else.

              • ASalazarMX3 hours ago
                This is completely fine, as those are your own installs, but LLMs can't be owned by the users, your Opus is the same Opus as everyone else's, your only difference is the subscription tier to their API.

                If you had your own on-premises LLM, that would indeed be your LLM, and it would make sense to compare it to the on-premises LLMs of other people, as your setup particulars would affect the result.

                • dasyatidprime3 hours ago
                  The copyright to the Outlook binary isn't owned by the users either, even if they're running it on local hardware. The Opus 4.8 weights are (we assume) the same between users, but the conversation/tooling state is not shared between them by default. I prefer to route around this construction myself, since I do think there's some ontological slippery-slope potential, but from a lexical perspective I think “my” is a perfectly defensible abbreviation in context.
                  • hypfer3 hours ago
                    > The copyright to the Outlook binary isn't owned by the users either, even if they're running it on local hardware

                    There was a time where one actually bought software to own it.

                    This time is.. actually it is right now. Please leave at once.

            • RugnirViking2 hours ago
              > tells you surprisingly much about how the brain of the person uttering it works

              That's ridiculous. You wouldn't respond to "I went to visit my doctor yesterday" with "but slavery has been illegal since forever!" Similarly it would be foolish to respond to "where should we meet? my place or yours" with "but we both rent!"

          • calvinmorrison4 hours ago
            better than "The JIRA" , or "The Google" or "The Spotify"
        • w4yai4 hours ago
          You don't have your Opus 4.8 ? I got mine yesterday !
          • ASalazarMX2 hours ago
            I didn't get mine, but I suspect I might be using yours when I use it.
        • giancarlostoro4 hours ago
          That's pretty tame, if you want to be disturbed check out r/MyBoyfriendIsAI
          • throw9394945552 hours ago
            Or dog lovers. All sorts of licking, anal cleaning... full intimate relationship.
    • hedgehog5 hours ago
      Work duration is also not that valuable of a measure, you're usually better off defining the process yourself in code and having that delegate chunks of work to the models. The only real issue there is that it's harder to take advantage of the providers' subscription discounts, but on the other hand it's easier to do your own model routing, and there's no way I've seen for the normal chatbots to maintain coherence on streams of work measured in days and weeks.
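
      A minimal sketch of what "defining the process yourself in code" with your own model routing could look like. Everything here is hypothetical: the `Task` fields, the difficulty labels, and the two stand-in model functions are assumptions for illustration, and a real setup would wrap actual provider API calls instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A "model" is any callable that takes a prompt and returns text.
# These are stand-ins; real ones would call a provider's API.
Model = Callable[[str], str]


@dataclass
class Task:
    name: str
    prompt: str
    difficulty: str  # hypothetical labels: "easy" or "hard"


def run_pipeline(tasks: List[Task], routes: Dict[str, Model]) -> Dict[str, str]:
    """Run tasks in a fixed, code-defined order, routing each by difficulty."""
    results: Dict[str, str] = {}
    for task in tasks:
        model = routes[task.difficulty]  # the routing decision lives in our code
        results[task.name] = model(task.prompt)
    return results


def cheap_model(prompt: str) -> str:
    return f"[cheap] {prompt}"


def strong_model(prompt: str) -> str:
    return f"[strong] {prompt}"


tasks = [
    Task("rename", "Rename variable x to count", "easy"),
    Task("review", "Review the memory safety of parse()", "hard"),
]
results = run_pipeline(tasks, {"easy": cheap_model, "hard": strong_model})
for name, output in results.items():
    print(name, "->", output)
```

      The point of the design is that the sequence of work and the choice of model are ordinary code you control, while each model call only handles one bounded chunk.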
    • cyanydeez4 hours ago
      I think we hit the sigmoid back when the QWEN models were released. By properly structuring my project, I can point it at any extension I want and get it going for 30 minutes to extend whatever. It can't effectively do 'god mode' on all the code, but being a mindful observer and code "professional" I don't need more than what a 128GB VRAM needs.

      I'm amazed we're so far into SOTA bloat that the Chinese will kill once they start etching silicon with these models.

  • theturtletalks4 hours ago
    This is what he built:

    https://isochronic-passage-chart.netlify.app/

    Doesn’t work too well on mobile but looks interesting

    • skipants4 hours ago
      It looks interesting but, like a lot of AI, looks correct but is not. Most of northwestern Canada says you can get there by road. If you look at Google Maps, there are no roads there for quite a while. I see one highway between Inuvik and Tuktoyaktuk but that's about it.
      • neom3 hours ago
        Reminds me of a fun story. Some 20 years ago when I moved from Fort Frances to Toronto for college, my high school best friend was also going to college in Toronto, and his dad offered to drive us together in his truck with all our stuff in the back. We were saying our goodbyes and my buddy's dad said to my dad "We'll get there a lot faster, I found a shortcut!" My dad, confused says "shortcut? there is no shortcut, just highway 1..." and his dad insists he found an alternative route, much shorter by kms and we'll fly up there 6 hours faster! Get into the truck and he pulls out 5 pages of printed mapquest... I assure you, having done it, Sault Ste. Marie to Sudbury via Elliot Lake on logging roads, may look interesting, but not correct, added a good 8 hours to the trip.
    • jampa3 hours ago
      It is hallucinating many flights in my region, some that never existed (so it is not an outdated data problem).

      I also see some logic flaws. It overlooks the option of going to a major hub to access faster aircraft, rather than hopping on local hubs.

      Also, immigration and customs are cleared at the first airport you arrive at in the country, not at the last one.

      In some countries, you need to clear immigration even while going to a third country, so 1 hour is not enough to do it.
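
      The first-point-of-entry rule can be sketched roughly like this. It is a simplification under loud assumptions: every border-crossing leg is charged immigration time at its arrival airport (real rules vary by country and by transit status), and the `Leg` type, the function name, and all the minute values are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Leg:
    origin_country: str
    dest_country: str
    flight_minutes: int


def total_minutes(legs: List[Leg], immigration_minutes: int,
                  connection_minutes: int) -> int:
    """Sum flight time plus ground time at each stop.

    Immigration is charged where a leg crosses a border, i.e. at the
    first airport in the new country, not at the final destination.
    Other intermediate stops get a plain connection time.
    """
    total = 0
    for i, leg in enumerate(legs):
        total += leg.flight_minutes
        crosses_border = leg.origin_country != leg.dest_country
        is_last = i == len(legs) - 1
        if crosses_border:
            total += immigration_minutes
        elif not is_last:
            total += connection_minutes
    return total


# Hypothetical itinerary: international arrival, then a domestic hop.
legs = [
    Leg("US", "CA", 300),  # immigration cleared on arrival in CA
    Leg("CA", "CA", 110),  # domestic final leg, no immigration
]
trip = total_minutes(legs, immigration_minutes=90, connection_minutes=45)
print(trip)
```

      Making the immigration time a parameter also leaves room for the point that one hour is not always enough.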

  • neaden3 hours ago
    Man, that poem it made is terrible. Like just incredibly bad. Sure it's neat that software can make an incredibly bad poem but there is enough bad poetry in the world that we don't need it.
    • Kiro2 hours ago
      How good can a rhyming poem about a haircut where every word starts with S be?
    • layer83 hours ago
      I wonder what Vogons would think of it.
  • thepasch4 hours ago
    What it feels like to work with Fable:

    > Switched to Opus 4.8: Fable 5 has safety measures that flag messages on most cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Send feedback or learn more.

    • matheusmoreira2 hours ago
      Same experience here. The parts of my project that actually could have benefited from Fable's code review got this instead.
  • ElijahLynnan hour ago
    Loved the article!

    And I'm excited to try it, but also have a fear that I will like it too much and then won't have access to it in 2 weeks... But maybe I will and maybe it will be worth it and I'll just pay a bunch of extra for it and it'll be great!

    I think the article could be improved by actually sharing more feelings. I clicked on the article for feelings but I didn't see that many feelings described.

  • pu_pe2 hours ago
    The isochrone maps are quite beautiful [1], and go beyond the scope and refinement of some earlier human attempts I could find [2][3][4].

    [1] https://isochronic-passage-chart.netlify.app/

    [2] https://mapitout.welcome-to-nl.nl/

    [3] https://commutetimemap.com/

    [4] https://andrewding.ca/flightisochrones/

  • recursivedoubts5 hours ago
    would it be possible for mythos to make the space bar scroll the pages on your website properly?
    • mulr00ney4 hours ago
      Seems to be hijacked by the video of some game they generated. :(
      • albedoa2 hours ago
        If you delete the video from the DOM, then click back into the content area, it reattaches the video lol.
  • asdK1206 hours ago
    Mollick runs the Generative AI Lab at Wharton, with all the corporate sponsors.

    He is a professor but sadly also an AI shill. He should switch to advertising washing powder.

    • MostlyStable6 hours ago
      So...no engagement with the substance? Not even to explain why it is that this is not a useful description or test of capabilities? Ok.
      • dthread35 hours ago
        I would like to see it do something useful, like converting pytorch to golang.
        • Philpax2 hours ago
          Will you accept a port of Torch to Rust? https://github.com/forecast-bio/ferrotorch
        • cadamsdotcom5 hours ago
          Why not get a plan from Anthropic and get that done yourself? Probably is going to cost you as much as a coffee.
        • lijok5 hours ago
          Hot damn - is that the floor of what you consider useful?
        • fdsdfsdfzxczxc5 hours ago
          This newfangled car thing is useless. It can't even properly shoe a horse.
    • whyenot5 hours ago
      Instead of attacking the author, please respond to the content of the article. That is the HN way, and it leads to more substantive and interesting discussions.
    • CuriouslyC4 hours ago
      Ethan is a booster but I wouldn't call him a shill. He cites data and mostly in a fair way, though you could argue the sources he chooses to focus on are biased.
  • wxw3 hours ago
    I am… underwhelmed by the artifacts in the post.

    I don’t see why working longer is a pro. The results don’t seem much better than you’d get from putting Opus in a long loop.

    • warkdarrior3 hours ago
      > The results don’t seem much better than you’d get from putting Opus in a long loop.

      Care to share the results you got from Opus working on the same prompt? It should be easy to compare quality.

  • mjamesaustin3 hours ago
    The snake game is legit very fun. Once I got the ability to pick up the apples and plant apple trees, I was sold.
  • Aperocky4 hours ago
    > This is a map that shows the distance you can travel in a given length of time, and the first one was created in 1881 showing travel times from London.

    The first item on the article, the first thing it showed, was wrong though.

    It is 100% faster to go from London to New York in 1881 than to Volgograd. Or any of the Russian hinterland colored green or Turkey or Egypt.

    • patcon3 hours ago
      > faster to go from London to New York in 1881 than to Volgograd

      the map is for 2026, yeah?

  • ComplexSystemsan hour ago
    Who can afford to use this damn thing though? They're pricing everyone out of the market with stuff like this.
  • ElijahLynnan hour ago
    > The work has shifted from process to outcome. I no longer steer; I commission.
  • mawadev3 hours ago
    Isn't it weird that we started to gauge the quality of a model by checking the vibe of the vibe coding?
  • vb-84483 hours ago
    Nice, but I'm really curious about how many tokens have been used.

    There is only one hint: 475k tokens in the screenshot when OP asked the model to fix some behaviour, but it would be fascinating to know the total token count.

  • steve19773 hours ago
    > it is indicative of AI solving a hard problem involving research, math, visual development, taste, judgement, complex coding, and more.

    Is it a hard problem or is it just labor intensive?

    • warkdarrior3 hours ago
      Depends on the skill of the person working on it.
  • 382hi6 hours ago
    I think Qwen 3.7-Plus is better at reasoning than Mythos, and I've used both for quite a while.
    • giancarlostoro4 hours ago
      Would love to see samples of the kinds of prompts you use with both. I sometimes wonder if the specific wording is the secret sauce, I have very few issues with Opus / Claude, but when I try premier GPT models, I get weird output from what I've grown to expect with Claude.
  • PaulHoule3 hours ago
    My wife likes to say "feelings aren't facts"
  • LogicFailsMe3 hours ago
    I'm using Fable this afternoon and it's definitely a step up from Opus 4.8, finding and fixing things Opus 4.8 was blind to even perceiving. The next 13 days are going to be fun IMO. And Opus 4.8 was less annoying than Opus 4.7 FWIW.

    Edit: A couple hours in and I just got my first gaslighting attempt from the model. Good times!

  • root_axis6 hours ago
    I just can't stand this type of fawning language.
  • catigula3 hours ago
    >Ethan Mollick

    Just an FYI this guy is an AI hype-beast. Some of his tweets are truly out there.

  • zb34 hours ago
    Was the condition of being granted early access to this castrated model writing a post praising it?
  • zuzululu4 hours ago
    > First, how good is Fable? In experiment after experiment I conducted, it outperformed basically every other public model I have used by a considerable margin.

    What makes me excited is that GPT 5.6 (it's actually GPT 6) is going to be crazy

  • ThejaCH3 hours ago
    What it feels like to work with Mythos? Feels like I am poor
  • younglunaman3 hours ago
    >What it feels like to work with Mythos

    >Looks Inside

    >So I did this with fable...

    What?

    • warkdarrior3 hours ago
      Fable is Mythos with extra guardrails, so the analysis holds.
  • the_doctah5 hours ago
    More Mythos Marketing.
    • boringg4 hours ago
      The mythos of Mythos is marketing.
  • honeycrispy3 hours ago
    Reading it, I can't help but feel he's being paid to write this. Or maybe he hopes to be paid. The language he uses makes him sound like he's fawning over the lost days of his childhood. Pardon me for being skeptical, but a trillion dollar company running a net-loss is hoping to IPO, and needs to sway public opinion by any means necessary. I would imagine that no dirty marketing scheme is off of the table, even from the self-proclaimed "good guys".
  • et-al6 hours ago
    [flagged]
    • astrange5 hours ago
      It is not a sponsored article and he writes one of these every time a new model releases. Why would a professor at Wharton need to write sponsored Substack articles?
    • 0x1ceb00da5 hours ago
      "I don't care who the IRS sends I am not paying taxes!"