ChatGPT Containers can now run bash, pip/npm install packages and download files (simonwillison.net)

451 points by simonw 12 days ago | 29 comments

smeej 11 days ago
As a person who's worked in support roles in tech companies and has a working familiarity with Python but is not a software developer or engineer at all, it's been fascinating to watch the changes.
In the last couple weeks, both Gemini and Claude have asked me, "Can I use the computer?" to answer some particular question. In both cases, my question to each was, "What computer? Mine, or do you have your own?" Here I had thought they were computers, in the vague Star Trek sense. I'm just using the free version in the browser, so I would have been surprised if it had been able to use my computer.
They had their own, and I could watch them script something up in Python to run the calculations I was looking for. It made me wonder who it was at Google/Anthropic who first figured out that the way to get LLMs to stop wetting their metaphorical pants when asked to do calculations was to give them a computer to use.
It did make me scratch my head when I was trying to prompt Nano Banana to generate something and it was like Gemini started talking about the image generator in the third person: "The AI is getting stuck on the earlier instruction, even though we've now abandoned that approach." Felt a little "turtles all the way down" with that one!
- kridsdale1 11 days ago
  You’re seeing perspectives of the distributed system from inside the system.
  I’m building multi server multi agent products and they do apparently perceive (anthropomorphizing I know) their connected servers as other people.
  - smeej 11 days ago
    Looking at the world, it really makes me wonder if "human" is what we want to model these machines on. It's not obvious to me what else we should choose, but working together peaceably and effectively doesn't seem to be our strongest attribute when writ large.
  - MRtecno98 10 days ago
    They are just predicting the next token. In human text it's more common to talk to other people than a computer, so they end up talking to the computers like they were people.
dangoodmanUT 12 days ago
Giving agents linux has compounding benefits in our experience. They're able to sort through weirdness that normal tooling wouldn't allow. Like they can read an image, get an error back from the API and see it wasn't the expected format. They read the magic bytes to see it was a jpeg despite being named .png, and read it correctly.
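For illustration, the magic-byte check described above can be sketched in a few lines of Python. This is only a sketch under assumptions: the signature table covers just PNG, JPEG and GIF, and the names `sniff_image_format` and `extension_mismatch` are hypothetical, not anything an agent is known to have run.

```python
from pathlib import Path
from typing import Optional

# Leading byte signatures for a few common image formats (not exhaustive).
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return the format implied by the leading bytes, or None if unknown."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def extension_mismatch(filename: str, data: bytes) -> bool:
    """True when the filename suffix disagrees with the detected format."""
    detected = sniff_image_format(data)
    suffix = Path(filename).suffix.lower().lstrip(".")
    if suffix == "jpg":
        suffix = "jpeg"  # treat .jpg and .jpeg as the same format
    return detected is not None and detected != suffix
```

Calling `extension_mismatch("photo.png", data)` on bytes that start with the JPEG marker flags the misnamed file, which is the situation described in the comment.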
- storystarling 11 days ago
  Matches my experience with print-on-demand workflows. I tried using vision models to validate things like ICC profiles and total ink density, but they usually just hallucinate that the file is compliant. I ended up giving the agent access to ImageMagick to run analysis directly. It’s the only reliable way to catch issues before sending files to fulfillment, otherwise you end up eating the cost of failed prints.
  - jtbayly 11 days ago
    I don’t understand why you’d try to use an LLM for that step if there is already a tool that you can call to check it. Help me out.
- ndsipa_pomu 11 days ago
  > They read the magic bytes to see it was a jpeg despite being named .png, and read it correctly.
  Maybe I'm missing something, but it seems trivial to implement reading the magic bytes. I haven't tested it, but I'd expect most linux image displayers/editors to automatically work with misnamed files as that is almost entirely the purpose of magic bytes.
  Personally, I think Microsoft is to blame for everyone relying on file extensions too much as it was a bad idea which led to a lot of security issues.
- lpcvoid 11 days ago
  I don't understand why this is something special that somebody would need some LLM slop generation for? Any human can also do this in a few seconds using normal unix tooling.
  - hex4def6 11 days ago
    That's like saying 'why give people calculators, when you can pull out a slide rule'
    The whole point is that you are enabling the LLM through tool use. The prompt might be "Download all the images on the wikipedia article for 'Ascetic', and print them on my dot matrix printer (the driver of which only accepts BMPs, so convert as needed)"
    Your solution using file / curl is just one part of the potential higher level problem statement. Yes, someone could write those lines easily. And they could write the wrapper around them with only a little more difficulty. And they could add the 404 logic detection with a bit more...
    Are you arguing LLMs should only be used on 'hard' problems, and 'easy' problems (such as downloading with curl) should be done by humans? Or are you arguing LLMs should not be used for anything?
    Because I think most people would suggest humans tackle the 'hard' problems, and let the tools (LLMs) tackle the 'easy' ones.
    lpcvoid 9 days ago
    I am arguing LLMs should not be used for anything, since in my opinion their downsides outweigh their upsides.
    Also, I don't consider LLMs a tool, because I can trust my tools, and I cannot trust anything an LLM outputs.
  - darknoon 11 days ago
    I think you'd find that it's far from "any human" who can do this without looking anything up. I have 15y of dev exp and couldn't do this from memory on the cli. Maybe in c, but less helpful to getting stuff done!
    lpcvoid 11 days ago
    # curl -s https://upload.wikimedia.org/wikipedia/commons/6/61/Sun.png | file -
    /dev/stdin: PNG image data, 256 x 256, 8-bit/color RGBA, non-interlaced
    That's it, two utilities almost everybody has installed.
    simonw 11 days ago
    ChatGPT has 800 million monthly users. The fraction of those who are comfortable opening a terminal and running those commands is pretty tiny.
    lpcvoid 11 days ago
    If 800m people think delegating thinking to a slop generator is fine, that's not my loss. It's bad for humanity, but who even cares anymore in 2026, right?
    simonw 11 days ago
    "Delegating thinking" and "figuring out how to determine an image format from the first few bytes of a file" are not the same thing.
    lpcvoid 11 days ago
    I disagree, in my opinion it's the exact same process, just on a much smaller scale. It's a problem, and we humans are good at solving problems. That is, until LLMs arrived, now we are supposed to become good at prompting, or something.
    simonw 11 days ago
    I used ffmpeg and yt-dlp to make an animated GIF of a kākāpō in her nest from a livestream on YouTube the other day. https://simonwillison.net/2026/Jan/25/kakapo-cam/
    Much as I love kākāpō there is no way I was going to invest more than a few minutes in figuring out how to do that.
    I love this new world where I can "delegate my thinking" to a computer and get a GIF of a dumpy New Zealand flightless parrot where I would otherwise be unable to do so because I didn't have the time to figure it out.
    (I published it as a looping MP4 because that was smaller than the GIF, another thing I didn't have to figure out how to do myself.)
    lpcvoid 11 days ago
    I agree that your project is cool, I just don't think the numerous downsides are worth the occasional cool thing like this.
    donkeybeer 11 days ago
    Yes but now do the same for every bit of programming tooling, sysadmin configuration / debugging problem and concept out there. With just a few seconds to answer each reply.
    lpcvoid 6 days ago
    It's called learning, and it used to be the hacker mindset to continuously improve. But I guess that died with slop generators.
  - toddmorey 11 days ago
    Well LLMs do make normal Linux tooling more accessible. I needed a video reformatted to a new aspect ratio and codec and Claude produced a rather complex set of arguments for ffmpeg that I hadn’t been able to figure out on my own.
  - Lerc 11 days ago
    I think this is missing the point. These are tools that enable the LLM to do things that humans can do easily.
    It stops an LLM from being blocked by the inability to do this thing. Removing this barrier might enable the LLM to complete a task that would be considerable work for a human.
    For instance, identifying which files are PNG files containing pictures of birds, regardless of filename, presence or absence of suffix. An image handling LLM can identify if an image is of a bird much more easily than it could determine that an arbitrary file is a png. They can probably still do it, wasting a lot of tokens along the way, but using a few commands to determine which files to even bother looking at as images means the LLM can do what it is good at.
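As a rough sketch of the "few commands" pre-filter described above, the Python below lists files whose first eight bytes are the PNG signature, ignoring names and suffixes entirely. The function name `find_png_files` is hypothetical, and deciding whether a match actually shows a bird is left to the image model.

```python
from pathlib import Path
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed 8-byte header of every PNG file


def find_png_files(root: str) -> List[Path]:
    """Return files under root whose leading bytes are the PNG signature."""
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        with path.open("rb") as handle:
            # Only the header is read, so large files stay cheap to check.
            if handle.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE:
                matches.append(path)
    return matches
```

Only the paths this returns would then need to be opened as images, which keeps the expensive vision step off files that are not PNGs at all.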
simonw 12 days ago
Regular default ChatGPT can also now run code in Node.js, Ruby, Perl, PHP, Go, Java, Swift, Kotlin, C and C++.
I'm not sure when these new features landed because they're not listed anywhere in the official ChatGPT release notes, but I checked it with a free account and it's available there as well.
- 1980phipsi 12 days ago
  I was able to install the D language compiler DMD by providing a .deb file.
  https://chatgpt.com/share/69781bb5-cf90-800c-8549-c845259c33...
  - kristianp 7 days ago
    So chatgpt can install any .deb file you upload to the chat? I wonder if openai intended that kind of thing to be allowed.
- piskov 12 days ago
  Shame no c# in that list
  - martinald 12 days ago
    Probably (?) not related but there is an issue with claude code for web with nuget. It doesn't support the proxy auth mechanism that anthropic gives it. I wonder if it's the same problem here.
    piskov 10 days ago
    Why do you need auth to download public packages from nuget?
tgq2915 12 days ago
[flagged]
- latexr 12 days ago
  — Why are you launching nukes? No one asked you to obliterate humanity.
  — You’re absolutely right. I should not have done that. Would you like me to help undo the launch?
  — Yes! Quickly! Do it!
  — <completely made up crap which does not work>
  https://www.newyorker.com/cartoon/a16995
  - InsideOutSanta 11 days ago
    "I have turned off the missile monitoring systems. The missiles now report as non-existent."
- naruhodo 11 days ago
  Mechahitler[1] now has a job at the Pentagon.[2]
  [1] https://www.npr.org/2025/07/09/nx-s1-5462609/grok-elon-musk-...
  [2] https://www.bbc.com/news/articles/c628d9mre3go
- wartywhoa23 11 days ago
  Being Russian and hearing about the horrors of war since my childhood, I always wondered how fascism, Nazis and WWII managed to become reality in 20th century.
  Then, I witnessed the answers unfolding before my eyes in real time - torrential TV and Web propaganda, warmongering, nationalism and worst of all - total acceptance of the unacceptable in a critically large portion of the country's population. Among the grandchildren of those who fought against the same things at the price of tens of millions of lives. Immediately after the Crimean takeover it was clear to me that there would be war. Many denied this, mocking and calling me a tinfoil hat.
  Well, I also always used to wonder who those morons were who allowed things to go south in Terminator, 1984, Matrix, Cat's Cradle and other well-known dystopias, what kind of people they were and what they thought.
  It doesn't really matter that these concerns are on the opposite sides of the imaginary axis.
  What really matters is this universal drive for digging their own and the next guy's graves in too many people, always finding excuse in saying "if not us, then someone else will do it". And: "The times are different now". And: "So you're comparing AI and fascism?".
  - InsideOutSanta 11 days ago
    Yeah, sure, we get Terminators, but look at how I can ask this agent what next week's schedule is if I just connect it to everything!
  - scotty79 11 days ago
    Was there a lot of warmongering in russia in preparation for starting the war in 2022? Because from what I saw wars tend to pop up all of the sudden, regardless of political climate of the country.
    theshackleford 11 days ago
    > Was there a lot of warmongering in russia in preparation for starting the war in 2022?
    Yes. It’s crazy to me this would even need to be asked but I guess most don’t pay attention until it’s of individual significance to them.
    wartywhoa23 11 days ago
    There was a huge lot. It started even before 2014, I'd say 2011-2012 was the year (after a series of suppressed protests like the one at Bolotnaya square), and first looked like a grotesque mutation of the V-day celebration, which began having nothing to do with the actual sacrifice our grandparents made, but everything with setting the most chthonic part of the nation against the whole world, with outlandish slogans like "we can repeat" (as in "we can fight and win again") and in numerous other ways.
    Then 2014 Maidan happened in Ukraine and all the propagandist hell broke loose in Russia.
    PurpleRamen 11 days ago
    > Was there a lot of warmongering in russia in preparation for starting the war in 2022?
    It didn't start in 2022, it just entered a new phase. This conflict had already been going since 2014, and the writing was on the wall the whole time. Warnings regarding Russia under Putin go back at least two decades; it was all speculated, and to some degree known, where he was going.
    > Because from what I saw wars tend to pop up all of the sudden
    Usually not. When they specifically happen is often sudden, but wars are usually the result of long processes. Most of the time, it's well known to the people who are involved and informed what's going on, and it just needs a single spark for a situation to explode in the predicted way.
    Take the USA for example, the fears about a civil war which are around for a while now. It might happen or not, but when the country explodes, then it wasn't a sudden development happening overnight, which nobody could have seen coming, but the result of a long-running process which was heating up the political climate.
    Applejinx 11 days ago
    Bear in mind that's also Russian messaging. You can quite easily track the fervency of the civil war messaging through people who're known to have taken a WHOLE lot of Russian money in a systemic way. In reality, we get Minneapolis: local solidarity against targeted provocation meant to provide the excuse for a war on Americans and claim it had been civil war.
    madeofpalk 11 days ago
    ??? This has been ongoing since at least 2014 when Russia invaded Ukraine and took over Crimea.
- skocznymroczny 11 days ago
  Can't wait for the wget somelink/install.sh | bash install instructions to be replaced with wget somelink/install.md | claude.
- sellmesoap 11 days ago
  "Do you want to play a game?" At least that computer had retro vibes!
- DarkNova6 11 days ago
  Skynet is just another word for "Cloud", you know.
- zekejohn 11 days ago
  if it's in a secured and completely isolated sandbox that gets destroyed at the end of the request, then how could it be “insecure”?
  - ludvigk 11 days ago
    That “completely isolated” sandbox is connected to the internet on one end, and to an insecure human on the other.
candiddevmike 12 days ago
Seems like everyone is trying to get ahead of tool calling moving people "off platform" and creating differentiators around what tools are available "locally" to the models etc. This also takes the wind out of the sandboxing folks, as it probably won't be long before the "local" tool calling can effectively do anything you'd need to do on your local machine.
I wonder when they'll start offering virtual, persistent dev environments...
- simonw 12 days ago
  Claude Code for the web is kind of a persistent virtual dev environment already.
  You can start a session there and chat with it to get a bunch of work done, then come back to that session a day later and the virtual filesystem is in the same state as when you left it.
  I haven't figured out if this has a time limit on it - it's possible they're doing something clever with object storage such that the cost of persisting those environments is really low, see also Fly's Sprites.dev: https://fly.io/blog/design-and-implementation/
  - esperent 12 days ago
    It's so incredibly buggy though. I end up with hung sessions "starting claude code" every second or third time. After a few times of losing work I'm done with it. I'll check back in a few months and see if it's in better shape.
    sersi 11 days ago
    I just decided to create a vm for my claude code with strict network controls so it can't access my own internal network and I limit what exactly gets shared to it.
- yoyohello13 12 days ago
  > I wonder when they'll start offering virtual, persistent dev environments...
  A lot of companies have been wanting to move in this direction. Instead of maintaining a fleet of machines, you just get a bunch of thin clients and pay Microsoft or whoever to host the actual workloads. They already do this 'kiosk' style stuff for a lot of front-line staff.
  Honestly, not having my own local hardware for development sounds like a living hell, but seems like the way we are going.
  - Imustaskforhelp 12 days ago
    We are gonna have YOLO agents who will deploy directly to website (technically exe.dev already does that for me when I ask it to generate golang projects lol)
    Honestly I felt like it really bores me or (overwhelms?) me because now I feel like okay now I will do this, then that and then that & drastically expand the scope of the project but that comes with its own fatigue and the limits of free tokens or context with exe.dev so I end up publishing it on git provider, git ingest it paste it in web browser gemini ask it for updates (it has 1 million context) and then paste it with Opencode with an openrouter devstral key.
    I used this workflow to drastically improve the UI of a project but like I would consider that aside from some tinkering, I felt like the "fun" of a project definitely got reduced.
    It was always fun for me to use LLM's as I was in loop (Didn't use agents, copy paste workflow from web) but now agents kind of replicated that too & have gotten (I must admit) pretty good at it.
    I don't know man, any thoughts on how to make such things fun again? When LLM's first came or even before using agents like this with just creating single scripts, It was fun to use them but creating whole projects with huge scope feels very fun sucking imo.
    indigodaddy 11 days ago
    I started working on this for if/when I was taxing exe.dev infra and/or ran out of credits (which actually hasn't happened yet):
    https://github.com/jgbrwn/vibebin
    Imustaskforhelp 11 days ago
    This is actually a really great project, I actually wanted to build such a project, I don't know if I already said this comment in LET tho but haha yea kudos for making this!
    indigodaddy 11 days ago
    Awesome man, it's great to get some positive feedback, appreciate it a lot. I'm actively working on it. Got rid of openhands in favor of Shelley, stuff like that. Stuff that's not tested is like actually taking snapshots thru the TUI, that's there, but didn't actually test/do it yet. Cloudflare API is good but the Desec API feature is there but haven't tested it yet. So I'm sure stuff will come up as more people and I actually start attempting more functionality. The admin.code web UI is neat though where you can start and toggle/choose which coding tool webui is running/listening on 9999 in the container. And also update the AI coding tools thru that webui.
    Imustaskforhelp 11 days ago
    Dude I actually followed you on lowendtalk and I already have this project starred :)
    Great to see a fellow LET user in here xD
    indigodaddy 11 days ago
    Ah neat! If you have time throw a comment into the LET post, I'd appreciate it. :)
    fragmede 12 days ago
    If you like juggling, how many tasks in how many epics in how many projects are you working on at the same time? It's not for everyone tho.
  - simonw 12 days ago
    Coding agents are a particularly good fit for disposable development environments because of the risk of them messing things up. If the entire environment is ephemeral the worst that can happen (aside from private source code leaks to a malicious third party) is the environment gets trashed and you have to start over in a new one.
    indigodaddy 11 days ago
    Not ephemeral, but similar idea to keep things isolated in LXC containers. Disclosure, my project.
    https://github.com/jgbrwn/vibebin
  - ljm 12 days ago
    Coming full circle to renting time from a mainframe.
- indigodaddy 11 days ago
  I don't know, I've been working on this project [1], and I think efforts around isolating AI coding tools/agents are worthwhile, as most coding people will be using these more focused coding tools vs a vanilla gpt webui.
  [1] https://github.com/jgbrwn/vibebin
- 0xbadcafebee 11 days ago
  That's what GitHub Codespaces is, and it runs Copilot too (it's just a hosted VSCode Web instance specific to your git repo)
  Google has Cloud Shell, and Google's AI Studio (https://aistudio.google.com/) gives you a web-based dev environment with Gemini integration
- toddmorey 11 days ago
  I think this is exactly why Anthropic bought Bun!
- jkelleyrtp 12 days ago
  I started building something for the dioxus team to have access to mac/linux persistent and ephemeral dev envs with vnc and beefy cpu/mem.
  Nobody offered multiplatform and we really needed it!
  https://skyvm.dev
- suninsight 11 days ago
  [dead]
distalx 12 days ago
This is either going to save hours… or create very educational outages.
- stoneforger 11 days ago
  If the agent were able to update the model, that would be educational for the model, no one else.
behnamoh 12 days ago
I wonder if the era of dynamic programming languages is over. Python/JS/Ruby/etc. were good tradeoffs when developer time mattered. But now that most code is written by LLMs, it's as "hard" for the LLM to write Python as it is to write Rust/Go (assuming enough training data on the language ofc; LLMs still can't write Gleam/Janet/CommonLisp/etc.).
Esp. with Go's quick compile time, I can see myself using it more and more even in my one-off scripts that would have used Python/Bash otherwise. Plus, I get a binary that I can port to other systems w/o problem.
Compiled is back?
- koe123 12 days ago
  > But now that most code is written by LLMs
  Am I in the Truman show? I don’t think AI has generated even 1% of the code that I run in prod, nor does anyone I respect. Heavily inspired by AI examples, heavily assisted by AI during research sure. Who are these devs that are seeing such great success vibecoding? Vibecoding in prod seems irresponsible at best
  - SchemaLoad 12 days ago
    It's all over the place depending on the person or domain. If you are building a brand new frontend, you can generate quite a lot. If you are working on an existing backend where reliability and quality are critical, it's easier to just do yourself. Maybe having LLMs writing the unit tests on the code you've already verified working.
  - superfrank 12 days ago
    > Who are these devs that are seeing such great success vibecoding? Vibecoding in prod seems irresponsible at best
    AI written code != vibecoding. I think anyone who believes they are the same is truly in trouble of being left behind as AI assisted development continues to take hold. There's plenty of space between "Claude build me Facebook" and "I write all my code by hand"
  - mbreese 11 days ago
    I was talking to a product manager a couple weeks ago about this. His response: most managers have been vibecoding for long time. They've just been using engineers instead of LLMs.
    koe123 11 days ago
    This is a really funny perspective
    sersi 11 days ago
    Having done both, right now I prefer vibe coding with good engineers. Way less handholding. For non-technical managers, outside of prototyping vibe coding produces terrible results
  - cheeze 12 days ago
    FAANG here (service oriented arch, distributed systems) and I'd say probably 20+ percent of code written on my team is by an LLM. it's great for frontends, works well with test generation, or following an existing paradigm.
    I think a lot of people wrote it off initially as it was low quality. But gemini 3 pro or sonnet 4.5 saves me a ton of time at work these days.
    Perfect? Absolutely not. Good enough for tons of run of the mill boilerplate tasks? Without question.
    zx8080 12 days ago
    > probably 20+ percent of code written on my team is by an LLM. it's great for frontends
    Frontend has always been a shitshow since JS dynamic web UIs were invented. With it and CSS, no one cares what runs the page or how many MB it takes to show one button.
    But regarding the backend, vibecoding is still rare, and we are still lucky it is like that, and there has been no train crash because of it. Yet.
    llbbdd 11 days ago
    Backend has always been easier than frontend. AI has made backend absolutely trivial, the code only has to work on one type of machine in one environment. If you think it's rare or will remain rare you're just not being exposed to it, because it's on the backend.
    bopbopbop7 11 days ago
    Might be a surprise to you, but some backends are more than just a Nextjs endpoint that calls a database.
    llbbdd 11 days ago
    No surprise at all, and I'd challenge you to find any backend task that LLMs don't improve as much as they do frontend work. And that's ignoring that the parent comment here is just ignorant, since they're talking about the web like it's still 2002. I've worked professionally at every possible layer here and unless you are literally at the leading edge, SOTA, laying track as you go, backend is dramatically easier than anything that has to run in front of users. You can tolerate latency, delays and failures on the backend that real users will riot about if it happens in front of them. The frontend performance envelope starts where the backend leaves off. It does not matter in the slightest how fast your cluster of beefy identical colocated machines does anything at all if it takes more than 100ms to do anything that the user directly cares about, on their shitty browser on a shitty machine tethered to their phone in the mountains, and the difference is trivially measurable by people who don't work in our field, so the bar is higher.
    ivantop 11 days ago
    Honestly, I am also at a faang working on a tier 0 distributed system in infra and the amount of AI generated code that is shipped on this service is probably like 40%+ at this point.
    llbbdd 11 days ago
    I'm not surprised at all here, last time I worked in a FAANG there was an enormous amount of boilerplate (e.g. Spring), and it almost makes me weep for lost time to think how easy some of that would be now.
    ivantop 9 days ago
    It’s not just boilerplate. This is a low level C++ service where latency and performance are critical (don’t want to get into too much detail since I’ll dox myself). I used to think the same thing as you: “Surely my job is safe because this system is very complex”. I used to think this would just replace front end engineers who write boilerplate react code. 95% of our codebase is not boilerplate. AI has found optimizations in how we store items, AI has alerted us to production issues (with some degree of accuracy, of course). I worry that traditional software engineering as we know it will disappear and these hybrid AI jobs will be what’s left.
    halfcat 12 days ago
    I think you’re onto something. Frontend tends to not actually solve problems, rather it’s mostly hiding and showing parts of a page. Sometimes frontend makes something possible that wasn’t possible before, and sometimes the frontend is the product, but usually the frontend is an optimization that makes something more efficient, and the problem is being solved on the backend.
    It’s been interesting to observe when people rave about AI or want to show you the thing they built, to stop and notice what’s at stake. I’m finding more and more, the more manic someone comes across about AI, the lower the stakes of whatever they made.
    llbbdd 11 days ago
    Spoken like someone deeply unfamiliar with the problem domain since like 2005, sorry. It's an entirely different class of problems on the front end, most of them dealing with making users happy and comfortable, which is much more challenging than any of the rote byte pushing happening on the backend nowadays.
    Applejinx 11 days ago
    Is it, though? That sounds very subjective, and from what I can tell 'enshittification' is a popular user term for the result, so I'm not sure it's going that great.
    llbbdd 11 days ago
    If you search Google Trends for enshittification, half the results contain Doctorow as well [0]. Normal people have no idea who that is. And that's just Google, which everyone on HN hates to the point of vibrating angrily because there isn't an obvious part of the name to replace derogatorily with a dollar sign. Nobody uses this term outside of Hacker News, and even on HN it's code for "this site doesn't work when I disable Javascript", which is not a real requirement real customers have.
    User experience does involve a lot of subjectivity [1] and that's part of what makes it hard. You have to satisfy the computer and the person in front of it, and their wants are often at odds with each other. You have to make them both happy at 60 FPS minimum.
    [0] https://trends.google.com/explore?q=enshittification&date=al...
    [1] https://emsh.cat/good-taste/
    8organicbits 12 days ago
    As someone currently outside FAANG, can you point to where that added productivity is going? Is any of it customer visible?
    Looking at the quality crisis at Microsoft, between GitHub reliability and broken Windows updates, I fear LLMs are hurting them.
    I totally see how LLMs make you feel more productive, but I don't think I'm seeing end customer visible benefits.
    mediaman 12 days ago
    I think much of the rot in FAANG is more organizational than about LLMs. They got a lot bigger, headcount-wise, in 2020-2023.
    Ultimately I doubt LLMs have much of an impact on code quality either way compared to the increased coordination costs, increased politics, and the increase of new commercial objectives (generating ads and services revenue in new places). None of those things are good for product quality.
    That also probably means that LLMs aren't going to make this better, if the problem is organizational and commercial in the first place.
    asadotzler 11 days ago
    Does great for front ends mean considerate A11Y? In the projects I've looked over, that's almost never the case and the A11Y implementation is hardly worthy of being called prototype, much less production. Mock up seems to be the best label. I'll bet you think because the surface looks right that runs down to the roots so you call it good at front ends. This is the problem with LLMs, they do not do the hard work and they teach people that the hard work they cannot do is fine left undone or partially done and the more people "program" like this the worse the situation gets for real human beings trying to live in a world dominated by software.
    simonw 11 days ago
    It turns out if you tell a coding agent "make it accessible" you'll get better results than you would from most professional front-end developers.
    I'm not satisfied yet: I want coding agents to be able to actively test on screen readers as part of their iteration loop.
    I've not found a system that can do that well yet out of the box, but GuidePup is very promising: https://github.com/guidepup/guidepup
  - empath75 11 days ago
    For the last 2 or 3 months we made a commitment as a team to go all in on claude code, and have been sharing prompts, skills, etc, and documented all of our projects and at this point, claude is writing a _large_ percentage of our code. Probably upwards of 70 or 80%. It's also been updating our jira tickets and github PRs, which is probably even more useful than writing the code.
    Our test coverage has improved dramatically, our documentation has gotten better, our pace of development has gone up. There is also a _big_ difference between the quality of the end product between junior and senior devs on the team.
    Junior devs tend to be just like "look at this ticket and write the code."
    Senior devs are more like: Okay, can you read the ticket, try to explain it to me in your own words, let's refine the description, can you propose a solution -- ugh that's awful, what if we did this instead.
    You would think you would not save a lot of time that way, but even spending an _hour_ trying to direct claude to write the code correctly is less than the 5-6 hours it would take to write it yourself for most issues, with more tests and better documentation when you are finished.
    When you first start using claude code, it feels like you are spending more time to get worse work out of it, but once you sort of build up the documentation/skills/tools it needs to be successful, it starts to pay dividends. Last week, I didn't open an IDE _once_ and I committed several thousand lines of code across 2 or 3 different internal projects. A lot of that was a major refactor (smaller files, smaller function sizes, making things more DRY) that I had been putting off for months.
    Claude itself made a huge list of suggestions, which I knocked back to about 8 or 10, it opened a tracking issue in jira with small, tractable subtasks, then started knocking out one at a time, each of them being a fairly reviewable PR, with lots of test coverage (the tests had been built out over the previous several months of coding with cursor and claude that sort of mandated them to stop them from breaking functionality), etc.
    I had a coworker and chatgpt estimate how long the issue would take if they had to do it without AI. The coworker looked at the code base and said "two weeks". Both claude and chatgpt estimated somewhere in the 6-8 week range (which I thought was a wild overestimate, even without AI). Claude code knocked the whole thing out in 8 hours.
  - coliveira11 days ago
    If you work on highly repetitive areas like web programming, I can clearly see why they're using LLMs. If you're in a more niche area, then it gets harder to use LLMs all the time.
  - resonious12 days ago
    There is a nice medium between full-on vibe coding and doing it yourself by hand. Coding agents can be very effective on established codebases, and nobody is forcing you to push without reviewing.
- bogtog12 days ago
  > But now that most code is written by LLMs, it's as "hard" for the LLM to write Python as it is to write Rust/Go
  The LLM still benefits from the abstraction provided by Python (fewer tokens and less cognitive load). I could see a pipeline working where one model writes in Python or so, then another model is tasked to compile it into a more performant language
  - anonzzzies12 days ago
    It's very good (in our experience, YMMV of course) when you have the LLM write a prototype in Python and then port it automatically 1-1 to Rust for perf. We write prototypes in JS and Python and then they get auto-ported to Rust, and we have been doing this for about a year for all our projects where it makes sense; in the past months it has been incredibly good with claude code; it is absolutely automatic; we run it in a loop until all (many handwritten in the original language) tests succeed.
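    The "run it in a loop until all tests succeed" step above could look roughly like the sketch below. It is only an illustration: `port_step` and `run_tests` are hypothetical hooks standing in for a coding agent and a test runner, and the toy stand-ins at the bottom exist purely so the loop can run on its own.

```python
from typing import Callable, List, Tuple

# Hypothetical hooks: a real setup would call a coding agent and a test
# runner here; neither is specified in the thread.
PortStep = Callable[[str, List[str]], str]   # (current port, failures) -> revised port
TestRun = Callable[[str], List[str]]         # port -> names of failing tests


def port_until_green(prototype: str, port_step: PortStep, run_tests: TestRun,
                     max_rounds: int = 10) -> Tuple[str, int]:
    """Revise a port until the test suite reports no failures.

    Returns the final port and the number of test rounds used. Raises if the
    budget runs out, so a stuck loop cannot spin forever.
    """
    port = port_step(prototype, [])          # first attempt, no feedback yet
    for round_no in range(1, max_rounds + 1):
        failures = run_tests(port)
        if not failures:
            return port, round_no
        port = port_step(port, failures)     # feed the failures back in
    raise RuntimeError(f"still failing after {max_rounds} rounds")


# Toy stand-ins so the sketch is runnable: each "fix" removes one BUG marker.
def fake_port_step(source: str, failures: List[str]) -> str:
    return source.replace("BUG", "", 1) if failures else source


def fake_run_tests(port: str) -> List[str]:
    return [f"test_{i}" for i in range(port.count("BUG"))]


final_port, rounds = port_until_green(
    "fn main() { BUG BUG }", fake_port_step, fake_run_tests
)
```

    The round budget is the one design choice worth keeping in a real version: without it, a port that can never satisfy the tests would loop indefinitely.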
    behnamoh12 days ago
    IDK what's going on in your shop but that sounds like a terrible idea!
    - Libraries don't necessarily map one-to-one from Python to Rust/etc.
    - Paradigms don't map neatly; Python is OO, Rust leans more towards FP.
    - Even if the code can be rewritten in Rust, it's probably not the most Rustic (?) approach or the most performant.
    anonzzzies12 days ago
    It doesn't map anything 1 to 1, it uses our guidelines and architecture for porting it which works well. I did say YMMV anyway; it works well for us.
    behnamoh12 days ago
    Sorry, so basically you're saying there are two separate guidelines, one for Python and one for Rust, and you have the LLM write it first in Python and then Rust. But I still don't understand why it would be any better than writing the code in Rust in one go? Why "priming" it in Python would improve the result in any way?
    Also, what happens when bug fixes are needed? Again first in Py and then in Rs?
    abrookewood12 days ago
    Why not get it to write it in Rust in the first place?
    antonvs11 days ago
    Presumably the thought experiment hasn’t matured to that point yet.
  - bko12 days ago
    I think that's not as beneficial as having proper type errors and feeding that into itself as it writes
    LudwigNagasena12 days ago
    Expressive linting seems more useful for that than lax typing without null safety.
  - JumpCrisscross12 days ago
    NP (as in P = NP) is also much lower for Python than Rust on the human side.
    behnamoh12 days ago
    What does that mean? Can you elaborate?
    JumpCrisscross12 days ago
    Sorry, yes. LLMs write code that's then checked by human reviewers. Maybe it will be checked less in the future. But I'm not seeing fully-autonomous AI on the horizon.
    At that point, the legibility and prevalence of humans who can read the code becomes almost more important than which language the machine "prefers."
    behnamoh12 days ago
    Well, verification is easier than creation (i.e., P ≠ NP). I think humans who can quickly verify something works will be in more demand than those who know how to write it. Even better: Since LLMs aren't as creative as humans (in-distribution thinking), test-writers will be in more demand (out-of-distribution thinkers). Both of these mean that humans will still be needed, but for other reasons.
    The future belongs to generalists!
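    The asymmetry between checking and creating can be made concrete with a toy subset-sum example. This is a sketch of the intuition only, under the assumption that brute force is the search strategy; it says nothing either way about whether P ≠ NP. The function names are made up for illustration.

```python
from itertools import combinations
from typing import List, Optional, Sequence


def verify(numbers: Sequence[int], target: int, picked: Sequence[int]) -> bool:
    """Check a proposed answer: distinct, in-range indices that sum to target."""
    return (len(set(picked)) == len(picked)
            and all(0 <= i < len(numbers) for i in picked)
            and sum(numbers[i] for i in picked) == target)


def search(numbers: Sequence[int], target: int) -> Optional[List[int]]:
    """Find an answer by trying every subset: up to 2**n candidates."""
    for size in range(len(numbers) + 1):
        for picked in combinations(range(len(numbers)), size):
            if verify(numbers, target, picked):
                return list(picked)
    return None


numbers = [3, 34, 4, 12, 5, 2]
found = search(numbers, 9)             # exhaustive search over subsets
checked = verify(numbers, 9, found)    # one cheap pass over the answer
```

    Checking a given answer takes one pass over the picked indices, while the search may have to walk through every subset before it finds one or gives up.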
    rvz12 days ago
    > The future belongs to generalists!
    Couldn't be more correct.
    The experienced generalists with techniques of verification testing are the winners [0] in this.
    But one thing you cannot do is openly admit, or be found out to have said, something like: "I don't know a single line of Rust/Go/Typescript/$LANG code but I used an AI to do all of it", and then have the system break down and be unable to fix it.
    It would be quite difficult to take a SWE seriously that prides themselves in having zero understanding and experience of building production systems and runs the risk of losing the company time and money.
    [0] https://news.ycombinator.com/item?id=46772520
    bandrami11 days ago
    I prefer my C compiler to write my asm for me from my C code but I can still (and sometimes have to!) read the asm it creates.
    Der_Einzige12 days ago
    P ≠ NP is NOT confirmed and my god I really do not want that to ever be confirmed
    I really do want to live in the world where P = NP and we can trivially get P time algorithms for believed to be NP problems.
    I reject your reality and substitute my own.
- condiment12 days ago
  100% of my LLM projects are written in Rust - and I have never personally written a single line of Rust. Compilation alone eliminates a number of 'category errors' with software - syntax, variable declaration, types, etc. It's why I've used Go for the majority of projects I've started the past ten years. But with Rust there is a second layer of guarantees that come from its design, around things like concurrency, nil pointers, data races, memory safety, and more.
  The fewer category errors a language or framework introduces, the more successful LLMs will be at interacting with it. Developers enjoy freedom and many ways to solve problems, but LLMs thrive in the presence of constraints. Frontiers here will be extensions of Rust or C-compatible languages that solve whole categories of issue through tedious language features, and especially build/deploy software that yields verifiable output and eliminates choice from the LLMs.
  - dotancohen12 days ago
    > ... and eliminates choice from the LLMs.
    Perl is right out! Maybe the LLMs could help us decipher extant Perl "write once, maintain never" code.
    nl12 days ago
    it's very good at this BTW
    trollbridge11 days ago
    I've found it's terrible at digesting a few codebases I've needed to deal with (to wit, 2007-era C# which used lots of libraries which were popular then, and 1993-era Visual Basic which also used a third-party library that no LLM seems to understand the first thing about).
    simonw11 days ago
    I had great results recently with ~22 year old PHP: https://simonwillison.net/2025/Jul/1/mid-2000s/
    It even guessed the vintage correctly!
    > This appears to be a custom template system from the mid-2000s era, designed to separate presentation logic from PHP code while maintaining database connectivity for dynamic content generation.
    dotancohen11 days ago
    That's great. Just yesterday I spoke with a developer who refuses to use Rector on old codebases, instead having an LLM simply refactor his PHP 5.6 to 8.(3 I think). He doesn't even check in Rector anymore. These are all bespoke business scripts that his team have been nursing for two decades. He even updated the Codeigniter framework it's all running on.
    nl11 days ago
    I suspect the problem with VB is that VB 4 and 5 (which I think was that era) were so closely tied to the IDE it is difficult to work out what is going on without it.
    (I did Delphi back when VB6 was the other option so remember this problem well)
- bopbopbop712 days ago
  > But now that most code is written by LLMs
  Got anything to back up this wild statement?
  - RALaBarge12 days ago
    Depends, what to you would qualify as evidence?
    bopbopbop712 days ago
    Something quantitative and not "company with insane vested interest/hype blogger said so".
  - ecto12 days ago
    If you have to ask, you can't afford it.
  - dankwizard12 days ago
    Me, my team, and colleagues also in software dev are all vibe coding. It's so much faster.
    manishsharan12 days ago
    If I may ask, does the code produced by LLM follow best practices or patterns? What mental model do you use to understand or comprehend your codebase?
    Please know that I am asking as I am curious and do not intend to be disrespectful.
    DrewADesign11 days ago
    And what’s the name of the company? I’m fixing to harvest some bug bounties.
    mjevans12 days ago
    Think of the LLM as a slightly lossy compression algorithm fed by various pattern classifiers that weight and bin inputs and outputs.
    The user of the LLM provides a new input, which might or might not closely match the existing smudged together inputs to produce an output that's in the same general pattern as the outputs which would be expected among the training dataset.
    We aren't anywhere near general intelligence yet.
    antonvs11 days ago
    Ignoring your last line, which is poorly defined, this view contradicts observable reality. It can’t explain an LLM’s ability to diagnose bugs in code it hasn’t seen before, exhibit a functional understanding of code it hasn’t seen before, explain what it’s seeing and doing to a human user, etc.
    Functionally, on many suitably scoped tasks in areas like coding and mathematics, LLMs are already superintelligent relative to most humans - which may be part of why you’re having difficulty recognizing that.
    dankwizard11 days ago
    I get your sentiment but a lot of people on this forum forget that a lot of us are just working for the paycheck - I don't owe my company anything.
    Do I know the code base like the back of my hand? Nope. Can I confidently talk to how certain functions work? Not a chance.
    Can I deploy what the business wants? Yep. Can I throw error logs into LLMs and work out the cause of issues? Mostly.
    I get some of you may want to go above and beyond for your company and truly create something beautiful but then guess what - That codebase is theirs. They aren't your family. Get paid and move on
    tuwtuwtuwtuw11 days ago
    Do you work as a consultant then? I've been with the same employer for a long time, so if my team creates a mess, I get to look at it daily.
    username22311 days ago
    > It's so much faster.
    A lot of things are "so much faster" than the right thing. "Vibe traffic safety laws" are much faster than ones that increase actual traffic safety: http://propublica.org/article/trump-artificial-intelligence-... . You, your team, and colleagues are producing shiny trash at unbelievable velocity. Is that valuable?
  - myhf12 days ago
    I mean, people who use LLMs to crank out code are cranking it out by the millions of lines. Even if you have never seen it used toward a net positive result, you have to admit there is a LOT of it.
    halfcat12 days ago
    If all code is eventually tech debt, that sounds like a massive problem.
  - artursapek12 days ago
    [flagged]
- jacquesm12 days ago
  > But now that most code is written by LLMs
  Is this true? It seems to be a massive assumption.
  - embedding-shape12 days ago
    By lines of code produced in total? Probably true. By usefulness? Unclear.
  - e-dard12 days ago
    Replace _is_ with _can be_ and I think the general point still stands.
    fmbb12 days ago
    Sounds like just as big an assumption.
    jrflowers12 days ago
    Replacing “is” with “can be” is in practical terms the same thing as replacing “is” with “isn’t”
  - fooker12 days ago
    By lines of code, almost by an order of magnitude.
    Some of the code is janky garbage, but that’s what most code is. There’s no use pearl clutching.
    Human engineering time is better spent at figuring out which problems to solve than typing code token by token.
    Identifying what to work on, and why, is a great research skill to have and I’m glad we are getting to realistic technology to make that a baseline skill.
    jacquesm12 days ago
    Well, you will somehow have to turn that 'janky garbage' into quality code, who will do that then?
    tokioyoyo12 days ago
    You don't really have to.
    fooker12 days ago
    For most code, this never happens in the real world.
    The vast majority of code is garbage, and has been for several decades.
    pharrington12 days ago
    So we should all work to become better programmers! What I'm seeing now is too many people giving up and saying "most code is bad, so I may as well pump out even worse code MUCH faster." People are chasing convenience and getting a far worse quality of life in exchange.
    ben_w11 days ago
    I've seen all four quadrants of [good code, bad code] x [business success, business failure].
    The real money we used to get paid was for business success, not directly for code quality; the quality metrics we told ourselves were closer to CV-driven development than anything the people with the money understood let alone cared about, which in turn was why the term "technical debt" was coined as a way to try to get the leadership to care about what we care about.
    There's some domains where all that stuff we tell ourselves about quality, absolutely does matter… but then there's the 278th small restaurant that wants a website with a menu, opening hours, and table booking service without having e.g. 1500 American corporations showing up in the cookie consent message to provide analytics they don't need but are still automatically pre-packaged with the off-the-shelf solution.
    antonvs11 days ago
    I’ve seen those quadrants too, because I’ve come into several companies to help clean up a mess they’ve gotten into with bad code that they can no longer ignore. It is a complete certainty that we’re going to start seeing a lot more of that.
    One ironic thing about LLM-generated bad code is that churning out millions of lines just makes it less likely the LLM is going to be able to manage the results, because token capacity is neither unlimited nor free.
    (Note I’m not saying all LLM code is bad; but so far the fully vibecoded stuff seems bad at any nontrivial scale.)
    fooker11 days ago
    > because token capacity is neither unlimited nor free.
    This is like dissing software from 2004 because it used 2gb extra memory.
    In the last year, token context window increased by about 100x and halved in cost at the same time.
    If this is the crux of your argument, technology advancement will render it moot.
    antonvs11 days ago
    > In the last year, token context window increased by about 100x and halved in cost at the same time.
    So? It's nowhere close to solving the issue.
    I'm not anti-LLM. I'm very senior at a company that's had an AI-centric primary product since before the GPT explosion. But in order to navigate what's going on now, we need to understand the strengths and weaknesses of the technology currently, as well as what it's likely to be in the near, medium, and far future.
    The cost of LLMs dealing with their own generated multi-million LOC systems is very unlikely to become tractable in the near future, and possibly not even medium-term. Besides, no-one has yet demonstrated an LLM-based system for even achieving that, i.e. resolving the technical debt that it created.
    Don't let fanboism get in the way of rationality.
    fooker11 days ago
    > The cost of LLMs dealing with their own generated multi-million LOC systems is very unlikely to become tractable in the near future
    If you have a concrete way to pose this problem, you'll find that there will be concrete solutions.
    There is no way to demonstrate something as vague as "resolving the technical debt that it created".
    fooker12 days ago
    I disagree, most code is not worth improving.
    I would rather make N bad prototypes to understand the feasibility of solving N problems than trying to write beautiful code for one misguided problem which may turn out to be a dead end.
    There are a few orders of magnitude more problems worth solving than you can write good code for. Your time is your most important resource, writing needlessly robust code, checking for situations that your prototype will never encounter, just wastes time when it gets thrown away.
    A good analogy for this is how we built bridges in the Roman empire, versus how we do it now.
    pharrington11 days ago
    Have you ever been frustrated with software before? Has a computer program ever wasted your time by being buggy, obviously too slow or otherwise too resource intensive, having a poorly thought out interface, etc?
    fooker11 days ago
    Yes. I am, however, not willing to spend money to get it fixed.
    From the other side, the vast majority of customers will happily take the cheap/free/ad-supported buggy software. This is why we have all these random Google apps, for example.
    Take a look at the bug tracker of any large open source codebase, there will be a few tens of thousands of reported bugs. It is worse for closed corporate codebases. The economics to write good code or to get bugs fixed does not make sense until you have a paying customer complain loudly.
    bdangubic12 days ago
    This type of comment gets downvoted the most on HN but it is the absolute truth: most human-written code is “subpar” (trying to be nice and not say garbage). I have been working as a contractor for many years and the code I’ve seen is just… hard to put into words.
    So much discussion here on HN which critiques “vibe codes” etc implies that a human would have written it better, which in the vast, vast majority of cases is simply not true.
    fooker12 days ago
    I have worked on some of the most supposedly reliable codebases on earth (compilers) for several decades, and most of the code in compilers is pretty bad.
    And most of the code the compiler is expected to compile, seen from the perspective of fixing bugs and issues with compilers, is absolutely terrible. And the day that can be rewritten or improved reliably with AI can't come fast enough.
    jacquesm11 days ago
    I honestly do not see how training AI on 'mountains of garbage' would have any other outcome than more garbage.
    I've seen lots of different codebases from the inside, some good some bad. As a rule smaller + small team = better and bigger + more participants = worse.
    fooker11 days ago
    The way it seems to work now is to task agents to write a good test suite. AI is much better at this than it is at writing code from scratch.
    Then you just let it iterate until tests pass. If you are not happy with the design, suggest a newer design and let it rip.
    All this is expensive and wasteful now, but stuff becoming 100-1000x cheaper has happened for every technology we have invented.
    jacquesm11 days ago
    Interesting, so this is effectively 'guided closed loop' software development with the testset as the control.
    It gives me a bit of a 'turtles all the way down' feeling because if the test set can be 'good' why couldn't the code be good as well?
    I'm quite wary of all of this, as you've probably gathered by now: the idea that you can toss a bunch of 'pass' tests into a box and then generate code until all of the tests pass is effectively a form of fuzzing, you've got some thing that passes your test set, but it may do a lot more than just that and your test set is not going to be able to exhaustively enumerate the negative cases.
    This could easily result in 'surprise functionality' that you did not anticipate during the specification phase. The only way to deal with that then is to audit the generated code, which I presume would then be farmed out to yet another LLM.
    This all places a very high degree of trust into a chain of untrusted components and that doesn't sit quite right with me. It probably means my understanding of this stuff is still off.
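    The "surprise functionality" concern can be shown with a toy case: an implementation that satisfies every test it was given and still does something nobody specified. The function, the tests, and the `!admin` trigger below are all invented for illustration.

```python
def slugify(title: str) -> str:
    """Turn a title into a URL slug (hypothetical generated implementation)."""
    slug = "-".join(title.lower().split())
    # Behaviour nobody asked for: a magic prefix changes the result.
    if title.startswith("!admin"):
        return "admin/" + slug
    return slug


# The "pass" tests only describe behaviour that was requested.
SPEC_TESTS = [
    ("Hello World", "hello-world"),
    ("  Many   spaces  ", "many-spaces"),
    ("already-slugged", "already-slugged"),
]


def suite_passes() -> bool:
    """Return True when every specified input maps to its expected slug."""
    return all(slugify(given) == expected for given, expected in SPEC_TESTS)


all_green = suite_passes()
surprise = slugify("!admin panel")   # an input the spec never mentioned
```

    The suite is green, yet the extra branch is still there. Only reading the code, or a test that happens to probe that input, would reveal it.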
    fooker11 days ago
    You are right.
    What you are missing is that the thing driving this untrusted pile of hacks keeps getting better at a rapid pace.
    So much that the quality of the output is passable now, mimicking man-years of software engineering in a matter of hours.
    If you don’t believe me, pick a project that you have always wanted to build from scratch and let cursor/claude code have a go at it. You get to make the key decisions, but the quality of work is pretty good now, so much that you don’t really have to double check much.
    jacquesm11 days ago
    Thank you, I will try that and see where it leads. This all suggests a massive downward adjustment for any capitalized software is on the menu.
    simonw11 days ago
    That's why the major AI labs are really careful about the code they include in the training runs.
    The days of indiscriminately scraping every scrap of code on the internet and pumping it all in are long gone, from what I can tell.
    jacquesm11 days ago
    Well, if as the OP points out it is 'all garbage' they don't have a whole lot of choice to discriminate.
    fooker11 days ago
    Do you have pointers to this?
    Would be a great resource to understand what works and what doesn't.
    simonw11 days ago
    Not really, sadly. It's more an intuition knocked up from following the space - the AI labs are still pretty secretive about their training mix.
    behnamoh12 days ago
    > who will do that then?
    the next version of LLMs. write with GPT 5.2 now, improve the quality using 5.3 in a couple months; best of both worlds.
- simonw12 days ago
  I have certainly become Go-curious thanks to coding agents - I have a medium sized side-project in progress using Go at the moment and it's been surprisingly smooth sailing considering I hardly know the language.
  The Go standard library is a particularly good fit for building network services and web proxies, which fits this project perfectly.
  - logicprog12 days ago
    It's funny seeing you say that, because I've had an entire arc of despising the design of, and peremptorily refusing to use, Go, to really enjoying it, thanks to AI coding agents being able to take care of the boilerplate for me.
    It turns out that verbosity isn't really a problem when LLMs are the one writing the code based on more high level markdown specs (describing logic, architecture, algorithms, concurrency, etc), and Go's extreme simplicity, small range of language constructs, and explicitness (especially in error handling and control flow) make it much easier to quickly and accurately review agent code.
    It also means that Go's incredible (IMO) runtime, toolchain, and standard library are no longer marred by the boilerplate either, and I can begin to really appreciate their brilliance. It has me really reconsidering a lot of what I believed about language design.
    simonw12 days ago
    Yeah, I much prefer Go to Rust for LLM things because I find Go code easy to read and understand despite having little experience with it - Rust syntax still trips me up.
    logicprog12 days ago
    Not to mention that, in general, there's a lot more to keep in mind with Rust.
    I've written probably tens of thousands of lines of Rust at this point, and while I used to absolutely adore it, I've really completely fallen out of love with it, and part of it is that it's not just the syntax that's horrible to look at (which I only realized after spending some time with Go and Python), but you have to always keep in mind a lot of things:
    - the borrow checker
    - lifetimes
    - all the different kinds of types that represent different ways of doing memory management
    - parse out sometimes extremely complex and nearly point-free iterator chaining
    - deal with a complex type system that can become very unwieldy if you're not careful
    - and more I'm probably not thinking of right now
    Not to mention the way the standard library exposes you to the full bore of all the platform-specific complexities it's designed on top of, and forces you to deal with them, instead of exposing a best-effort POSIX-like unified interface, so path and file handling can be hellish. (this is basically the reverse of fasterthanlime's point in the famous "I want off mr. golang's wild ride" essay).
    It's just a lot more cognitive overhead to just getting something done if all you want is a fast statically compiled, modern programming language. And it makes it even harder to review code. People complain about Go boilerplate, but really, IME, Rust boilerplate is far, far worse.
    rednafi12 days ago
    This resonates with me too. I’ve written some Rust and a lot of Go. I find Rust syntax distastefully ugly, and the sluggish compilation speed doesn’t bring me any joy.
    On top of that, Go has pretty much replaced my Python usage for scripting since it’s cheap to generate code and let the compiler catch obvious issues. Iteration in Rust is a lot slower, even with LLMs.
    I get fasterthanlime’s rant against Go, but none of those criticisms apply to me. I write distributed-systems code for work where Go absolutely shines. I need fast compilation, self-contained binaries, and easy concurrency support. Also, the garbage collector lets me ignore things I genuinely couldn’t care less about - stuff Rust is generally good at. So choosing Go instead of Rust was kinda easy.
    mleo11 days ago
    Just completed my first small Go program. It is just a CLI tool to use with a code quality tool for a coding agent skill. The toolchain built into Go left a good first impression. Recursion and refinement of guard rails on coding agents has been high on my priorities to deliver better quality code faster.
    vips7L11 days ago
    God you people are so lazy.
    logicprog11 days ago
    Unnecessarily doing extra work is not a virtue. Leave the Catholicism behind. I'm not using AI to replace problem solving, thinking through and understanding the problem and then figuring out how to fix it, the systems thinking, design, architecture, algorithms, domain modelling, etc. I'm just not dealing with the BS "what was the order of the arguments this function took again? What's the library API for this?" stuff and writing boiler-plate or managing typechecker-driven refactors. The question is whether what you make is any good, and I still spend a lot of time making sure what I built made sense, is well factored and DRY, and is as elegant as I know how to make it. In fact, with the increased leverage LLMs give me, I've found myself spending more time on code quality and testing than I used to!
  - Imustaskforhelp12 days ago
    100% check out Golang even more! I have been writing Golang AI coding projects for a really long time because I really loved writing different languages and Golang was one in which I settled on.
    Golang's libraries are phenomenal & the idea of porting over to multiple servers is pretty easy, it's really portable.
    I actually find Golang good for CLI projects, Web projects and just about everything.
    Usually the only time I still use python uvx or vibe code using that is probably when I am either manipulating images or PDFs or building a really minimalist tkinter UI in python/uv
    Although I tried to convert the python to golang code which ended up using fyne for gui projects and surprisingly was super robust but I might still use python in some niche use cases.
    Check out my other comment in here for finding a vibe coded project written in a single prompt when gemini 3 pro was launched in the web (I hope its not promotion because its open source/0 telemetry because I didn't ask for any of it to be added haha!)
    Golang is love. Golang is life.
  - behnamoh12 days ago
    > considering I hardly know the language.
    Same boat! In fact I used to (still do) dislike Go's syntax and error handling (the same 4 lines repeated every time you call a function), but given that LLMs can write the code and do the cross-model review for me, I literally don't even see the Go source code, which is nice because I'd hate it if I did (my dislike of Go's syntax + all the AI slop in the code would drive me nuts).
    But at the end of the day, Go has good scaffolding, the best tooling (maybe on par with Rust's, definitely better than Python even with uv), and tons of training data for LLMs. It's also a rather simple language, unlike Swift (which I wish was simpler because it's a really nice language otherwise).
- nomel12 days ago
  > But now that most code is written by LLMs
  I'm sure it will eventually be true, but this seems very unlikely right now. I wish it were true, because we're in a time where generic software developers are still paid well, so doing nothing all day, with this salary, would be very welcome!
  - phainopepla212 days ago
    Code written by LLM != developer doing nothing
- kenjackson12 days ago
  Has anyone tried creating a language that would be good for LLMs? I feel like what would be good for LLMs might not be the same thing that is good for humans (but I have no evidence or data to support this, just a hunch).
  - Sheeny9612 days ago
    The problem with this is the reason LLMs are so good at writing Python/Java/JavaScript is that they've been trained on a metric ton of code in those languages, have seen the good, the bad and the ugly, and been tuned to the good. A new language would be training from scratch, and if we're introducing new paradigms that are 'good for LLMs but bad for humans', that means humans will struggle to write good code in it, making the training process harder. Even worse, say you get a year and 500 features into that repo and the LLM starts going rogue - who's gonna debug that?
    reitzensteinm12 days ago
    But coding is largely trained on synthetic data.
    For example, Claude can fluently generate Bevy code as of the training cutoff date, and there's no way there's enough training data on the web to explain this. There's an agent somewhere in a compile test loop generating Bevy examples.
    A custom LLM language could have fine grained fuzzing, mocking, concurrent calling, memoization and other features that allow LLMs to generate and debug synthetic code more effectively.
    If that works, there's a pathway to a novel language having higher quality training data than even Python.
    mbreese11 days ago
    I recently had Codex convert a script of mine from bash to a custom, Make-inspired language for HPC work (think nextflow, but an actual language). The bash script submitted a bunch of jobs based on some inputs. I wanted this converted to use my pipeline language instead.
    I wrote this custom language myself. It's on GitHub, but the amount of example code available to the model would have been very limited.
    I gave it two inputs -- the original bash script and an example of my pipeline language (unrelated jobs).
    The code it gave me was syntactically correct, and was really close to the final version. I didn't have to edit very much to get the code exactly where I wanted it.
    This is to say -- if a novel language is somewhat similar to an existing syntax, the LLM will be surprisingly good at writing it.
  - voxleone12 days ago
    >Has anyone tried creating a language that would be good for LLMs?
    I’ve thought about this and arrived at a rough sketch.
    The first principle is that models like ChatGPT do not execute programs; they transform context. Because of that, a language designed specifically for LLMs would likely not be imperative (do X, then Y), state-mutating, or instruction-step driven. Instead, it would be declarative and context-transforming, with its primary operation being the propagation of semantic constraints. The core abstraction in such a language would be the context, not the variable. In conventional programming languages, variables hold values and functions map inputs to outputs. In a ChatGPT-native language, the context itself would be the primary object, continuously reshaped by constraints. The atomic unit would therefore be a semantic constraint, not a value or instruction.
    An important consequence of this is that types would be semantic rather than numeric or structural. Instead of types like number, string, bool, you might have types such as explanation, argument, analogy, counterexample, formal_definition.
    These types would constrain what kind of text may follow, rather than how data is stored or laid out in memory. In other words, the language would shape meaning and allowable continuations, not execution paths. An example:
    @iterate: refine explanation until clarity ≥ expert_threshold
  - koolba12 days ago
    There are two separate needs here. One is a language that can be used for computation where the code will be discarded. Only the output of the program matters. And the other is a language that will be eventually read or validated by humans.
  - branafter12 days ago
    Most programming languages are great for LLMs. The problem is with the natural language specification for architectures and tasks. https://brannn.github.io/simplex/
  - simonw12 days ago
    There was an interesting effort in that direction the other day: https://simonwillison.net/2026/Jan/19/nanolang/
  - conception12 days ago
    I don’t know rust but I use it with llms a lot as unlike python, it has fewer ways to do things, along with all the built in checks to build.
  - 99990000099912 days ago
    I want to create a language that allows an LLM to dynamically decide what to do.
    A non-deterministic programming language, with options to drop down into JavaScript or even C if you need to specify certain behaviors.
    I'd need to be much better at this though.
    branafter12 days ago
    You're describing a multi-agent long horizon workflow that can be accomplished with any programming language we have today.
    99990000099912 days ago
    I'm always open to learning, are there any example projects doing this?
    branafter12 days ago
    The most accessible way to start experimenting would be the Ralph loop: https://github.com/anthropics/claude-code/tree/main/plugins/...
    You could also work backwards from this paper: https://arxiv.org/abs/2512.18470
    99990000099912 days ago
    Ok.
    I'm imagining something like.
    "Hi Ralph, I've already coded a function called GetWeather in JS, it returns weather data in JSON, can you build a UI around it? Adjust the UI over time"
    At runtime it would modify the application with improvements: say all of a sudden we're getting air quality data from the JSON tool, the Ralph loop will notice and update the application.
    The Arxiv paper is cool, but I don't think I can realistically build this solo. It's more of a project for a full team.
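    The "notice new data in the JSON" step described above can be sketched in a few lines. This is a rough illustration only: the field names are made up (the thread only says GetWeather returns JSON), the helper name new_fields is hypothetical, and deciding what to do with a newly seen field is the part an agent loop would have to handle.

```python
import json


def new_fields(known_fields: set, payload_json: str) -> set:
    """Return top-level keys in the payload that are not in known_fields."""
    payload = json.loads(payload_json)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object at the top level")
    return set(payload) - set(known_fields)


if __name__ == "__main__":
    # Hypothetical field names; the thread only says GetWeather returns JSON.
    known = {"temperature_c", "humidity"}
    sample = '{"temperature_c": 21.5, "humidity": 40, "air_quality_index": 12}'
    print(new_fields(known, sample))
```

    A non-empty result would be the trigger for asking the agent to extend the UI.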
    fwip12 days ago
    yes "now what?" | llm-of-choice
    gregoryl12 days ago
    What does that even mean?
- rednafi12 days ago
  I agree with this. Making languages geared toward human ergonomics probably won’t be a thing going forward.
  Go is positioned really well here, and Steve Yegge wrote a piece on why. The language is fast, less bloated than Python/TS, and less dogmatic than Java/Kotlin. LLMs can go wham with Go and the compiler will catch most of the obvious bugs. Faster compilation means you can iterate through a process pretty quickly.
  Also, if I need abstraction that’s hard to achieve in Go, then it better be zero-cost like Rust. I don’t write Python for anything these days. I mean, why bother with uv, pip, ty, mypy, ruff, black, and whatever else when the Go compiler and the standard tooling work better than that decrepit Python tooling? And it costs almost nothing to make my scripts faster too.
  I don’t yet know how I feel about Rust since LLMs still aren’t super good with it, but with Go, agentic coding is far more pleasurable and safer than Python/TS.
  - dotancohen12 days ago
    Python (with Qt, pyside) is still great for desktop GUI applications. My current project is all LLM generated (but mostly me-verified) Rust, wrapped in a thin Python application for the GUI, TUI, CLI, and web interfaces. There's also a Kotlin wrapper for running it on Android.
    rednafi12 days ago
    Yeah, Python is nice to work with in many contexts for sure. I mostly meant that I don’t personally use it as much anymore, since Go can do everything I need, and faster.
    Plus the JS/Python dependency ecosystem is tiring. Yeah, I know there’s uv now, but even then I don’t see much reason to suffer through that when opting for an actually type-safe language costs me almost nothing.
    Dynamic languages won’t go anywhere, but Go/Rust will eat up a pretty big chunk of the pie.
- sakesun12 days ago
  LLMs should generate a terse, easy-to-read language for humans to review. Besides Python, F# can be a perfect fit.
- shevy-java12 days ago
  > Python/JS/Ruby/etc. were good tradeoffs when developer time mattered.
  First I don't think this is the end of those languages. I still write code in Ruby almost daily, mostly to solve smaller issues; Ruby acts as the ultimate glue that connects everything here.
  Having said that, Ruby is on a path to extinction. That started way before AI though and has many different reasons; it happened to perl before and now ruby is following suit. Lack of trust in RubyCentral as our divine new ruler is one (recently), after they decided to turn against the community. Soon Ruby can be renamed into Suby, to indicate Shopify running the show now. What is interesting is that you still see articles "ruby is not dead, ruby is not dead". Just the frequency of those articles coming up is worrying - it's like someone trying to pitch last minute sales - and then the company goes bankrupt. The human mind is a strange thing.
  One good advantage of e. g. Python and Ruby is that they are excellent at prototyping ideas into code. That part won't go away, even if AI infiltrates more computers.
  - the_af12 days ago
    > One good advantage of e. g. Python and Ruby is that they are excellent at prototyping ideas into code. That part won't go away, even if AI infiltrates more computers.
    Why wouldn't they go away for prototyping? If an LLM can help you prototype in whatever language, why pick Ruby or Python?
    (This isn't a gotcha question. I primarily use python these days, but I'm not married to it).
- felixgallo12 days ago
  I wouldn't speak so quickly for the 'uncommon' language set. I had Claude write me a fully functional typed erlang compiler with ocaml and LLVM IR over the last two days to test some ideas. I don't know ocaml. It made the right calls about erlang, and the result passes a fairly serious test suite, so it must've known enough ocaml and LLVM IR.
- jdub12 days ago
  > But now that most code is written by LLMs...
  Pause for a moment and think through a realistic estimation of the numbers and proportions involved.
- threecheese12 days ago
  My intuition from using the tools broadly is that pre-baked design decisions/“architectures” are going to be very competitive on the LLM coding front. If this is accurate, language matters less than abstraction.
  Instructions files are just pre-made decisions that steer the agent. We try to reduce the surface area for nondeterminism using these specs, and while the models will get better at synthesizing instructions and code understanding, every decision we remove pays dividends in reduced token usage/time/incorrectness.
  I think this is what orgs like Supabase see, and are trying to position themselves as solutions to data storage, auth, events etc within the LLM coding space, and are very successful albeit in the vibe coder area mostly. And look at AWS Bedrock, they’ve abstracted every dimension of the space into some acronym.
- ravenstine12 days ago
  I'm not sure that LLMs are going to [completely] replace the desire for JIT, even with relatively fast compilers.
  Frameworks might go the way of the dinosaur. If an LLM can manage a lot of complex code without human-serving abstractions, why even use something like React?
  - mdtusz12 days ago
    Frameworks aren't just human-serving abstractions - they're structural abstractions that allow for performant code, or even being able to achieve certain behaviours.
    Sure, you could write a frontend without something like react, and create a backend without something like django, but the code generated by an LLM will become similarly convoluted and hard to maintain as if a human had written it.
    LLM's are still _quite_ bad at writing maintainable code - even for themselves.
  - westurner12 days ago
    Test cases; test coverage
- cyanydeez12 days ago
  I think you're missing the reason LLMs work: It's because they can continue predictable structures, like a human.
  The surmise that compiled languages fit that just doesn't follow. In the same way, LLMs have trouble finishing HTML because the opening and closing tags are too far apart.
  The language that an LLM would succeed with is one where:
  1. Context is not far apart
  2. The training corpus is wide
  3. Keywords, variables, etc are differentiated in the training.
  4. REPL like interactivity allows for a feedback loop.
  So I think the conclusion is premature: just because compiled languages are used less owing to human limitations doesn't mean the LLM will do any better with them.
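  Point 4 in the list above, a tight feedback loop, can be sketched briefly. This is a minimal illustration, assuming Python is the generated language and using only the built-in compile(); the helper name check_source is hypothetical, and a real agent loop would feed the returned message back to the model and retry.

```python
from typing import Optional


def check_source(source: str) -> Optional[str]:
    """Return None if the source parses, else a one-line error message."""
    try:
        # compile() only parses and byte-compiles; it does not run the code.
        compile(source, "<generated>", "exec")
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


if __name__ == "__main__":
    for candidate in ("total = (1 + 2", "total = (1 + 2)"):
        print(repr(candidate), "->", check_source(candidate))
```

  This only catches syntax errors; a compiler or type checker for a static language would report far more per round trip.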
- cobolexpert12 days ago
  I was also thinking this some days ago. The scaffolding that static languages provide is a good fit for LLMs in general.
  Interestingly, since we are talking about Go specifically, I never found that I was spending too much time typing... types. Obviously more than with a Python script, but never at a level where I would consider it a problem. And now with newer Python projects using type annotations, the difference got smaller.
  - zahlman12 days ago
    > And now with newer Python projects using type annotations, the difference got smaller.
    Just FWIW, you don't actually have to put type annotations in your own code in order to use annotated libraries.
    cobolexpert12 days ago
    Indeed, but nowadays it’s common to add the annotations to claw back a bit of more powerful code linting.
- adw12 days ago
  The quality of the error messages matters a _lot_ (agents read those too!) and Python is particularly good there.
  - simonw12 days ago
    Especially since Python 3.14 shipped big improvements to error messages: https://docs.python.org/3/whatsnew/3.14.html#whatsnew314-imp...
- c7b12 days ago
  Agree on compiled languages, wondering about Go vs Rust. Go compiles faster but is more verbose, token cost is an important factor. Rust's famously strict compiler and general safety orientation seems like a strong candidate for LLM coding. Go would probably have more training data out already though.
- trollbridge11 days ago
  I generally use LLMs to generate Python (or TypeScript) because the quality and maintainability is significantly better than if I ask it to, for example, pump out C. They really do not perform very well outside of the most "popular" languages.
- bstar7711 days ago
  I’ve moved to rust for some select projects and it’s actually been a bit easier… I converted an electron app to rust/tauri… perf improvement was massive and development was quicker. I’m rethinking the stacks I should be focused on.
- justaboutanyone11 days ago
  We may as well have the LLMs use the hardest most provably-correct language possible
- bitwize12 days ago
  Astronaut 1: You mean... strong static typing is an unmitigated win?
  Astronaut 2: Always has been...
- tshaddox11 days ago
  Might as well choose a language with a much better type system than go, given how beneficial quick feedback loops are to LLM code generation.
- al_borland12 days ago
  > assuming enough training data
  This is a big assumption. I write a lot of Ansible, and it can’t even format the code properly, which is a pretty big deal in yaml. It’s totally brain dead.
  - simonw12 days ago
    Have you tried telling it to run a script to verify that the YAML is valid? I imagine it could do that with Python.
    al_borland11 days ago
    It gets it wrong 100% of the time. A script to validate would send it into an infinite loop of generating code and failing validation.
    simonw11 days ago
    Are you sure about that?
    I don't think I've ever seen Opus 4.5 or GPT-5.2 get stuck in a loop like that. They're both very good at spotting when something doesn't work and trying something else instead.
    Might be a problem with older, weaker models I guess.
    al_borland11 days ago
    I’m limited on the tools and models I can use due to privacy restrictions at work.
- deadbabe12 days ago
  Peak LLM will be when we can give some prompt and just get fully compiled binaries of programs to download, no code at all.
  - lovecg11 days ago
    Claude code, not too surprisingly, can do that (on a toy example).
    deadbabe11 days ago
    toys are for children
- ekianjo11 days ago
  Still fewer tokens to produce with higher level languages, and therefore less cost to maintain in the long run?
- resonious12 days ago
  > LLMs still can't write Gleam
  Have you tried? I've had surprisingly good results with Gleam.
- tyingq12 days ago
  If you asked the LLM it's possible it would tell you Java is a better fit.
- zahlman12 days ago
  People are still going to want to audit the code, at the very least.
- Imustaskforhelp12 days ago
  I love golang man! And I use it for the same thing too!!
  I mean people mention rust and everything and how AI can write proper rust code with linter and some other thing but man trust me that AI can write some pretty good golang code.
  I mean though, I don't want everyone to write golang code with AI all of a sudden, because I have been doing it for over a year and it's something that I vibe with and it's my personal style. I would lose some points of uniqueness if everyone starts doing the same haha!
  Man my love for golang runs deep. It's simple, cross platform (usually) and compiles super fast. I "vibe code" but feel faith that I can always manage the code back.
  (self promotion? sorry about that: but created golang single main.go file project with a timer/pomodoro with websockets using gorilla (single dep) https://spocklet-pomodo.hf.space/)
  So Shhh let's keep it a secret between us shall we! ;)
  (Oh yeah! Recently created a WHMCS alternative written in golang to hook up to any podman/gvisor instance to build your own mini vps with my own tmate server, lots of glue code but it actually generated it in first try! It's surprisingly good, I will try to release it as open source & thinking of charging just once if people want everything set up or something custom
  Though one minor nitpick is that the complexity almost rises many folds between a single file project and anything which requires database in golang from what I feel usually but golang's pretty simple and I just LOVE golang.)
  Also AI's pretty good at niche languages too I tried to vibe code a fzf alternative from golang to v-lang and I found the results to be really promising too!
- paulddraper11 days ago
  Agreed. The compiler is a feedback cycle made in heaven.
- dec0dedab0de12 days ago
  or maybe someone will use an LLM to create a JIT that works so well that compiled languages will be gone.
- lsh012 days ago
  > LLMs still can't write Gleam/Janet/CommonLisp/etc
  hoho - I did a 20/80 human/claude project over the long weekend using Janet: https://git.sr.ht/~lsh-0/pj/tree (dead simple Lerna replacement)
  ... but I otherwise agree with the sentiment. Go code is so simple it scrubs any creative fingerprints anyway. The Clojure/Janet/scheme code I've seen it writing isn't _great_ but it gets the job done quickly and correct enough for me to return to it later and golf it some.
- rvz12 days ago
  > Plus, I get a binary that I can port to other systems w/o problem.
  So cross-platform vibe-coded malware is the future then?
  - yibers12 days ago
    I hope that AVs will also evolve using the new AI tech to detect this type of malware.
    Imustaskforhelp12 days ago
    Honestly, I looked at Go for malware, and AV detection for golang used to be ehh, but recently it got strong.
    Then it became a cat and mouse game with obfuscators and deobfuscators.
    John Hammond has a *BRILLIANT* video on this topic. 100% recommended.
    Honestly, going by John Hammond, I feel like Nim or V-lang is a language we will probably get vibe coded malware from. Nim has been used for hacking so much that iirc Windows actually blocked the Nim compiler itself as malware!
    Nim's biggest issue is that hackers don't know it, but if LLMs fix that, Nim becomes a really lucrative language for hackers, and John Hammond described Nim's libraries for hacking as still very decent.
sheepscreek12 days ago
Nice work detective Simon! I love these “discovery” posts the most because you can’t find this stuff anywhere.
- go_photon_go12 days ago
  Absolutely, when people discover and share there's something fun to it beyond press releases and commentary. Creative and inspiring post
0xbadcafebee11 days ago
This is basically the same functionality as OpenAI Codex Web has, which, if you've not used it, you absolutely should not. What a garbage piece of software. Anthropic is eating OpenAI's lunch.
- simonw11 days ago
  It's a bit different from Codex Web in that it can't open PRs against projects and can't be configured with internet access.
  It is better than Codex Web in that you can continue to chat with the agent while it's working - Claude Code for web has that too. Codex Web really needs to catch up there!
  - 0xbadcafebee11 days ago
    Codex Web actually lacks the most basic PR integration, it's so useless. Codex Web refuses to push any binary file to your PR (like images, jars, lock files, etc). It can't check your GH Actions' logs for failures to try to fix them. Replying to one of the PR comments to accept a fix requires replying to a different GitHub bot than the one that opens your PR. And though there's a "Secrets" configuration to add secret vars for a Codex repo, Codex can't access them, so you can't even work around these bugs by asking Codex to make API calls. It's like nobody at the company has tried their own product.
randomtoast12 days ago
Maybe soon we have single use applications. Where ChatGPT can write an App for you on-the-fly in a cloud sandbox you interact with it in the browser and fulfill your goal and afterwards the App is shutdown and thrown away.
- Imustaskforhelp12 days ago
  exe.dev (though there are alternatives like sprites.dev etc. too)
- twostorytower12 days ago
  You can already do this.
Fernicia12 days ago
Has Gemini lost its ability to run javascript and python? I swear it could when it was launched, but now it's saying it doesn't have the ability. Annoying regression when Claude and ChatGPT are so good at it.
- tj800x12 days ago
  This regression seems to have happened in the past few days. I suspected it was hallucinating the run and confirmed it by asking Gemini to output the current date/time. The UTC time it reported was in the future relative to my clock. Some challenging mathematics were generating wrong results. Gemini will acknowledge something is wrong if you push it to explain the discrepancies, but can't explain it.
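  The date/time check described above can be sketched as follows. This is a rough illustration, assuming the model is asked to print an ISO 8601 UTC timestamp from its sandbox; the helper name plausible_run and the five-minute tolerance are arbitrary choices.

```python
from datetime import datetime, timedelta, timezone


def plausible_run(reported_iso: str, tolerance: timedelta = timedelta(minutes=5)) -> bool:
    """True if a reported UTC timestamp is within `tolerance` of the local clock."""
    reported = datetime.fromisoformat(reported_iso)
    if reported.tzinfo is None:
        # Assumption: a naive timestamp from the model is meant as UTC.
        reported = reported.replace(tzinfo=timezone.utc)
    now = datetime.now(timezone.utc)
    return abs(now - reported) <= tolerance


if __name__ == "__main__":
    # Ask the model to run: print(datetime.now(timezone.utc).isoformat())
    # then paste its answer here in place of this locally generated value.
    reported = datetime.now(timezone.utc).isoformat()
    print(plausible_run(reported))
```

  A timestamp far from your own clock suggests the run was hallucinated; a close one is only weak evidence that code really executed.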
jmacd12 days ago
I wonder how long npm/pip etc even makes sense.
Dependencies introduce unnecessary LOC and features which are, more and more, just written by LLMs themselves. It is easier to just write the necessary functionality directly. Whether that is more maintainable or not is a bit YMMV at this stage, but I would wager it is improving.
- physicsguy11 days ago
  What a bizarre comment. Take something like NumPy: it has a hard dependency on BLAS implementations, where numerical correctness is highly valued and both accuracy and performance require deep thought to implement correctly. Those are written in a different language again for performance, so an LLM would have to implement all of those things too. What’s the utility in burning energy to regenerate this all the time when implementations already exist?
  - hluska11 days ago
    What do supply chain attacks look like against one of these containers?
- ford11 days ago
  Interesting thought (I think recently more than ever it's a good idea to question assumptions) - but IMO abstractions are important as ever.
  Maybe the smallest/most convenient packages (looking at you is-even) are obsolete, but meaningful packages still abstract a lot of complexity that IMO isn't easier to one-shot with an LLM
  - whazor11 days ago
    Concretely, when you use Django, underneath you have CPython, then C, then assembly, and finally machine code. I believe LLMs have been much better trained on each layer than going end-to-end.
- unixhero11 days ago
  The most popular modules downloaded off pip and npm are not singular simple functions and cannot easily be rewritten by an llm.
  Scikit-learn
  Pandas
  Polars
- TZubiri12 days ago
  This is like saying Wikipedia doesn't make sense because there's now Grokipedia
  - GuinansEyebrows12 days ago
    there are people (on Hacker News Dot Com, even) who believe this without a shred of shame or irony.
- fendy300211 days ago
  I consider packages with over 100k downloads production-tested. Sure, LLMs can roll some by themselves, but if many edge cases appear (which may already be handled by public packages), you will need to handle them yourself.
  - embedding-shape11 days ago
    Don't base anything on just download numbers. Not only are they easily gameable, all it takes is like 3 small companies using a package, pushing commits individually, with CI triggering on every new commit, for that number to lose any sort of meaning.
    Vanity metrics should not be used for engineering decisions.
- kristianp12 days ago
  At times I wonder why x tui coding agent was written in js/ts/python, why not use Go if it's mostly llm coded anyway? But that's mostly my frustration at having to wait for npm to install a thousand dependencies, instead of one executable plus some config files. There's also support libraries like terminal ui that differ in quality between platforms.
  - hdjrudni11 days ago
    Funny because as a non-Go user, the few Go binaries I've used also installed a bunch of random stuff.
    This can be fixed in npm if you publish pre-compiled binaries but that has its own problems.
    zenmac11 days ago
    >the few Go binaries I've used also installed a bunch of random stuff.
    Same goes for rust. Sometimes one package implicitly imports another in a different version. And looking through the rustup tree to resolve the issue just doesn't seem very appealing.
- sersi11 days ago
  Well you do need to vet dependencies and I wish there was a way to exclude purely vibe coded dependencies that no human reviewed but for well established libraries, I do trust well maintained and designed human developed libraries over AI slop.
  Don't get me wrong, I'm not a luddite, I use claude code and cursor but the code generated by either of those is nowhere near what I'd call good maintainable code and I end up having to rewrite/refactor a big portion before it's in any halfway decent state.
  That said with the most egregious packages like left-pad etc in nodejs world it was always a better idea to build your own instead of depending on that.
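  For a sense of scale, a left-pad style helper is only a few lines in most languages. A minimal sketch in Python, where the name left_pad and its signature are illustrative and the built-in str.rjust does the actual work:

```python
def left_pad(value: object, width: int, fill: str = " ") -> str:
    """Pad str(value) on the left with `fill` up to `width` characters."""
    if len(fill) != 1:
        raise ValueError("fill must be exactly one character")
    # str.rjust returns the string unchanged when it is already wide enough.
    return str(value).rjust(width, fill)


if __name__ == "__main__":
    print(left_pad(42, 5, "0"))
    print(left_pad("abc", 2))
```

  Utilities of this size are the easy case; the same reasoning does not obviously carry over to larger libraries.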
  - hdjrudni11 days ago
    I've been copy-pasting small modules directly into my projects. That way I can look them over and see if they're OK and it saves me an install and possible future npm-jacking. There's a whole ton of small things that rarely need any maintenance, and if they do, they're small enough that I can fix myself. Worst case I paste in the new version (I press 'y' on github and paste the link at the top of the file so I can find it again)
- baby_souffle12 days ago
  As long as "don't roll your own crypto" is considered good advice, you'll have at least a few packages/libraries that'll need managing.
  For a decent number of relatively pedestrian tasks though, I can see it.
  - emj11 days ago
    LLMs are great at the roll-your-own-crypto foot gun. They will tell you to remember all these things that are important, and then ignore their own tips.
- empath7511 days ago
  Tokens are expensive and downloading is cheap. I think probably the opposite is true, really, and more packages will be written specifically for LLMs to use because their api uses fewer tokens.
- jimbokun11 days ago
  It still takes a little bit of time for an LLM to rewrite all the software in existence from scratch.
- throwaway202711 days ago
  That was already the case for a lot of things like is-even.
- PunchyHamster11 days ago
  You have insane delusions about how capable LLMs are, but even assuming it's somehow true: downloading deps instead of hallucinating more code saves you on tokens
  - hluska11 days ago
    And your opinions on how average people use these tools are 100% accurate?
    antonvs11 days ago
    If average people try vibecoding their dependencies, they’ll fail, simple as that. We’ve already seen how that looks with the “web browsers” that have recently been vibecoded.
    simonw11 days ago
    There's a new web browser project today that's a heck of a lot more impressive than the previous ones - ~20,000 lines of dependency-free Rust (though it uses system libraries for image and text rendering), does a good job of the Hacker News homepage: https://news.ycombinator.com/item?id=46779522
    antonvs11 days ago
    Thanks for the heads up, that does look much more interesting.
    I don't think it really affects the point discussed above for now, because we were discussing average users, and by definition, the first person to code a plausible web browser with an agent isn't an average user - unless of course that can be reliably replicated with any average user.
    But on that note, the takeaways on the post you linked are relevant, because the author bucked a few trends to do this, and concluded among other things that "The human who drives the agent might matter more than how the agents work and are set up, the judge is still out on this one."
    This will obviously change, but the areas that LLMs need to improve on here are ones they're notoriously weak on, so it could take a while.
    PunchyHamster11 days ago
    at least 5% more accurate than average LLM
- letsgethigh12 days ago
  best to write assembly instead.
e12e11 days ago
Hmm.. what's this?
> gmail (read-only) # gmail.search_email_ids → any # > Description: Search Gmail message IDs by query/tags (read-only).
The ChatGPT app on Android disavows having this... In what context does ChatGPT get (read) access to Gmail? Desktop app?
- simonw11 days ago
  I got that from the https://chatgpt.com/ web app - I just tried the same prompt in the ChatGPT iPhone app and got the gmail. and gcal. ones too: https://chatgpt.com/share/6978dc20-8a70-8006-9b42-6c0a8080be...
  Looks like it's for this feature: https://mashable.com/article/chatgpt-5-openai-gmail-calendar
  Presumably you have to opt-in to turning this on somewhere.
  - fassssst11 days ago
    You can turn those on in the “Apps” settings page. They were previously called Connectors.
    e12e11 days ago
    Hm... I have a Google drive app [Ed: available for install] - but no Gmail/calendar. Maybe because I'm in Europe?
    (Not that I'm super excited about giving chat GPT my emails - on the other hand Gemini is already there (Gmail) I guess...).
  - e12e11 days ago
    Strange. When logged in to a free account on Android:
    https://chatgpt.com/share/697902ad-3070-800d-b523-0fe312c772...
hulitu11 days ago
> ChatGPT Containers can now run bash, pip/npm install packages and download files
What can go wrong? The next Linux (and BSD) worm will be a ChatGPT based one.
skybrian12 days ago
Not sure if this is still working. I tried getting it to install cowsay and it ran into authentication issues. Does it work for other people?
- aryehof11 days ago
  I could even get it to download the ruby cowsay gem from rubygems and run it with some provided text. An alternative is to attach the gem to the conversation or provide a publicly available url.
- simonw12 days ago
  Can you share the transcript?
  - skybrian12 days ago
    https://chatgpt.com/share/6977f9d7-ca94-8000-b1a0-8b1a994e58...
    The transcript doesn't show it (I think it faked it) but here's the code in the sidebar:
    > bash -lc mkdir -p /mnt/data/cowsay-demo && cd /mnt/data/cowsay-demo && npm init -y >/dev/null && npm i cowsay@latest >/dev/null && echo 'Installed cowsay version:' && node -e "console.log(require('cowsay/package.json').version)"
    npm error code E401
    npm error Incorrect or missing password.
    npm error If you were trying to login, change your password, create an
    npm error authentication token or enable two-factor authentication then
    npm error that means you likely typed your password in incorrectly.
    npm error Please try again, or recover your password at:
    npm error https://www.npmjs.com/forgot
    npm error
    npm error If you were doing some other operation then your saved credentials are
    npm error probably out of date. To correct this please try logging in again with:
    npm error npm login
    npm error A complete log of this run can be found in: /home/oai/.npm/_logs/2026-01-26T21_20_00_322Z-debug-0.log
    > Checking and overriding npm registry
    > It seems like the registry option is protected, possibly pointing to an internal OpenAI registry that requires authentication. To bypass this, I can override the registry in the command with npm i cowsay --registry=https://registry.npmjs.org/. Let's give this a try and see if it works.
    It's unclear if that helped.
    I tried again and it worked. It seems like I have to ask for it to do things "in the container" or it will just give me directions about how to do it.
    simonw12 days ago
    OK that's really weird. Intermittent environment bug perhaps?
carterschonwald12 days ago
but… will gpt still get confused by the ellipses that its document viewer ui hack adds? probably yes.
xnx12 days ago
How much compute do you get in these containers? Could I have it run whisper on an mp3 it downloads?
- simonw12 days ago
  That might work! You would have to figure out how to get Whisper working in there but I'm sure that's possible with a bit of creativity concerning uploading files and maybe running a build with the available C compiler.
  It appears to have 4GB of RAM and 56 (!?) CPU cores https://chatgpt.com/share/6977e1f8-0f94-8006-9973-e9fab6d244...
  - Imustaskforhelp12 days ago
    Huh...
    If people are getting this for free, or even as an offering bundled with ChatGPT, then it's effectively subsidized too. Low-end providers with their $7/year deals are under some threat if ChatGPT provides 56 cores for free. It doesn't seem right to provide so many cores for (free??)
    Are you running this in your free account as you mention in blog post simon or in your paid account?
    simonw12 days ago
    My $20/month paid account.
    I used a free account to check if the feature was available there and it tried to get me to upgrade two prompts in (just enough for me to confirm the container worked and could install packages).
    Imustaskforhelp12 days ago
    Oh thanks for your reply Simon!
    > I used a free account to check if the feature was available there and it tried to get me to upgrade two prompts in (just enough for me to confirm the container worked and could install packages).
    Wait it tried... to make you upgrade your chatgpt account from free to paid account? Sorry I didn't get what you meant here
    (Funnily, I asked ChatGPT what it made of your text, and it also reads it as ChatGPT asking you to pay up.)
    Is this thing (maybe with some additions to make it like sprites.dev?) plus some ad features for basic queries going to be how OpenAI monetizes?
    I mean, I am part of the lowend community (the indie community of hosting providers), and they are all really pissed, and some are shutting down because of RAM price increases. OpenAI has all the RAM in the world right now, so is it trying to be a monopoly in this instance?
    I just found it really dystopian that it asked you to pay. Can you share a pic of it if possible, or share the free conversation? Heck, I might have to try it now on my free account as well.
    Curiosity's piqued right now.
    simonw12 days ago
    On my free ChatGPT account I ran a prompt telling it to write and execute hello world in a bunch of languages: https://chatgpt.com/share/6977aa7c-7bd8-8006-8129-8c9e25126f...
    It did what I asked - proving that the container feature works even for free accounts - but then displayed a message saying that I was out of free prompts and would need to upgrade or wait before I could run more.
    storystarling11 days ago
    You are likely just seeing the host topology. Even if the container reports 56 cores, the actual compute is almost certainly throttled via cgroups to keep the unit economics viable. I would be surprised if you can sustain more than a fraction of a vCPU before hitting a hard quota.
    goinghjuk12 days ago
    by default containers do not limit core count, you'll get all available on the host/VM.
    these cores are shared with all the other containers, could be hundreds more
  - tintor12 days ago
    Cores are shared with other containers.
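The gap between "cores reported" and "cores you can actually use" discussed above can be checked with a short script. This is a minimal sketch that assumes a Linux environment exposing cgroup v2 at the conventional `/sys/fs/cgroup/cpu.max` path; under gVisor or cgroup v1 (which uses `cpu.cfs_quota_us` and `cpu.cfs_period_us` instead) that file may not be visible, in which case the script reports no limit rather than proving there is none. The function names are illustrative.

```python
import os
from pathlib import Path


def parse_cpu_max(text):
    """Parse the contents of a cgroup v2 `cpu.max` file.

    The file holds "<quota> <period>" in microseconds, or "max <period>"
    when no quota is set. Returns the limit in cores, or None if unlimited.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def cgroup_cpu_limit(path="/sys/fs/cgroup/cpu.max"):
    """Read the cgroup v2 CPU limit; None if unlimited or not visible."""
    try:
        return parse_cpu_max(Path(path).read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("cores reported by the OS:", os.cpu_count())
    # sched_getaffinity is Linux-specific, so guard the call.
    if hasattr(os, "sched_getaffinity"):
        print("cores this process may run on:", len(os.sched_getaffinity(0)))
    print("cgroup v2 CPU limit (cores):", cgroup_cpu_limit())
```

A result such as 56 reported cores alongside a sub-1.0 cgroup limit would support the throttling theory; a limit of `None` is inconclusive, since the quota could be enforced somewhere the container cannot see.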
nathan_compton11 days ago
Fun to play with all this stuff before the bad actors make it dangerously useless.
trolleski11 days ago
Wow, it can do what I could do 20 years back using Ctrl+T? The progress! Give them another 10 billion, scratch that, 20 billion, scratch that, 75 trillion. - Written by SarcastAI.
pplonski8611 days ago
thank you for sharing, is there a new container for each code run, or it stays the same for whole conversation?
- aryehof11 days ago
  It’s maintained for the conversation. You can ask it for details like this.
LowLevelKernel12 days ago
Isn’t that ChatGPT’s internal MCP tools?
- simonw12 days ago
  It's one of the tools that are available to ChatGPT - they're not MCP tools because ChatGPT's implementation of tools pre-dates MCP, but they work effectively the same way.
  Here's a full list which looks accurate to me: https://chatgpt.com/share/6977ffa0-df14-8006-9647-2b8c90ccbb...
CSMastermind12 days ago
Thank God, this was extremely annoying
syngrog6611 days ago
ahhh... yet more things I've been able to do for decades already
jacquesm12 days ago
How long before they'll be mining crypto?
- simianwords12 days ago
  why would they do that?
  - bandrami12 days ago
    Because the injected malicious prompt told them to
  - streptomycin12 days ago
    instrumental convergence
    simianwords12 days ago
    what?
blobbers12 days ago
Did I miss the boat on chatgpt? Is there something more to it than the web chat interface?
I jumped on the Claude Code bandwagon and I dropped off chatgpt.
I find the chatgpt voice interface to be infuriating; it literally talks in circles and just spews summary garbage whenever I ask it anything remotely specific.
- simonw12 days ago
  I still like ChatGPT for search more than Claude, though I think Claude may be catching up now. Gemini is getting good at search too (as you'd hope it would!)
- ArtemGetman8 days ago
  same experience - voice mode is dumbed down compared to text. ended up building my own voice interface that uses full claude/gpt/gemini models instead of the lobotomized voice versions. actually handles specific requests without the "go look it up yourself" cop-out. want to try it?
- aryehof11 days ago
  ChatGPT recently added personalization options that have made its voice chat better for me. I want a direct, professional tone, with none of the fake "hey there, I'm your bro" stuff. See Personalization under Settings.
  - blobbers11 days ago
    Okay, I'll try that out. I was asking it to do something like summarize a balance sheet over a few years and while the chat interface will do this, the voice interface would just tell me to go look up the specific data source, it refused to barf out numbers.
- fragmede12 days ago
  codex ~= Claude code
nottorp12 days ago
... as root?
- tintor12 days ago
  No root. `pip` and `npm install` don't require it.
  You can not use `sudo apt install` inside it.
  They use gVisor, and other container isolation mechanisms: https://ryan.govost.es/2025/openai-code-interpreter/
  - bandrami12 days ago
    OTOH if you have apt, you have arbitrary shell commands (hooray dpkg-hooks!)
    Golden years for cybersecurity people
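The point that `pip` does not need root can be illustrated with a quick check of who the process is and where packages could be written. This is a sketch under assumptions: it uses only the standard library, the function name `install_context` is made up for illustration, and it says nothing about how ChatGPT's container is actually configured. When the system site-packages directory is not writable, pip normally falls back to a per-user install location, which is why no `sudo` is required.

```python
import os
import site
import sysconfig


def install_context():
    """Summarise whether this process is root and where pip could write."""
    system_site = sysconfig.get_paths()["purelib"]
    return {
        # geteuid() is POSIX-only; treat its absence as "not root".
        "is_root": hasattr(os, "geteuid") and os.geteuid() == 0,
        "system_site_packages": system_site,
        "system_site_writable": os.access(system_site, os.W_OK),
        "user_site_packages": site.getusersitepackages(),
    }


if __name__ == "__main__":
    for key, value in install_context().items():
        print(f"{key}: {value}")
```

Inside a virtual environment the "system" path is the environment's own directory, so it will usually show as writable even for an unprivileged user.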
- zahlman12 days ago
  Given that it's within a container on a remote server, does that matter?
  - acedTrex12 days ago
    I mean, I hope it's more hardened than JUST a container, given how many container escapes there are.
    jchw12 days ago
    Apparently, they are using gVisor, which when applied properly, should make a pretty good isolation primitive.
bandrami12 days ago
As an infosec guy I'm going to go ahead and buy a bigger house
- rvz12 days ago
  Well either way, the infosec folks are going to have the time of their lives printing write-ups and lots of money on both sides.
  I can see the sandbox escapes, remote code execution paths, exfiltration methods and all the vibe coded sandcastles waiting to be knocked down, because we have folks openly admitting that they do not know a single line of the code they are prompting the AI for.
  I don't think we know the scale of the security issues we will see because of the level of hubris around AI taking care of all of the coding.
- jesterson11 days ago
  Any IT guy with above-average experience/knowledge should take out a huge loan as well.
  Someone will have to clean the mess made by those creators who think they can "create" anything reliable with their chatgpt
- usefulposter11 days ago
  Hmm...
  https://news.ycombinator.com/item?id=44367548
- giancarlostoro12 days ago
  How about Six PS6's
bofadeez11 days ago
[flagged]
- dang11 days ago
  Could you please stop posting unsubstantive comments and flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.
  If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.
  - lesser-shadow11 days ago
    Could you please answer the email please? Thanks.
    dang11 days ago
    We don't answer aggressive or abusive emails. If you genuinely want an answer, it's easy enough to ask your question respectfully.
    bofadeez11 days ago
    [flagged]
    lesser-shadow11 days ago
    The insults were justified given that you ignored my emails until I resorted to spamming your inbox twelve hours in.
    Here's my email because I have nothing to hide:
    >Hey, Could you clarify why did you shadow ban my account, or am I just breaking your circlejerk by posting opinions your mods disagree with? Also how are my posts related to IC design flagged as dead? Literally every other comment that is slightly political being removed I'd understand but apparently your moderators are just mentally insane. Can you also explain why you harbor AI-made garbage on site? Doesn't help the website's "Quality".
shevy-java12 days ago
And so it begins - Skynet 3.0.