https://blog.nawaz.org/posts/2025/Oct/pelican-on-a-bike-rayt...
Opus4.1: https://claude.ai/public/artifacts/b47c2dd5-41a6-452c-8701-5...
Sonnet 4.5: https://gemini.google.com/share/a8ebea2c31dd
Gemini 2.5pro: https://gemini.google.com/share/a8ebea2c31dd
“a pelican riding on a bicycle in 3d. Works for mobile”
Here's the codex-mini attempt: https://static.simonwillison.net/static/2025/povray-pelican-...
People are delegating way too much to LLMs. In turn, this dulls their own research and problem-solving skills.
> This is a useful starting point for a project like this—in figuring out the compile step the coding agent gets seeded with a little bit of relevant information about the project, and if it can compile that means it can later partially test the code it is writing while it works.
"Figure out how to build this" is a shortcut for getting a coding agent primed for future work. If you look at the transcript you can see what it did: https://gistpreview.github.io/?ddabbff092bdd658e06d8a2e8f142...
Ran ls
└ AGENTS.md
CHANGELOG.md
+13 lines
scripts
sdk
Ran cat README.md
└ <p align="center"><code>npm i -g @openai/codex</code><br />or <code>brew
install --cask codex</code></p>
+105 lines
This repository is licensed under the [Apache-2.0 License](LICENSE).
Ran rg -n "rust" -n docs/install.md
└ 23:curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
25:rustup component add rustfmt
26:rustup component add clippy
Ran sed -n '1,200p' docs/install.md
└ ## Install & build
+36 lines
cargo test
```
I'm gathering the build steps for the Rust tool, focusing on running cargo
build inside the codex-rs directory. I'll check dependencies and confirm
the instructions before actually building, aiming to provide a clear summary
afterward.
Ran cargo build
└ Updating git repository `https://github.com/nornagon/crossterm`
Updating git repository `https://github.com/nornagon/ratatui`
+825 lines
Compiling codex-cli v0.0.0 (/Users/simon/Dropbox/dev/codex/codex-rs/cli)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 1m 27s
Ran ls target/debug | head
└ apply_patch
apply_patch.d
+6 lines
codex-execpolicy
That's a decent starting point on seeding the context with information that's relevant to making and then testing the modifications I'm about to ask for. It now knows what the project is, what dependencies it uses, how it's laid out and the set of binaries that it generates.
Even more importantly: it knows that the project can be built without errors. If it tries a build later and sees an error it will know that the error was caused by code it had modified.
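A minimal sketch of that baseline idea in Python, assuming the build is a single command whose exit status signals success. The helper names `build_ok` and `attribute_failure` are hypothetical and are not part of Codex or cargo.

```python
import subprocess
import sys


def build_ok(cmd, cwd=None):
    """Run a build command and report whether it exited with status 0."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return result.returncode == 0


def attribute_failure(baseline_ok, current_ok):
    """Decide what a build result means, given the pre-change baseline."""
    if current_ok:
        return "build passes"
    if baseline_ok:
        # The tree built cleanly before, so the edits since then are the suspect.
        return "regression: likely caused by changes made since the baseline"
    # Without a clean baseline, the failure may predate any edits.
    return "inconclusive: build was already failing before any changes"


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere; in the transcript's
    # setting the command would be ["cargo", "build"] run inside codex-rs.
    baseline = build_ok([sys.executable, "-c", "pass"])
    after_edit = build_ok([sys.executable, "-c", "raise SystemExit(1)"])
    print(attribute_failure(baseline, after_edit))
```

The point of the third branch is that a failing build is only informative when a clean baseline exists, which is what the initial "figure out how to build this" step provides.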
But Simon isn’t a Rust developer - he’s a motivated individual with a side project. He can now speedrun the part he’s not interested in. That doesn’t affect anyone else’s decisions; you can still choose to learn the details. The ability to skip it if you wish is a huge win for everyone.
The reductio that people tend to be concerned about is, what if someone is not interested in any aspect of software development, and just wants to earn money by doing it? The belief is that the consequences then start becoming more problematic.
Some people will always look for ways to "cheat". I don't want to hold back everyone else just because a few people will harm themselves by using this stuff as a replacement for learning and developing themselves.
This new post gets at the issue: https://news.ycombinator.com/item?id=45868271
I agree that people using LLMs in a lazy way that has negative consequences - like posting slop on social media - is bad.
What's not clear to me is the scale of the problem. Is it 1/100 people who do this, or is it more like 1/4?
Just a few people behaving badly on social media can be viewed by thousands or even millions more.
Does that mean we should discard the entire technology, or should we focus on teaching people how to use it more positively, or should we regulate its use?
In this case it's more like slowrunning. Building a Rust project is one command, and ChatGPT will tell you that command in 5 seconds.
Running an agent for that is 1000x less efficient.
At this point it's not optimizing or speeding things up, it's running an agent for the sake of running an agent.
Across a day of doing these little “run one command” tasks, even getting blocked by one could waste an hour. That makes the expected value calculation of each single task tilt much more in favor of a hands-off approach.
Secondly, you’re not valuing the ability to take yourself out of the loop - especially when the task to be done by AI isn’t on the critical path, so it doesn’t matter if it takes 5 minutes or 5 milliseconds. Let AI run a few short commands while you go do something else that’ll definitely take longer than the difference - maybe a code review - and you’ve doubled your parallelism.
These examples are situational and up to the individual to choose how they operate, and they don’t affect you or your decisions.
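The expected-value argument above can be made concrete with a small back-of-the-envelope model. This is only a sketch: the function names are hypothetical and every number below is an illustrative assumption, not a measurement.

```python
def expected_manual_minutes(tasks, minutes_per_task, p_blocked, minutes_if_blocked):
    """Expected hands-on time when each task carries a small risk of a long stall."""
    return tasks * (minutes_per_task + p_blocked * minutes_if_blocked)


def expected_delegated_minutes(tasks, attention_minutes_per_task):
    """Hands-on time when an agent runs the task and you only review the result."""
    return tasks * attention_minutes_per_task


if __name__ == "__main__":
    # Assumed inputs: 20 small tasks a day, 1 minute each by hand, a 5% chance
    # that any one of them blocks you for an hour, 2 minutes of attention per
    # delegated task.
    manual = expected_manual_minutes(
        tasks=20, minutes_per_task=1, p_blocked=0.05, minutes_if_blocked=60
    )
    delegated = expected_delegated_minutes(tasks=20, attention_minutes_per_task=2)
    print(f"manual: {manual:.0f} min, delegated: {delegated:.0f} min")
```

With these made-up inputs the rare one-hour stall dominates: it adds three expected minutes to every one-minute task. Different assumptions can flip the result, which is why the choice stays situational.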
It would be interesting to know what kinds of responses humans offer across different values of Y such as:
1) looked on stack overflow 2) googled it 3) consulted the manual 4) asked an LLM 5) asked a friend
For each of these, does the learner somehow learn something more or better?
Is there some means of learning that doesn't degrade us as human beings according to those in the know?
I ask as someone who listens to audiobooks and answers yes when someone asks me if I've read the book. And that's hardly the extent of my transgressions.
It's worth learning how to do this stuff. Not just because you then know that particular build system, but because you get better at learning. Learning how to learn is super important. I haven't come across a new project that's taken me more than a few minutes to figure out how to build in years.
This isn't even close to true. The majority of programmers will be fine going their entire career without even knowing what Rust is, let alone how to build Rust projects.
A more accurate analogy would be a plumber not knowing how his wrench was manufactured.
(Though in general I do agree with “it’s worth learning how to do this stuff.”)
The supposed ubiquity of Rust is the result of a hype and/or drama bubble.
Regarding your second point, I think people actually underutilise LLMs for simple tasks. Delegating these tasks frees up your problem-solving skills for challenges that truly need human insight. In this case, asking an LLM is arguably the smart choice: it's a common task in training data, easy to verify, low-risk to run, and learning it yourself brings no direct benefit to the question you started with.
Why would that be true? The average assistant certainly types more quickly than their boss, but most people would not take issue with that. They have different responsibilities. You free up time to research / problem-solve other things.
> No need to wait for 5-30 minutes until LLM figures this out.
I don't care if the LLM takes 15 additional minutes to figure it out, if it net saves me a minute. (We could certainly debate the ergonomics of the multitasking involved, but that is something every person who delegates work has to deal with, and it is not unique to working with LLMs in any way.)
Instead they let you type vague or ambiguous crap in and just essentially guess about the unclear bits. Hadn't quite thought through which algorithm to use? No worries, the LLM will just pick one. Hadn't considered an edge case? No worries, the LLM will just write 100 lines of code that no sane programmer would ever go through with before realising something isn't right.
I've made the mistake of being that senior who is way too eager to help juniors many times in my career. What happens is they never, ever learn for themselves. They never learn how to digest and work through a problem. They never learn from their mistakes. Because they can always just defer to me. LLMs are the worst thing to happen for these people because unlike a real person like me the LLM is never busy, never grumpy and nobody is taking notes of just how many times they're having to ask.
LLMs are really useful at generating boilerplate, but you have to ask yourself why you're spending your days writing boilerplate in the first place. The danger is it can very quickly become more than just boilerplate and before you know it you've forgotten how to think for yourself.
Becoming proficient at banging out Home Assistant entities and their utterly ludicrous instantiation process has zero value for my career.
It's not that I don't care about learning how to build Rust or think that it's too big of a challenge. It's just not the thing I was excited about right now, and it's not obvious ahead of time how sidetracked it will get me. I find that having an LLM just figure it out helps me to not lose momentum.
I might learn Rust some day. At the moment, I don't need the mental clutter.
It's my understanding that building Rust applications still requires a C toolchain, and packages are still going to be dependent on things like having the openssl dev headers/libraries installed. That's fine, that's normal for building software, but it's not as trivial as "just git-clone this Rust source repo and run one command and everything will work".
I'm certain I could get up and running quickly. I'm also certain I'd have to install a bunch of stuff and iterate past multiple roadblocks before I was actually able to build a Rust application. And finally I'm certain I could get Claude or Codex to do it all a lot faster than if I muddled through it myself for half an hour.
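A rough preflight check for the roadblocks described above, sketched in Python. The tool list is an assumption: `cargo` and `rustc` for Rust itself, `cc` for the C toolchain, and `pkg-config` because build scripts that link system libraries such as OpenSSL often use it to locate them. The helper name `missing_tools` is hypothetical.

```python
import shutil


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Assumed checklist; adjust it to whatever the project actually needs.
    wanted = ["cargo", "rustc", "cc", "pkg-config"]
    missing = missing_tools(wanted)
    if missing:
        print("missing before building:", ", ".join(missing))
    else:
        print("all listed tools found on PATH")
```

This only checks for executables. It says nothing about whether development headers such as the OpenSSL ones are installed, which is the kind of gap that still tends to surface as a build error.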
Then cd dir && cargo run
I get what you’re saying, but Rust has really set the bar (lowered the bar?) for making it easy, so it’s a bad example to pick on.
It is easy for any of us to quickly bootstrap a new project in whatever language. But this takes a cognitive toll, and adds friction to bringing our visions online.
Recently, I went "blind" for a couple of days. My vision was so damaged I could only see blurs. The circumstances of this blindness are irrelevant, but it dawned on me that if I were blind, I could no longer code as I do.
This realization led me to purchase a Max subscription to Claude Code and rely more on LLMs for building, not less.
It was much more effective than I thought it would be. In my blurred blindness, I saw blobs of a beautiful user interface form from the skeleton of my Rust backend and Vue3 frontend. It took my (complex) Rust backend and my frontend scaffolding to another level. I could recognize it distinctly through the blur. And it did this in minutes / hours instead of days.
As my vision returned, I began analyzing what happened and conducting experiments. My attitude changed completely. Instead of doing things myself, directly, I set out to make the LLM do it, even if it took more time.
It is painful at first. It makes very stupid mistakes that make an experienced engineer snarl at it, "I can do better myself". But my blindness gave me new sight. If I were blind, I couldn't do it myself. I would need help.
Instead of letting that ego take over, I started patiently understanding how the LLM best operates. I discovered mainly it needs context and specific instructions.
I experimented with a DSL I made for defining LLM instructions that are more suited for it, and I cannot describe the magic that started unfolding.
Now, I am writing a massive library of modular instructions for LLMs, and launching them against various situations. They will run for hours uninterrupted and end up implementing full code bases, with complete test suites, domain separation, and security baked in.
Reviewing their code, it looks better than 90% of what I see people producing. Clear separation of concerns, minimal code reuse, distinct interface definitions, and so much more.
So now, I have been making it more and more autonomous. It doesn't matter if I could bootstrap a project in 30 seconds. If I spend a few hours perfecting the instructions to the LLM, I can bootstrap ANY project for ANY LANGUAGE, forever.
And the great thing? I already know the pattern works. At this point, it is foolish for me to do anything other than this.
Source: Am a blind person coding for many years before language models existed.
That's one way to think about it, but on the other hand, where's the "skill" in knowing a particular CLI invocation for a particular tool or installation task? Next year there will be a Better Way to Do It. (Witness how many trendy package installers / venv managers the Python community has gone through.)
An LLM's job is to translate what I want to do into instructions or commands that actually do it. Real skill involves defining and directing the process; the implementation details are just temporary artifacts. Memorized command lines, mastery of specific tools, and conflation of rote procedures with "skills" are what keep languages like C around for 50 years, long after the point where they begin to impede progress.
But now... I use a ton of advanced Git features several times a week, because just knowing that it's possible to do something is enough for me to tell Codex or Claude Code to do that thing.
So maybe Git mastery now is more about concepts? You need to understand the DAG and roughly what's in the .git folder and origins and branches and tags and commits and so forth, but you don't need to remember the syntax.
But if you aren't even issuing commands directly to Git, suddenly it starts to look like there is room for improvement without the pain of learning a new tool or a new paradigm. That's a bigger deal than I think most people appreciate.
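The concepts mentioned above (commits forming a DAG, with branches and tags as names pointing into it) fit in a few lines. This is a toy model for illustration only, with made-up commit ids and hypothetical helper names; it is not how Git stores objects on disk.

```python
# Toy model: each commit id maps to the ids of its parents.
commits = {
    "a1": [],            # root commit
    "b2": ["a1"],
    "c3": ["b2"],
    "d4": ["b2"],        # second line of work branching from b2
    "e5": ["c3", "d4"],  # merge commit with two parents
}

# Branches and tags are just names that point at a commit.
refs = {"main": "e5", "feature": "d4", "v1.0": "b2"}


def ancestors(commit_id, graph):
    """Return commit_id plus every commit reachable by following parent links."""
    seen = set()
    stack = [commit_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def is_merged(branch, into, refs, graph):
    """A branch is merged when its tip is reachable from the target's tip."""
    return refs[branch] in ancestors(refs[into], graph)


if __name__ == "__main__":
    print(is_merged("feature", "main", refs, commits))  # True
    print(is_merged("main", "feature", refs, commits))  # False
```

Knowing that questions like "is this branch merged?" reduce to reachability in the graph is the kind of conceptual understanding that lets you ask an agent for the right operation without remembering the exact syntax.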
https://news.ycombinator.com/item?id=45845717
https://gally.net/temp/20251107pelican-alternatives/index.ht...
There are some fun little ones in there. I've no idea what Llama 405B is doing. Qwen 30B A3B is the only one that cutely starts on the landscaping and background. Mistral Large & Nemo are just convinced that front shot is better than portrait. Also interesting to observe varying temperatures.
I feel like this SVG challenge is a pretty good threshold to meet before we start to get too impressed by ARC AGI wins.
[1] https://weval.org/analysis/visual__pelican/f141a8500de7f37f/...
It's a very bad threshold. The models write the plain SVG without looking at the final image. Humans would be awful at it and you would mistakenly conclude that they aren't general intelligences.
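To make "writing SVG without looking" concrete, here is a small Python sketch. The shapes and coordinates are arbitrary guesses standing in for a bicycle and a body, and the function names are hypothetical. The program can confirm the markup parses, but nothing in it can tell whether the picture looks right.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def blind_svg():
    """Emit SVG markup from guessed coordinates, with no rendering step."""
    shapes = [
        '<circle cx="60" cy="150" r="30" fill="none" stroke="black"/>',   # rear wheel
        '<circle cx="160" cy="150" r="30" fill="none" stroke="black"/>',  # front wheel
        '<line x1="60" y1="150" x2="160" y2="150" stroke="black"/>',      # frame
        '<ellipse cx="110" cy="90" rx="35" ry="25" fill="white" stroke="black"/>',  # body
    ]
    body = "\n  ".join(shapes)
    return f'<svg xmlns="{SVG_NS}" width="220" height="200">\n  {body}\n</svg>'


def is_well_formed(svg_text):
    """Check only that the text parses as XML; says nothing about appearance."""
    try:
        ET.fromstring(svg_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    svg = blind_svg()
    print(is_well_formed(svg))
```

A human asked to type those coordinates by hand, with no preview, would face the same limitation.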
That was a bit disappointing, because I feel the Codex API with Claude semantics would be really nice.
Translating all the tool calls when you cannot control the prompt seemed like a bit of a dead end though, so I eventually just switched back to Claude (which incidentally allows any prompt you can dream up, but using the Codex CLI with Claude was very much not on my wishlist).
Did you consider expanding the number of models by routing all calls through OpenRouter?