Author here, happy to answer any questions about this or chat about the ideas behind it :)
(Though I'm not sure I'll get on the expedition; I'm a little worried about sandboxing, setup, and getting distracted...)
If I were to start the expedition, I'd probably try to overshoot by describing a site that I could not fully imagine myself, or by using attributes that lack a single meaning. Like, "the artist's interactive portfolio, as though the artist is looking over your shoulder, keeping a carefully neutral expression while seething inside." Then I'd probably continue, imagining just the outline of some site that satisfies some unarticulated desire, putzing around as I see a concrete articulation of that idea, and reforming the idea in my head in response to those results as much as I'm articulating it in more detail.
When I broke out the layout and style components, I was thinking of being able to change the whole site aesthetic from something like "standard B2B" to "GeoCities fan page", but I'm excited to try getting fuzzier with the descriptions!
Given 1,000,000 page views, you can imagine just how many experiments could be run. Basically, our A/B tests start to resemble natural evolution and survival of the fittest more than decision trees.
Still, something feels like it's missing. I wonder what has yet to be built before we arrive at that future.
That's a really cool idea: once you can generate something reasonably consistent, you can kind of let your A/B tests run themselves, with just rough guidelines on what you're trying to optimize for...
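For what it's worth, the standard way to let A/B tests "run themselves" toward a goal is a multi-armed bandit. Below is a minimal epsilon-greedy sketch in Python. This is a swapped-in textbook technique, not anything the project above actually does; the function name `epsilon_greedy_test`, the `get_reward` callback, and the conversion rates are all hypothetical.

```python
import random
from typing import Callable


def epsilon_greedy_test(
    variants: list[str],
    get_reward: Callable[[str, random.Random], float],
    rounds: int = 10_000,
    epsilon: float = 0.1,
    seed: int = 0,
) -> tuple[dict[str, int], dict[str, float]]:
    """Serve one variant per visitor, mostly favoring the best one observed so far.

    Hypothetical sketch: get_reward(variant, rng) stands in for whatever metric
    you are optimizing (e.g. 1.0 for a conversion, 0.0 otherwise).
    """
    rng = random.Random(seed)
    shows = {v: 0 for v in variants}   # times each variant was served
    wins = {v: 0.0 for v in variants}  # total reward collected per variant
    for _ in range(rounds):
        untried = [v for v in variants if shows[v] == 0]
        if untried:
            choice = untried[0]  # serve every variant at least once
        elif rng.random() < epsilon:
            choice = rng.choice(variants)  # explore: pick uniformly at random
        else:
            # exploit: pick the variant with the highest average reward so far
            choice = max(variants, key=lambda v: wins[v] / shows[v])
        shows[choice] += 1
        wins[choice] += get_reward(choice, rng)
    return shows, wins


if __name__ == "__main__":
    # Hypothetical usage: simulated visitors with made-up conversion rates.
    rates = {"A": 0.05, "B": 0.08}
    shows, wins = epsilon_greedy_test(
        list(rates), lambda v, rng: 1.0 if rng.random() < rates[v] else 0.0
    )
    print(shows, wins)
```

Epsilon controls how much traffic keeps "exploring" losing variants; a real setup would more likely use something like Thompson sampling, and the variants themselves would come from the generator rather than a fixed list.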
You’d want the A and B to be intentional, not automatically generated. Every VP thinks their idea for a feature will revolutionize the company.
Now imagine that every one of them is given a tool that could get them a POC quickly. I think a lot of VPs are about to figure out that their ideas are shit.
Instead, after reading the page, I see it's LLM-generated pages where "you get what you ask for," hallucinations and all. Fantastic name.
Reminds me of the Cucumber testing framework.
But LLMs can make sense of any ol' thing, so (and it shocks me to admit this) maybe Gherkin is back on the menu.
1. LLM "code". This should work for most basic use cases; it should be so basic that any random person can create a CRUD app.
2. Scripting, something like Python. This should handle 95% of use cases.
3. Systems programming. Zig, Rust, etc. For when you need to meet extremely specific performance requirements.
My dream language would integrate all three of these in the same stack; ideally, a single project could mix all three (most of the time, a mix of the first two).
It works somewhat, but even with the smaller/faster models it's very slow, and even with the big models it's pretty unreliable. Long term, I can definitely imagine this becoming more viable, maybe as a complement to the 'chat' interface, with most SaaS apps essentially being replaced by an AI in front of a system (or systems) of record.