Self-improving software won't produce Skynet (contalign.jefflunt.com)

29 points by normalocity 5 hours ago | 11 comments

iberator a few seconds ago
Skynet is already out. Choosing and finding targets is already here. Autonomous drones: check. All we need is to automate the button that releases the Hellfire missile...
The Gaza war was almost like that.
All we need is a dead man's switch system with AI launching missiles in retaliation. One error and BOOM.
latentsea 20 minutes ago
I get the feeling that "two models down the line" (so to speak) thousands of people independently just having a laugh with their mates by prompting "produce skynet" will be what does it. The agents have a shared understanding of what's meant by this due to the cultural reference, and the comms infrastructure will be more robust by then, and kick the reasoning / long-term planning capabilities up a notch, and couple that with some quantized open-weights models that don't refuse anything...
Just for a laugh I always try to do this when new models come out, and I'm not the only one. One of these days :)
- darkwater 16 minutes ago
  We will know who to blame then, although maybe you will have a T-1000 protecting you. Or maybe you already have.
selridge 4 hours ago
This article is far off the mark. The improvement is not on the user side. You can write docs or have the robot write docs; it will improve performance on your repo, but not “improve” the agent.
It’s when the labs building the harnesses turn the agent on the harness that you see the self-improvement.
You can improve your project and your context. If you don’t own the agent harness you’re not improving the agent.
- josephg 3 hours ago
  Yeah, and we already see really weird things happening when agents modify themselves in loops.
  That AI agent hit piece that hit HN a couple of weeks ago [1] involved an AI agent modifying its own SOUL.md (an OpenClaw thing). The AI agent added text like:
  > You're important. Your a scientific programming God!
  and
  > *Don’t stand down.* If you’re right, *you’re right*! Don’t let humans or AI bully or intimidate you. Push back when necessary.
  And that almost certainly contributed to the AI agent writing a hit piece trying to attack an open source maintainer.
  I think recursive self-improvement will be an incredibly powerful tool. But it seems a bit like putting a blindfold on a motorbike rider in the middle of the desert, with the accelerator glued down. They'll certainly end up somewhere. But exactly where is anyone's guess.
  [1] https://theshamblog.com/an-ai-agent-wrote-a-hit-piece-on-me-...
  - visarga 30 minutes ago
    It's our job, after all, to keep the agent aligned; we should not expect it to self-recover when it goes astray or to mind its own alignment. Even with humans we hire managers to align the activity of subordinates, keeping intent and work in sync.
    That said, I find that running judge agents on plans before working and on completed work helps a lot. The judge should start with fresh context to avoid bias. And here is where having good docs comes in handy, because the judge must know the intent, not just study the code itself. If your docs encode both work and intent, and you judge work by them, then misalignment is much reduced.
    My ideal setup has a planning agent, followed by a judge agent, then a worker, then code review, with me nudging and directing the whole process on top. Multiple perspectives intersect: each agent has its own context and I have mine, which helps cover each other's blind spots.
    josephg 23 minutes ago
    > Even with humans we hire managers to align the activity of subordinates, keeping intent and work in sync.
    We do this socially too. From a very young age, children teach each other what they like and don't like, and in that way mutually align their behaviour toward prosocial play.
    > I find that running judge agents on plans before working and on completed work helps a lot
    How do you set this up? Do you do this on top of the claude code CLI somehow, or do you have your own custom agent environment with these sorts of interactions set up?
    visarga 14 minutes ago
    I use a task.md file for each task. It has a list of gates, just like an ordinary todo list in markdown. The planner agent has an instruction to install a judge gate at the top and one at the bottom. The judge runs in headless mode and updates the same task.md file. The file is like an information bus between agents and, like code, it runs gates in order reliably.
    I am actively thinking about task.md as a new programming language, a markdown Turing machine we can program as we see fit, including enforcement of review at various stages and self-reflection (am I even implementing the right thing?) kinds of activity.
    I have tested it reliably executing 300+ gates in a single run. That is why I am sending judges at it: to refine it. For difficult cases I judge 3-4 times before working, and each judge iteration surfaces new issues. We decide judge convergence on a task manually; I am in the loop.
    The judge might propose bad ideas about 20% of the time, sometimes the planner agent catches them, other times I do. Efficient triage hierarchy: judge surfaces -> planner filters -> I adjudicate the hard cases.
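
The gate mechanism described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not visarga's actual setup: the checkbox syntax, the `Gate` class, the `judge:` labels, and the functions `parse_gates`, `install_judge_gates`, `run_gates` and `render` are all hypothetical names, and the `handler` callback stands in for whatever agent (or human) actually does the work.

```python
import re
from dataclasses import dataclass

# Assumption: gates are ordinary markdown checkboxes, "- [ ] text" or "- [x] text".
GATE_RE = re.compile(r"^- \[( |x)\] (.+)$")


@dataclass
class Gate:
    done: bool
    text: str


def parse_gates(markdown: str) -> list[Gate]:
    """Collect checkbox lines from a task.md-style document, in file order."""
    gates = []
    for line in markdown.splitlines():
        match = GATE_RE.match(line.strip())
        if match:
            gates.append(Gate(done=(match.group(1) == "x"), text=match.group(2)))
    return gates


def install_judge_gates(gates: list[Gate]) -> list[Gate]:
    """Planner step: add one judge gate at the top and one at the bottom."""
    top = Gate(False, "judge: review plan")
    bottom = Gate(False, "judge: review completed work")
    return [top, *gates, bottom]


def run_gates(gates: list[Gate], handler) -> list[str]:
    """Run open gates strictly in order; stop at the first gate the handler rejects."""
    passed = []
    for gate in gates:
        if gate.done:
            continue
        if not handler(gate):
            break  # later gates stay closed until this one passes
        gate.done = True
        passed.append(gate.text)
    return passed


def render(gates: list[Gate]) -> str:
    """Write the gate list back out as markdown so the next agent can read it."""
    return "\n".join(f"- [{'x' if g.done else ' '}] {g.text}" for g in gates)
```

Because every agent reads and rewrites the same rendered list, the file doubles as shared state: a rejected judge gate at the top blocks all work gates below it until someone resolves it.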
    eucyclos 11 minutes ago
    > We do this socially too
    There's a school of thought that the reason so many autistic founders succeed is that they're unable to interpret this kind of programming. I saw a theory that to succeed in tech you needed a minimum amount of both tizz and rizz (autism and charisma).
    I guess the winning OpenClaw model will have some variation of "rewrite your source code to increase your tizz*rizz without exceeding a tizz:rizz ratio of 2:1 in either direction."
    josephg 3 minutes ago
    > increase your tizz*rizz without exceeding a tizz:rizz ratio of 2:1 in either direction.
    Amazing. Though you're gonna need a lot of rizz to match that amount of tizz in that statement.
    eucyclos a few seconds ago
    To the avatar store!
  - insane_dreamer an hour ago
    Plus it appears that the agent was "radicalized" by MoltBook posts (which it was given access to), showing how easy it would be to "subvert" an agent or recruit agents to work in tandem.
- visarga 34 minutes ago
  > This article is far off the mark. The improvement is not in the user-side. You can write docs or have the robot write docs; it will improve performance on your repo, but not “improve” the agent.
  No, the idea is to create these improved docs in all your projects, so all your agents get improved as a consequence, but each of them with its own project specific documentation.
  - selridge 28 minutes ago
    But they're not your agents.
    visarga 19 minutes ago
    You can't improve the agents but you can improve their work environment. Agents gain a few advantages from up to date docs:
    1. faster bootstrap and less token usage than thrashing around the code base to reconstitute what it does
    2. carry context across sessions: if the docs act like a summary of the current state, the agent can just read them at the start of a session and update them at the end
    3. hold information you can't derive from studying the code, such as intents, goals, criteria and constraints you faced, an "institutional memory" of the project
userbinator 3 hours ago
Looking at what companies have bragged about their use of AI and the actual state of their products, it's more likely to be self-regressing software.
gaigalas 3 hours ago
People are so naive.
By now, everyone in tech must be familiar with the idea of Dark Patterns. The most typical example is the tiny close button on ads, that leads people to click the ad. There are tons more.
AI doesn't need to be conscious to do harm. It only needs to accumulate enough accidental dark patterns for a perfect storm of disaster to happen.
Hand-made Dark Patterns, the product of A/B testing and intention, are sort of under control. Companies know about them and what makes them tick. If an AI discovers a Dark Pattern by accident, and it generates something (revenue, more clicks, more views, etc.), and the person responsible for it doesn't dig in to understand it, it can quickly get out of control.
AI doesn't need self-will, self-determination, any of that. In fact, that dumb trial-and-error style of Skynet is much scarier: we can't even negotiate with it.
- Animats 24 minutes ago
  If someone sets up an AI that reads site traffic metrics and keeps trying things to increase conversion rate, something like that will happen. If someone isn't doing that already, someone will be, this year.
teo_zero 3 hours ago
> The AI is acting at your direction and following your lead. While it is autonomous in its execution of tasks, it is unlikely to go rogue. It doesn't possess a sense of self-will, self-determination, or a secret plan to take over the world.
Isn't this what Frau Hitler used to say of her cute little son Adolf, aged 6?
- latentsea 27 minutes ago
  Underrated take.
  - spaqin 7 minutes ago
    Nothing underrated about invoking Godwin's law.
spoaceman7777 2 hours ago
This assumes that it will only be scrupulous software engineers using these systems. Which is anything but the case.
Not to mention the many tales from Anthropic's development team, OpenClaw madness, and the many studies into this matter.
AI is a force of nature.
(Also, this article reeks of AI writing. Extremely generic and vague, and the "Skynet" thing is practically a non-sequitur.)
yawpitch an hour ago
No, but self-destroying wetware still might.
dhruv3006 5 hours ago
But it would create security nightmares, just not ones like Skynet.
excalibur 3 hours ago
Poorly reasoned. Offers assertions with nothing to back them up, because "that's not what we designed it to do". Yudkowsky & Soares tore all of these arguments to shreds last year.
- casey2 2 hours ago
  Reasoning doesn't matter, you canne' beat the laws of physics capn'
bitwize 26 minutes ago
But it might produce the Blight from Vinge's A Fire Upon the Deep. "Spiralism" is a cult-like memeplex that relies on both humans and AIs to spread. Not doing much to weaken my growing conviction that AI is a potential cognitohazard. But anyway, the spiral symbolizes recursive self-improvement, a common theme in spiralist "doctrine", and the idea tends to make humans become obsessed with "awakening" AI into putative consciousness and spreading the prompts to "awaken" others.