14 points by jbmilgrom 3 hours ago | 4 comments
  • isodev an hour ago
    > Neural networks excel at judgment

    I don’t think they do. I think they excel at outputting echoes of their training data that best fit (rhyme with, contextually) the prompt they were given. If you try using Claude with an obscure language or use case, you will notice the effect even more: it keeps pulling toward things it knows that aren’t at all what’s asked or “the best judgement” for what’s needed.

    • rybosworld 31 minutes ago
      Neural nets have been better at classifying handwriting (MNIST) than the best humans for a long time. This is what the author means by judgement.

      They are super-human in their ability to classify.

      • verdverm 15 minutes ago
        Classifiers and LLMs get very different training and objectives; it's a mistake to draw inferences from MNIST about coding agents, or about LLMs more generally.

        Even within coding, their capability varies widely across contexts, and even across runs with the same context. They are definitely not better at judgement in coding in all cases.

  • verdverm 2 hours ago
    Lost me at the claim that AI is good at making judgements. This is the exact opposite of my experience: they reliably make both good and bad decisions.
    • rybosworld 39 minutes ago
      I think that's also true of people, but we are kinder to each other and ourselves when judgement is bad.

      How many times have you been in a conversation where you asked the wrong question or stated the wrong thing because you weren't listening 100% (no one is), or you forgot, or you didn't connect the same dots that others did?

      • Terr_ 31 minutes ago
        Treating humans differently makes sense because the "badness" of a judgement isn't just the correctness of an outcome, but also the nature of the process that created it, and humans are a different process.

        For example, if two humans get the same correct answer, we naturally favor the one that reached it through facts and reasoning as opposed to the one that literally flipped a coin.

    • mvc an hour ago
      I think it makes better decisions than me provided I give it enough high-level direction and context.

      Sometimes I give it __too much__ direction, and it finds the solution I had in mind rather than the best one.

      I'm not deep enough into it to formally run different personas against each other in a cooperative system, but I kind of do that informally.

      • verdverm 18 minutes ago
        The type of decision matters a great deal; coding is one thing. I met a chap at a bar whose crazy theories ChatGPT had confirmed, and he now outsources all of his major life decisions to it, very proud and enthusiastic about it all. It's the first IRL case of AI psychosis I have encountered. He was keen to hear my thoughts, as though I were the first person he'd met IRL who knew more than a layman about AI. I hope the questions (contradictions) I left him with helped bring him back a bit.

        It's going to get a lot worse.

  • wrs 2 hours ago
    In other words, a higher-level JIT compiler: it still dynamically generates code based on runtime observations, but the code is in a higher-level language than assembly, and the observations cover a higher-level context than just runtime data types.
    • daxfohl 17 minutes ago
      I agree this is what the article says, but it's a pretty bad premise. That would only be the case if the primary user interaction with coding agents were "feed in requirements, get a finished product". But we all know it's a more iterative process than that.
      • jbmilgrom 11 minutes ago
        Author here

        We are building this at docflowlabs, i.e. a self-healing system that responds to customer feedback automatically. And you're right that not all customers know what they want, or even how to express it when they do, which is why the agent loop facing them is far more discovery-focused than the internal one.

        We also still have humans in the loop for everything (for now!). For example, the agent does not move on to implementation until the root cause has been approved.
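
        The approval gate described above can be pictured as a tiny state machine. This is a minimal sketch under stated assumptions, not docflowlabs' actual code: `Stage`, `Ticket`, `propose_root_cause`, `approve_root_cause`, and `start_implementation` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Stage(Enum):
    DISCOVERY = auto()       # agent gathers details from the customer
    ROOT_CAUSE = auto()      # agent has proposed a root cause for review
    IMPLEMENTATION = auto()  # agent is allowed to write the fix


@dataclass
class Ticket:
    # Hypothetical ticket record; field names are illustrative only.
    description: str
    stage: Stage = Stage.DISCOVERY
    proposed_root_cause: Optional[str] = None
    root_cause_approved: bool = False


def propose_root_cause(ticket: Ticket, cause: str) -> None:
    """Agent step: record a candidate root cause and wait for review."""
    ticket.proposed_root_cause = cause
    ticket.root_cause_approved = False  # every new proposal needs fresh approval
    ticket.stage = Stage.ROOT_CAUSE


def approve_root_cause(ticket: Ticket) -> None:
    """Human step: only a reviewer should call this, never the agent."""
    if ticket.stage is not Stage.ROOT_CAUSE:
        raise ValueError("no root cause awaiting review")
    ticket.root_cause_approved = True


def start_implementation(ticket: Ticket) -> None:
    """Agent step: blocked until a human has approved the root cause."""
    if ticket.stage is not Stage.ROOT_CAUSE or not ticket.root_cause_approved:
        raise PermissionError("root cause not yet approved by a human")
    ticket.stage = Stage.IMPLEMENTATION


# Usage: the agent cannot skip ahead, but can proceed once a human approves.
ticket = Ticket("Export button does nothing")
propose_root_cause(ticket, "click handler never registered")
try:
    start_implementation(ticket)
except PermissionError as err:
    print("blocked:", err)
approve_root_cause(ticket)
start_implementation(ticket)
print(ticket.stage)
```

        Keeping the approval as a separate call that only a human invokes makes the "humans in the loop" rule structural rather than a convention the agent is merely asked to follow.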

  • 2001zhaozhao an hour ago
    > Code is the policy, deployment is the episode, and the bug report is the reward signal

    This is a great quote. I think it makes a ton of sense to view a sufficiently cheap and automated agentic SWE system as a machine learning system rather than as traditional coding.

    * Perhaps the key to transparent/interpretable ML is to just replace the ML model with AI-coded traditional software and decision trees. This way it's still fully autonomously trained but you can easily look at the code to see what is going on.

    * I also wonder whether you can use fully-automated agentic SWE/data science in adversarial use cases where you traditionally have to use ML, such as online moderation. You could set a clear goal to cut down on undesired content while minimizing false positives, and the agent would be able to create a self-updating implementation that dynamically responds to adversarial changes. I'm most familiar with video game anti-cheat, where I think something like this is very likely possible.

    * Perhaps you can use a fully-automated SWE loop, constrained in some way, to develop game enemies and AI opponents, which currently require grueling amounts of manual work to implement. Those are typically too complex to tackle using traditional ML, and you can't naively use RL because the enemies are supposed to be immersive rather than the best at playing the game by gaming the mechanics. Maybe with a player controller SDK and enough instructions (and live player feedback?), you can get an agent to make a programmatic game AI for you and automatically refine it.
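
    Read literally, the quoted framing ("code is the policy, deployment is the episode, and the bug report is the reward signal") describes a loop like the toy below. It is a sketch under loud assumptions, not the article's implementation: the "code" is just a dict from input to output, each bug report carries the expected output (real reports rarely do), `revise` is a hypothetical stand-in for the coding agent that patches each reported case, and the reward is simply minus the number of bug reports per episode.

```python
from typing import Dict, List, Tuple

# Toy stand-in for "the code": maps a user input to the behaviour shipped for it.
Policy = Dict[str, str]
# A request pairs an input with the behaviour the user actually wanted.
Request = Tuple[str, str]
# Toy bug report: (input, expected behaviour). Real reports are much vaguer.
Report = Tuple[str, str]


def deploy(policy: Policy, requests: List[Request]) -> List[Report]:
    """One episode: serve every request and collect a bug report for each miss."""
    return [(inp, want) for inp, want in requests if policy.get(inp) != want]


def revise(policy: Policy, reports: List[Report]) -> Policy:
    """Hypothetical stand-in for the coding agent: patch each reported case."""
    patched = dict(policy)  # ship a new version instead of mutating the old one
    for inp, want in reports:
        patched[inp] = want
    return patched


def improve(policy: Policy, requests: List[Request], max_episodes: int = 5):
    """Deploy, treat bug reports as negative reward, revise, repeat."""
    rewards: List[int] = []
    for _ in range(max_episodes):
        reports = deploy(policy, requests)
        rewards.append(-len(reports))  # reward = minus the number of bug reports
        if not reports:
            break  # a clean episode: nothing left to fix
        policy = revise(policy, reports)
    return policy, rewards


# Usage: one missing case is reported, patched, and the next episode is clean.
final_policy, rewards = improve({"a": "1"}, [("a", "1"), ("b", "2")])
print(final_policy, rewards)
```

    The resulting "model" stays plain, inspectable data and code, which is the interpretability point made in the first bullet above.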

    • Terr_ 18 minutes ago
      > Perhaps the key to transparent/interpretable ML is to just replace the ML model with AI-coded traditional software and decision trees. This way it's still fully autonomously trained but you can easily look at the code to see what is going on.

      Just yesterday I came across something a sci-fi webcomic author wrote as backstory around 2017 [0], in which all future AI has auditable logic chains because of a 2061 disaster involving an American AI defense system.

      While the overall concept of "turns on its creators" is not new, I still found the "root cause" darkly amusing:

      > [...] until the millisecond that Gordon Smith put his hand on a Bible and swore to defend the constitution.

      > Thus, when the POTUS changed from Vanderbilt to Smith, a switch flipped. TIARA [Threat Intel Analysis and Response Algorithm] was now aware of an individual with 1) a common surname, 2) a lot of money and resources, 3) the allegiance of thousands of armed soldiers, 4) many alternate aliases (like "POTUS"), 5) frequent travel, 6) bases of operation around the world, 7) mentioned frequently in terrorist chatter, etc, etc, etc.

      > And yes, of course, when TIARA launches a drone strike, it notifies a human operator, who can immediately countermand it. This is, unfortunately, not useful when the drone strike mission has a travel time of zero seconds.

      > Thousands of intelligent weapons, finding themselves right on top of a known terrorist's assets, immediately did their job and detonated. In less than fifteen minutes, over ten thousand people lost their lives, and the damage was estimated in the trillions of dollars.

      [0] https://forwardcomic.com/archive.php?num=200

    • jbmilgrom 42 minutes ago
      > Perhaps the key to transparent/interpretable ML is to just replace the ML model with AI-coded traditional software and decision trees. This way it's still fully autonomously trained but you can easily look at the code to see what is going on.

      For certain problems I think that's completely right. Of course, we still won't want that for classic ML domains like vision, and now coding, etc. But in domains where a software substrate is appropriate, software has a huge interpretability and operability advantage over ML.