50 points by gmays 10 hours ago | 3 comments
  • jsnell 3 hours ago
    This doesn't seem to be controlling for the number of turns in any way. Am I missing something?

    Stronger models needing fewer turns to achieve a task feels like a prime source of efficiency gains for agentic coding, more so than individual responses being shorter.

    • jfim 3 hours ago
      They also don't mention what their sample size is, or anything about the distribution of input and response lengths.

      It'd be interesting to see the distributions if the author actually plotted the data, so we could see if their analysis holds water or not.

      A ggplot2 geom_density plot of the input lengths, with color and fill mapped to model, alpha 0.1, and an appropriate bandwidth adjustment, would show whether the input-length distribution looks similar across the two models. The same plot for the output lengths, faceted by input-length bin, would give us an idea whether those look the same too.

      Edit: Or even a faceted plot using input bins of output length/input length.
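      The kind of plot described above can be sketched in Python as well (the comment names R's ggplot2; this stands in numpy for the kernel density estimate and matplotlib for the overlaid, translucent density curves — the two model names and all length samples are synthetic, made up purely for illustration):

      ```python
      import numpy as np
      import matplotlib
      matplotlib.use("Agg")  # render off-screen, no display needed
      import matplotlib.pyplot as plt

      rng = np.random.default_rng(0)

      # Synthetic input-length samples (tokens) for two hypothetical models.
      samples = {
          "model_a": rng.lognormal(6.0, 0.6, size=500),
          "model_b": rng.lognormal(6.2, 0.5, size=500),
      }

      def gaussian_kde(x, grid, bw):
          """Plain-numpy Gaussian kernel density estimate evaluated on `grid`."""
          z = (grid[:, None] - x[None, :]) / bw
          return np.exp(-0.5 * z * z).sum(axis=1) / (len(x) * bw * np.sqrt(2 * np.pi))

      grid = np.linspace(0, 4000, 500)
      densities = {}
      fig, ax = plt.subplots()
      for name, x in samples.items():
          # Silverman's rule of thumb plays the role of the "bandwidth adjustment".
          bw = 1.06 * x.std() * len(x) ** -0.2
          densities[name] = gaussian_kde(x, grid, bw)
          ax.plot(grid, densities[name], label=name)        # colored density curve
          ax.fill_between(grid, densities[name], alpha=0.1) # the alpha = 0.1 fill
      ax.set_xlabel("input length (tokens)")
      ax.set_ylabel("density")
      ax.legend()
      fig.savefig("input_length_density.png")
      ```

      The same loop applied to output lengths, drawn into one subplot per input-length bin, would give the faceted version the comment suggests.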

  • i_think_so 2 hours ago
    Has any enterprising hacker here yet graphed price vs "output" over time since 2023, taking "quality" into account?

    That's got to be a very tricky analysis given how subjective quality is. But I'm sure there are people trying to pin it down.

    • helloplanets 10 minutes ago
      Quality would be performance against different given benchmarks, I assume?

      There are multiple open-weight models you can run on a pretty standard computer at home that match the quality of GPT-4. I guess that would also change the equation.
