18 points by sermakarevich 5 hours ago | 3 comments
  • zihotki 14 minutes ago
    Are there any benchmarks/evals showing whether this particular one does any good compared to, let's say, plan mode? How do you measure that it actually works and that you aren't wasting tokens and your personal time?

    I fail to see any backing for the claims of 'boosting performance' and 'keeping costs low'.

    • sermakarevich 4 minutes ago
      fair

      here are slides explaining it in more detail: https://docs.google.com/presentation/d/1SjKXF7hkoqyiN9-3tBGY...

      when plan + code mode works, there's no need to change it. when it doesn't, because the feature is complicated, then we need something else. That's when sdd is applicable. I use it for mid-size and larger projects only.

      Measuring is a bit of a subjective thing here. But when plan mode + code does not work and sdd works (because of double decomposition), you get what you need.

      Token consumption is lower because you can wipe your context after every implemented step or subtask. The scope of specs to deliver is bigger, however. But confusion is way lower, as your context is focused on a single step or subtask.

  • aaronbrethorst 3 hours ago
    I'd love to see a comparison with other spec-driven development tools for Claude, like OpenSpec and Superpowers. How does this compare and contrast with them?
  • siliconc0w 3 hours ago
    I've been using an agent flywheel workflow, which is similar. Still not completely sold: it feels a bit like using power tools to shape wood, but the final product needs a lot of sanding and polishing.

    I thought initially this meant that the spec wasn't detailed enough, but the problem is more agent adherence and laziness.

    • dwb 2 hours ago
      Exactly. A detailed-enough spec is just code that you can’t run. If models and agents got to a point where doing a good job in Claude Code plan mode meant that I didn’t have to keep an eye on them in implementation, then I would be interested in some bigger spec-driven thing like this. That is still far from the case today for me.