6 points by doener 5 hours ago | 2 comments
  • ben_w 5 hours ago
    Also asked on Telegram, but Hacker News may have additional input:

    Since this morning I've been wondering about what I realise is a basic question, though not one I've ever seen asked: what's the longest/largest task a human can do with n% accuracy?

    For big tasks, we break them down, so we often *don't* do one huge single task. No one person actually makes an entire biro, or even an entire pencil; a human can write something like DOOM, but not usually alone, and especially not bug-free: even Carmack got help with testing from the rest of id.

    Is it perhaps possible to work this out from the same data used in the METR model itself? Were there tasks which several humans attempted, but only half of those humans succeeded at?

  • K0balt 3 hours ago
    For those who may not know what the claim is:

    That Opus 4.6 succeeds 50 percent of the time at a (cohesive, single) task that takes a human 14.5 hours. It is unclear to me whether this is zero-shot or iteratively driven.
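This is a minimal sketch of what a "50 percent of the time" horizon means. It also shows how the same kind of number could be computed for humans from tasks that several people attempted, as asked above. It assumes the approach METR describes for its time-horizon metric: fit a logistic curve of success probability against log2 of human task time, then read off the task length at which predicted success equals p. The task list, the success counts and the function names (`expand`, `fit_logistic`, `horizon_minutes`) are hypothetical and made up for illustration. This is not METR's code or data.

```python
import math

# Hypothetical data: (human completion time in minutes, successes, attempts).
# Several people attempt each task; all numbers are made up for illustration.
TASKS = [
    (2, 4, 4), (4, 4, 4), (8, 3, 4), (16, 3, 4), (32, 2, 4),
    (64, 1, 4), (128, 1, 4), (256, 0, 4), (512, 0, 4),
]


def expand(tasks):
    """Turn per-task counts into one (log2 minutes, 0/1 outcome) row per attempt."""
    rows = []
    for minutes, successes, attempts in tasks:
        x = math.log2(minutes)
        rows += [(x, 1)] * successes + [(x, 0)] * (attempts - successes)
    return rows


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, lr=0.1, steps=5000):
    """Fit P(success) = sigmoid(a + b * (x - x_mean)) by plain gradient ascent
    on the mean log-likelihood. Centring x keeps the optimisation well behaved."""
    x_mean = sum(x for x, _ in rows) / len(rows)
    a = b = 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in rows:
            xc = x - x_mean
            residual = y - sigmoid(a + b * xc)
            grad_a += residual
            grad_b += residual * xc
        a += lr * grad_a / len(rows)
        b += lr * grad_b / len(rows)
    return a, b, x_mean


def horizon_minutes(a, b, x_mean, p):
    """Task length (minutes) at which the fitted success probability equals p."""
    logit_p = math.log(p / (1 - p))
    return 2 ** (x_mean + (logit_p - a) / b)


a, b, x_mean = fit_logistic(expand(TASKS))
for p in (0.5, 0.8):
    print(f"{p:.0%} horizon: {horizon_minutes(a, b, x_mean, p):.1f} minutes")
```

With this made-up data, which is symmetric around 32 minutes on a log scale, the 50% horizon comes out at 32 minutes and the 80% horizon is shorter: asking for higher reliability shrinks the task length you can claim.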