We Are Changing Our Developer Productivity Experiment Design (metr.org)

17 points by ej88 3 hours ago | 3 comments

ej88 3 hours ago
Really interesting updates to their 2025 experiment.
Repeat devs from the original experiment went from a 0-40% slowdown to a -10% to 40% speedup, and METR describes this estimate as a 'lower bound'.
More devs say they don't even want to do 50% of their work without AI, even for $50/hr.
30-50% of devs decided not to submit certain tasks without AI, so the study is missing the tasks with the highest uplift.
It also seems like there is a skill gap: repeat devs from the first study are more productive with AI tools than newly recruited ones with variable experience.
Overall it seems like devs' strong preference for using AI is actually hurting METR's ability to measure their speedup, because they refuse to do tasks without it. IMO this is indirectly quite supportive of AI coding's productivity claims.
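A toy simulation of the "lower bound" argument, in case it helps. This is only a sketch: the speedup distribution, the dropped fraction, and the function name `simulate_selection` are all made up for illustration, and it assumes devs withhold exactly the tasks where AI would help most, which is a much cleaner selection effect than the real one.

```python
import random

def simulate_selection(n_tasks=10_000, drop_top_fraction=0.4, seed=0):
    """Compare the true mean speedup with the mean over tasks that stay in the study.

    All numbers are illustrative assumptions; nothing here comes from METR's data.
    """
    rng = random.Random(seed)
    # Hypothetical per-task speedup from AI, in percent (negative = slowdown).
    speedups = [rng.gauss(20, 30) for _ in range(n_tasks)]
    true_mean = sum(speedups) / n_tasks

    # Assumption: devs refuse to submit the tasks with the highest uplift,
    # so only the lower part of the distribution is ever measured.
    n_kept = int(n_tasks * (1 - drop_top_fraction))
    kept = sorted(speedups)[:n_kept]
    observed_mean = sum(kept) / len(kept)
    return true_mean, observed_mean

true_mean, observed_mean = simulate_selection()
print(f"true mean speedup:     {true_mean:.1f}%")
print(f"observed mean speedup: {observed_mean:.1f}%")
```

Under these assumptions the observed mean always sits below the true mean, which is the sense in which the measured speedup would be a lower bound. If devs drop tasks for other reasons, the bias could go either way.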
sgillen 28 minutes ago
This is very interesting because I see a lot of AI detractors point to the original study as proof that AI is overhyped and nothing to worry about. In this new study the findings are essentially reversed (20% slowdown to 20% speedup).
softwaredoug 2 hours ago
I'm a bit perplexed by the developer selection effects.
I get that developers want to use AI. But are they also claiming there's no longer a no/low-AI population of developers? Or that their selection methods don't find these developers?
Are they worried that by splitting devs into groups of AI experience they might be measuring some confounder that causes people to choose AI / not AI in their careers?
- selridge 30 minutes ago
  Here is my read:
  Developers are refusing to complete the survey or selecting themselves out because they (apparently) don’t want to complete the non-AI task.
  They also saw selection effects from a large reduction in the pay for the study (which is an unfortunate confounder here), from $150/hr to $50/hr.
  They guess this makes their estimates lower bounds, but the selection effect is complicated (which they acknowledge).
  Overall this is a hard problem for them in the current state. It will be challenging to produce a convincing year-over-year analysis under these conditions.
- sgillen 35 minutes ago
  The study was designed to have devs who are comfortable with AI perform 50% of tasks with AI and 50% without. So the problem is the population of "Developers who use AI regularly but are willing to do tasks without AI" is shrinking.
  >> Are they worried that by splitting devs into groups of AI experience they might be measuring some confounder that causes people to choose AI / not AI in their careers?
  The developer sample size was small (16 people in the original study) and the task sample size is larger (~250 tasks). I think the worry is that variance in developer productivity would totally wash out any signal.
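The variance worry above can be sketched with a small simulation. It compares splitting the 16 devs into an AI group and a no-AI group ("between") against randomizing each of the ~250 tasks to AI or no-AI within each dev ("within"). The effect size, the noise levels, and the function name `simulate_estimates` are assumptions for illustration, not METR's numbers or method.

```python
import random
import statistics

def simulate_estimates(design, n_devs=16, n_tasks=250, true_effect=-0.2,
                       dev_sd=0.5, task_sd=0.5, n_runs=200, seed=0):
    """Return the std dev of the estimated AI effect on log task time.

    design="between": half the devs always use AI, half never do.
    design="within":  each task is randomly assigned to AI or no-AI.
    All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_runs):
        # Each dev has their own baseline speed (in log time).
        baselines = [rng.gauss(0, dev_sd) for _ in range(n_devs)]
        # times[dev][uses_ai] -> list of log completion times
        times = [{True: [], False: []} for _ in range(n_devs)]
        for task in range(n_tasks):
            dev = task % n_devs  # spread tasks evenly across devs
            if design == "between":
                uses_ai = dev < n_devs // 2
            else:
                uses_ai = rng.random() < 0.5
            effect = true_effect if uses_ai else 0.0
            times[dev][uses_ai].append(baselines[dev] + effect + rng.gauss(0, task_sd))
        if design == "between":
            ai = [t for d in times for t in d[True]]
            no_ai = [t for d in times for t in d[False]]
            estimates.append(statistics.mean(ai) - statistics.mean(no_ai))
        else:
            # Differencing within each dev cancels that dev's baseline.
            diffs = [statistics.mean(d[True]) - statistics.mean(d[False])
                     for d in times if d[True] and d[False]]
            estimates.append(statistics.mean(diffs))
    return statistics.stdev(estimates)

between_sd = simulate_estimates("between")
within_sd = simulate_estimates("within")
print(f"spread of estimate, between-dev split: {between_sd:.3f}")
print(f"spread of estimate, within-dev tasks:  {within_sd:.3f}")
```

With these made-up parameters the between-dev estimate is much noisier, because only 16 baselines are being compared; the within-dev design uses every task and each dev serves as their own control.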