10 points by peter94 4 hours ago | 1 comment

enjoykaz 4 hours ago
100% precision on the Claude Code run seems undersold. If you're building a systematic review, a false match (claiming paper X studied trial Y when it didn't) could corrupt your conclusions in ways a miss wouldn't. Would be curious if the authors have a domain reason for weighting recall over precision here.
- peter94 2 hours ago
  Personally, I think Claude Code played it a little too safe here, which is why we didn't put more emphasis on its precision. Note that 100% precision is also easy to achieve in this case: only match a trial to a paper when a regex finds an explicit mention of that trial in the paper. So clearly we have to pay attention to both precision and recall. We just happened to go with F1 as the more or less canonical measure that takes both into account, but I agree that, depending on your use case, you may be interested in other measures of accuracy.
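  To make the precision/recall trade-off concrete, here is a minimal sketch of F-beta, the weighted generalization of F1 (beta < 1 favors precision, beta > 1 favors recall). The function name and all counts below are hypothetical illustrations, not numbers from the benchmark discussed in this thread.

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-beta) from raw match counts.

    tp: correct trial-paper matches, fp: false matches, fn: missed matches.
    beta=1.0 gives the usual F1 score.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta


# Hypothetical counts (assumed for illustration only):
# a cautious matcher that never makes a false match but misses many,
# and a balanced matcher that trades some false matches for coverage.
cautious = dict(tp=40, fp=0, fn=60)
balanced = dict(tp=80, fp=20, fn=20)

for beta in (1.0, 0.25):
    for name, counts in (("cautious", cautious), ("balanced", balanced)):
        p, r, f = precision_recall_fbeta(beta=beta, **counts)
        print(f"beta={beta} {name}: precision={p:.2f} recall={r:.2f} F={f:.3f}")
```

  With these made-up counts, the balanced matcher wins under F1, while a strongly precision-weighted score (beta = 0.25) ranks the cautious matcher higher. That is the sense in which the choice of measure depends on how costly a false match is for your use case.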