In what kind of harness, if any? A bare model API call and an agent would perform quite differently. People aren't worried about regular old ChatGPT taking jobs; they're thinking about Claude Code/Cowork.
https://github.com/JetXu-LLM/DocMason
Demo video: https://youtu.be/Sq3a5qxsLwM