But they need to improve their humanizer dataset. Right now, most models can be given system prompts that make them emit text classified as 100% human. It looks like their automated humanizers do worse than those system prompts. Or, alarming if so, they chose not to include the humanizers that would make their product look unreliable.
The DoD claimed to have de-anonymized Satoshi Nakamoto by similar means a while back. (I think that was before LLMs, though. By "similar means" I mean stylometry: running statistics on how a person uses language.)
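For anyone unfamiliar with stylometry, here is a toy sketch of the basic idea. Nothing here reflects what the DoD or any detector vendor actually does. It builds a frequency profile of common "function words" for each text and picks the known author whose profile is closest by cosine similarity. The word list and the function names (`profile`, `cosine`, `most_similar`) are hypothetical choices for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny list of function words. Real stylometry work typically
# uses a few hundred of the most frequent words in the corpus.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity of two profiles; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(unknown, candidates):
    """Name of the candidate whose known writing best matches the unknown text."""
    target = profile(unknown)
    return max(candidates, key=lambda name: cosine(target, profile(candidates[name])))

samples = {
    "alice": "the cat of the house and the dog of the yard",
    "bob": "it is what it is but that is it",
}
print(most_similar("the roof of the barn and the door of the shed", samples))
```

Function words are used because people use them habitually and mostly independent of topic, which is also why a system prompt that shifts those habits can throw off a classifier trained on them.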