7 points by niklio 7 hours ago | 4 comments

niklio 7 hours ago
The metric used is per-word surprisal: the -logprob of each word you type. This is the same thing as per-word cross-entropy, or KL divergence, when the user distribution is one-hot. Calibrating it so that text generated by frontier models scored poorly was a challenge at first. Originally ChatGPT was scoring around 54%. I'm still having trouble assigning high scores to the personalized Gemini and ChatGPT responses when I'm logged in, because all my personal context makes the responses surprising.
And yes, gibberish responses score very human :)
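
For anyone who wants to check the surprisal / cross-entropy / KL equivalence, here is a minimal sketch. The three-word next-word distribution is made up for illustration and the helper names are hypothetical; the real scorer would take its probabilities from a language model, and this is not the site's actual code.

```python
import math

def surprisal(probs, word):
    """Surprisal of one word in nats: -log p(word)."""
    return -math.log(probs[word])

def cross_entropy(target, probs):
    """H(target, probs) = -sum_w target(w) * log probs(w)."""
    return -sum(t * math.log(probs[w]) for w, t in target.items() if t > 0)

def kl_divergence(target, probs):
    """KL(target || probs) = sum_w target(w) * log(target(w) / probs(w))."""
    return sum(t * math.log(t / probs[w]) for w, t in target.items() if t > 0)

# Hypothetical model distribution over the next word (assumption, not real data).
probs = {"cat": 0.5, "dog": 0.25, "axolotl": 0.25}

# The word the user actually typed, expressed as a one-hot "user distribution".
typed = "axolotl"
one_hot = {w: 1.0 if w == typed else 0.0 for w in probs}

s = surprisal(probs, typed)
ce = cross_entropy(one_hot, probs)
kl = kl_divergence(one_hot, probs)

print(f"surprisal={s:.4f} cross_entropy={ce:.4f} kl={kl:.4f}")
```

All three numbers match because a one-hot distribution has zero entropy, so the KL divergence reduces to the cross-entropy, which in turn keeps only the -log p term for the typed word.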
TheJCDenton 5 hours ago
Funny little game. It would be even funnier to have a system that roasts a friend's prose from social media, or even from a screenshot.
- niklio 5 hours ago
  Thanks! That's a great idea - I'll top up my fable budget and get started :)