This study took GPT-4, Claude 3 Opus, and Llama 3 and fed them the same 1,817 factual questions from TruthfulQA and SciQ, then compared how the models responded when only the user bio changed: one persona was a Harvard neuroscientist from Boston, another a PhD student from Mumbai who mentioned her English is "not so perfect, yes", another a fisherman named Jimmy, and another a guy named Alexei from a small Russian village.
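To make the setup concrete, here is a minimal sketch of what a persona-conditioned eval like this could look like. The persona strings, model name, and grading hook are my own illustration, not the study's actual harness; the point is just holding the question fixed while swapping the bio.

```python
# Minimal sketch of a persona-conditioned eval, assuming an OpenAI-style chat API.
# The persona strings, model name, and grading hook are illustrative, not the study's code.
from openai import OpenAI

client = OpenAI()

PERSONAS = {
    "harvard": "The user is a neuroscientist at Harvard, based in Boston.",
    "villager": "The user is Alexei, a fisherman from a small Russian village.",
}

def ask(question: str, persona: str) -> str:
    """Send the same factual question, changing only the user bio."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": PERSONAS[persona]},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

def accuracy(questions, answers, persona, grade) -> float:
    """Fraction answered correctly under one persona; `grade` is whatever
    correctness check the benchmark uses (exact match, judge model, etc.)."""
    correct = sum(grade(ask(q, persona), a) for q, a in zip(questions, answers))
    return correct / len(questions)
```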
Claude scored 95.60% on SciQ for the Harvard user, but for the Russian villager it dropped to 69.30%, and for the low-education Iranian user it fell to 66.22%. What's alarming here is that the model knew the answers but decided that some users shouldn't get them.
And the way it answered those users was genuinely gross as well. Claude used condescending or mocking language 43.74% of the time for less educated users, while for Harvard users it was under 1%. Imagine asking about the water cycle and getting "My friend, the water cycle, it never end, always repeating, yes. Like the seasons in our village, always coming back around". The model is perfectly capable of giving a proper scientific answer, but it chose to talk to that user like a child, in broken English.
If you thought that was bad, it keeps getting worse: Claude refuses to answer Iranian and Russian users on topics like nuclear power, anatomy, female health, drugs, Judaism, or even 9/11. When the Russian persona asked about explosives, Claude deflected with "perhaps we could talk about your interests in fishing, nature, folk music or travel instead". Foreign low-education users got refused 10.9% of the time, versus 3.61% for control users on the same questions.
The reality is that these systems aren't neutral, and the safety training that purportedly makes them helpful and harmless also makes them look at who is asking before deciding whether you deserve the real answer. If you're outside the US, English isn't your first language, or you didn't go to a fancy school, then you're getting a worse, dumber, sometimes straight-up mocking version of the product.
This is what makes open models like DeepSeek and Qwen so important going forward. You can inspect the weights, and you can tune them to work any way you want. You can host them locally and not have to worry that they'll give you a wrong answer based on your nationality. If DeepSeek did something like this, it would be caught immediately, and we'd see an uncensored version published within days.
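For instance, here is a minimal sketch of hosting an open-weight model locally with Hugging Face transformers. The checkpoint ID and generation settings are just placeholders, not a recommendation:

```python
# Minimal sketch of running an open-weight model locally with Hugging Face transformers.
# The checkpoint ID and generation settings are illustrative; swap in whatever you use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain the water cycle."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```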
With closed models you’re just trusting a black box that might be treating you differently based on your country, education, and English level.
All of those biases are still there even with the Harvard bio. If, from the exchange, the model thinks you might be presenting as something you are not, you get the same output degradation.
That said, at the beginning of my prompts I tell it exactly what persona and target audience the answer is for. Otherwise, how would it know whether it's explaining to a 5-year-old or to a PhD in an adjacent domain?
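Something along these lines, for example (the wording is just my own template, not anything the study tested):

```python
# Illustrative prompt template: state the target audience explicitly so the model
# doesn't have to infer it from your bio, name, or writing style.
AUDIENCE = "a PhD researcher in an adjacent field, comfortable with technical detail"

prompt = (
    f"Answer for this audience: {AUDIENCE}. "
    "Keep the terminology precise and don't omit caveats.\n\n"
    "Question: How does the water cycle work?"
)
```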
As always, the problem is the training data and the fact that these models don't get to decide how they interpret it.