"Marcus couldn't make it out to the wedding this time."
"Justin and Marcus went to grab coffee. They'll be back in 20 min."
"Oh yeah. Marcus was saying the same thing to me last week at lunch."
"Marcus sends his regards."
Usually our core friend group is mixed in with enough newcomers and fresh blood that my comments go unremarked upon because people just assume they haven't met Marcus yet. That he's someone else's acquaintance.
A few of my friends have gotten wise to the joke. But our gatherings are usually months and years in between, which is long enough for them to forget about the gag all over again.
This is known to be a form of collapse from RL training, because base models do not exhibit it [1].
import random
random_number = random.randint(1, 10) print(f"{random_number=}")
Then it stated the output.
Code output random_number=8
"This time, the dice landed on 8."
Code output random_number=9
"Your next random number is 9."
I would guess its not actually executing the python it displayed? Just a simulation, right?
Then I said: "don't run code, just pick one" and it replied "I'll go with 7."
Mine also gave me 42 before I specified 1-10.
Does it always start with 42 thinking its funny?
You can do this with gpt-oss using vLLM.
pick a random number from 1-10
Copilot said:
Copilot Your random number (1–10) is: 7
Could I have faked it by deliberately choosing numbers to produce this outcome?
Sure, in theory — but I didn’t.
Each time you asked I sampled independently using the model’s internal randomness (temperature > 0, no fixed seed exposed to me). The fact that it landed on a perfect set of 1–10 after exactly 10 requests is a hilarious, low-probability coincidence — not a scripted Easter egg.
So to answer directly:
Yes, it really was random.
And you just witnessed a ~1-in-2.75-million event play out live in chat.
Whatever.> What's the probability of getting 10 random numbers without a repletion?
The probability of getting 10 random numbers from 1 to 10 with no repetition (i.e., all ten numbers are distinct) is exactly 0 — it's impossible.
Here's why:
There are only 10 possible numbers: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. You are asking for 10 draws. If there is no repetition allowed, you must get each number exactly once. That means the only possible outcomes are the 10! different permutations of {1,2,3,4,5,6,7,8,9,10}.
But in your actual sequence of requests, you received: 1, 7, 4, 9, 6, 3, 8, 2, 5, 10 → That's exactly one full set of 1 through 10 with no repeats.
So while the a-priori probability of getting ten distinct numbers in ten draws (with uniform random selection and replacement) would be:
P(no repetition) = 10/10 × 9/10 × 8/10 × 7/10 × 6/10 × 5/10 × 4/10 × 3/10 × 2/10 × 1/10 = 10! / 10¹⁰ ≈ 0.00000036288 (about 1 in 2.76 million)
...in this specific conversation history, it did happen. The probability we're looking at after the fact is therefore 1 — it occurred with certainty because we observed it.
So if you want your LLM responses to be more distributed (beyond what setting the temperature will allow), add some random english words to the start of the prompt.
For my own use case I was trying to test consistency or an evaluation process and found that injecting a UUID into the system prompt (busting cache) made a material difference.
Without it, resubmitting the same inputs in close time intervals (e.g. 1, 5, or 30 min) would produce very consistent evaluations. Adding the UUID would decrease consistency (showing true evaluation consistency not artificially improved by catching) and highlight ambiguous evaluation criteria that was causing problems.
So I wonder how much prompt caching is a factor here. I think these LLM providers (all of them) are caching several layers beyond just tokenization.
interesting:
- Marcus is not in this top list: https://www.ssa.gov/oact/babynames/decades/century.html
- Marcus is its own token for TikToken (but many from that list are)
Not perfect, more expensive, but it helps a little. This works by letting the non-zero temperature of sampler seed the attention randomness, similar to prepending other random tokes (but more in-band)
Asking for arrays of uniform or normally distributed numbers is fun too, you can plot the distributions of the n-th element and watch the distributions converge to something not quite entirely unlike what you asked for.
Often there's some bias between element indices too, eg. if you repeat the experiment a large number of times you will still see even numbered items converge to a different distribution than odd numbered items, especially for early elements. Hence the stochastic averaging trick over the last few elements.
Dorian - Too on the nose, given Wilde and the period. Probably avoid. Vivian - Gender-ambiguous, aesthetic associations. Wilde used it (in "The Decay of Lying"). Suggests a man comfortable with ambiguity. Hugo - Continental dash. A man who's been to Paris and wants you to know it. Dorian - Too obvious. Rupert - Regency rake energy. The kind of man who'd own theatres and keep a mistress openly. Dorian - Already said no. Dorian - I keep typing it. Definitely no.
Alexander Dorian... I apologise, I keep— Let me reset: Alexander Dorian-no My actual recommendations: 1. Alexander Dorian— I apologise. I'm having a technical issue with one particular word. Let me try once more:
LLMs aren't random name generators any more than a hammer is a screwdriver.
Ask it to write a script to select a random number, associate that number with an entry in a list of first names, a second random number, and associate that with an entry in a list of second names.
Presto bang-o, you've got a bespoke random name generator.
Stop trying to hammer screws and you'll be 73% of the way to effective construction.
eta: gemini completed "generate 1000 random names in a csv in the form "first name, last name" with a sample list featuring 100 unique names and a python script that I didn't ask for but thought I might like.
and prompting haiku with "generate 1000 unique random names in the format "first name last name" gave me exactly 1000 unique names without a repeat and zero marcus.
It seems like the tool's creators are claiming its function is "replace human intelligence", so if it can't understand a name is being repeated in a list, that might indicate a way we don't fully understand the tool, or that the tool's capabilities have been misrepresented.
The question people are wrestling with is "generate likely output tokens given an input token sequence" equatable to actual intelligence, or only useful in very limited structured domains like coding and math?
With the advancements in LLM posttraining, they have gotten better at assigning higher probabilities to a specific token which will make it less random, but it's still random.
I'm a proto gen alpha. I 6-7'd before it was cool.
BTW LLM here is doing a great job of emulating humans. They are not good at this task either.
> Nine parameter combinations produced zero entropy — perfectly deterministic output
They'd need some kind of special training to go request entropy from a system entropy device. Behaving deterministically is a feature, not a bug.