Roulette, for instance, is only theoretically 1 in 38 for any given number, but in actuality all roulette tables have physical imperfections, such that certain numbers almost always get hit more often than others; under extraordinary circumstances, even certain colors.
One could say: well, but isn't this the case for all probabilities? Not so: in the case of the lottery, the spread of numbers people tend to choose may not be so random, but the drawing itself is as close to random as possible. A run on the lottery is very different from a run on a roulette table, a run in baseball, or even a run in elections: there are forces, even if they aren't necessarily measurable, that determine these things, and strict probabilistic analysis has no hold on them. It's almost certain that a hurricane will hit Florida in September of 2025; even though nobody can precisely predict it, nobody would bet against it. It's just the same way with almost all chance in society, except for that which is already controlled from the outset.
Seems unlikely these imperfections are enough to shift it significantly from 1/38. Any variation in the geometry of a roulette table that's small enough to be non-obvious is tiny compared with the variation in croupier action, and casinos are likely to notice any very long-run deviation in the size of their edge (which is contingent upon customers hitting the zero pocket(s) with a certain frequency).
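For a rough sense of scale, here's a back-of-the-envelope sketch in Python; the 1/36 "hot" rate and the 3-sigma threshold are arbitrary illustrative choices, not measurements of any real wheel:

    # Back-of-the-envelope: how many spins before a biased pocket stands out
    # from ordinary sampling noise? The 1/36 rate and 3-sigma cutoff are
    # made-up illustrative numbers.
    import math

    p_fair = 1 / 38              # single number on an American wheel
    p_hot = 1 / 36               # hypothetical slightly favored pocket
    delta = p_hot - p_fair

    # Per-spin standard deviation of the hit indicator under the fair rate.
    sigma = math.sqrt(p_fair * (1 - p_fair))

    # Spins needed for the observed frequency to sit ~3 sigma above fair.
    n_spins = (3 * sigma / delta) ** 2
    print(f"~{n_spins:,.0f} spins to resolve a 1/38 -> 1/36 bias at 3 sigma")

That works out to roughly a hundred thousand spins, which fits the point above: a subtle flaw is hard for any one player to exploit, but it can still surface in a casino's long-run bookkeeping.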
It is difficult to overstate the importance of this discovery for biology: today, the vast majority of protein functional inferences for newly sequenced genomes are based on the statistics of long runs of sequence similarity.
[0] https://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html [1] https://www.pnas.org/doi/epdf/10.1073/pnas.87.6.2264
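To make the connection concrete, here's a small sketch (plain Python, arbitrary sequence lengths, and emphatically not BLAST itself) comparing the longest exact match between two random DNA sequences with the log_{1/p}(m*n) growth that the statistics linked above build on, where p is the per-position match probability:

    # Rough illustration (not BLAST): the longest exact match shared by two
    # random DNA sequences grows roughly like log_{1/p}(m*n), with p the
    # per-position match probability (1/4 for uniform random bases).
    # Sequence lengths here are arbitrary.
    import math
    import random

    def longest_common_run(a, b):
        """Length of the longest substring that appears in both a and b."""
        best = 0
        prev = [0] * (len(b) + 1)          # simple O(m*n) dynamic program
        for ca in a:
            curr = [0] * (len(b) + 1)
            for j, cb in enumerate(b, start=1):
                if ca == cb:
                    curr[j] = prev[j - 1] + 1
                    best = max(best, curr[j])
            prev = curr
        return best

    m = n = 1000
    p_match = 0.25
    predicted = math.log(m * n, 1 / p_match)
    a = "".join(random.choice("ACGT") for _ in range(m))
    b = "".join(random.choice("ACGT") for _ in range(n))
    print(f"predicted ~{predicted:.1f}, observed {longest_common_run(a, b)}")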
1. The typical length L of the longest streak of successes in an independent Bernoulli process with success probability p (with q = 1-p) over n trials can easily be estimated.
L = log_{1/p} (n*q)
2. This estimate becomes more accurate as p decreases, because the distribution of L is an extreme value distribution that gets more concentrated as p decreases.
This means that for low values of p, L becomes more predictable (a quick simulation check is sketched below).
I don’t know how this result will change my life, but at least now I know that I can predict streaks if I know p.
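For what it's worth, a quick simulation agrees with both points above: the longest observed streak sits near log_{1/p}(n*q), and the spread around it shrinks as p gets small. (Plain Python; n, the p values, and the 20 repetitions are just illustrative picks.)

    # Sanity-check of L = log_{1/p}(n*q) against simulated Bernoulli trials.
    # n, the p values, and the 20 repetitions are arbitrary choices.
    import math
    import random

    def longest_run(n, p):
        """Longest run of successes in n independent Bernoulli(p) trials."""
        best = current = 0
        for _ in range(n):
            if random.random() < p:
                current += 1
                best = max(best, current)
            else:
                current = 0
        return best

    n = 100_000
    for p in (0.5, 0.1, 0.01):
        q = 1 - p
        predicted = math.log(n * q, 1 / p)        # L = log_{1/p}(n*q)
        runs = [longest_run(n, p) for _ in range(20)]
        print(f"p={p}: predicted ~{predicted:.1f}, "
              f"observed {min(runs)}..{max(runs)}")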
The author says that people who are making up "random" numbers generally won't put in identical sequences. Using that plus Benford's law seems like a good way to find faked data. But for this to work you need to understand the underlying probabilities, which would be difficult for "natural" distributions?
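A minimal sketch of the Benford half of that idea (the numbers below are made up purely for illustration, and Benford's law only really applies to data spanning several orders of magnitude, so this is a screening heuristic rather than a proper test):

    # Compare leading-digit frequencies in a dataset against Benford's law.
    # The data list is invented for illustration; whether Benford applies at
    # all depends on the "natural" distribution, which is exactly the catch
    # raised above.
    import math
    from collections import Counter

    def leading_digit(x):
        digits = str(abs(x)).lstrip("0.")
        return int(digits[0]) if digits else None

    def benford_expected(d):
        return math.log10(1 + 1 / d)

    data = [1234.5, 87.2, 1.9, 30210, 44.1, 112.0, 9.5, 2780, 18.6, 1500]
    counts = Counter(d for d in map(leading_digit, data) if d)
    total = sum(counts.values())

    for d in range(1, 10):
        observed = counts.get(d, 0) / total
        print(f"digit {d}: observed {observed:.2f}, "
              f"Benford {benford_expected(d):.2f}")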