It is difficult to overstate the importance of this discovery for biology: today, the vast majority of protein functional inferences for newly sequenced genomes are based on the statistics of long runs of sequence similarity.
[0] https://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html [1] https://www.pnas.org/doi/epdf/10.1073/pnas.87.6.2264
1. The expected length L of the longest streak of successes in an independent Bernoulli process with success probability p (with q = 1-p) over n trials is easy to estimate:
L ≈ log_{1/p} (n*q)
2. This estimate becomes more accurate as p decreases, because the longest streak approximately follows an extreme value distribution, and that distribution gets more concentrated as p decreases.
In other words, for low values of p the longest streak varies less from one run to the next, so L is a more reliable prediction.
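Both points can be checked with a quick simulation. This is only a rough sketch: the function names, the seed, and the choices of n, p and the number of repeats are my own assumptions, not something from the linked references. It compares log_{1/p}(n*q) with the simulated mean of the longest streak and reports the spread for a high and a low p.

```python
import math
import random


def predicted_longest_streak(n, p):
    """Leading-order estimate log_{1/p}(n*q) for the longest run of successes."""
    q = 1.0 - p
    return math.log(n * q) / math.log(1.0 / p)


def longest_streak(trials):
    """Length of the longest run of truthy values in an iterable."""
    best = current = 0
    for success in trials:
        current = current + 1 if success else 0
        best = max(best, current)
    return best


def simulate(n, p, repeats, seed=0):
    """Return (mean, standard deviation) of the longest streak over many runs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    lengths = []
    for _ in range(repeats):
        trials = (rng.random() < p for _ in range(n))
        lengths.append(longest_streak(trials))
    mean = sum(lengths) / repeats
    variance = sum((x - mean) ** 2 for x in lengths) / repeats
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    n, repeats = 1000, 300  # illustrative values only
    for p in (0.5, 0.1):
        mean, std = simulate(n, p, repeats)
        estimate = predicted_longest_streak(n, p)
        print(f"p={p}: estimate={estimate:.2f}, simulated mean={mean:.2f}, std={std:.2f}")
```

The estimate and the simulated mean should agree to within about one trial, and the standard deviation should come out noticeably smaller for p = 0.1 than for p = 0.5, which is the concentration effect from point 2.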
I don’t know how this result will change my life, but at least now I know that I can predict streaks if I know p.