251 points by samclemens 2 days ago | 23 comments
  • chaps 2 days ago
    The City of Chicago's lawyers went the opposite direction in response to @tpacek's affidavit that the release of table/column names would have "marginal value" to an attacker. The city latched onto that to get a trial that eventually went to the IL Supreme Court and lost.

        [I]n my affidavit, I wrote that SQL schemas would provide “only marginal value” to an attacker. Big mistake. Chicago jumped on those words and said “see, you yourself agree that a schema is of some value to an attacker.” Of course, I don’t really believe that; “only marginal value” is just self-important message-board hedging. I also claimed on the stand that “only an incompetently built application” could be attacked with nothing but its schema. Even I don’t know what I meant by that.
    
    His post: https://sockpuppet.org/blog/2025/02/09/fixing-illinois-foia/ My post: https://mchap.io/losing-a-5yr-long-illinois-foia-lawsuit-for...
    • snitty 2 days ago
      >The City of Chicago's lawyers went the opposite direction

      Not really.

      >I wrote that SQL schemas would provide “only marginal value” to an attacker. Big mistake. Chicago jumped on those words and said “see, you yourself agree that a schema is of some value to an attacker.”

      The City of Chicago's argument was that something of ANY value, no matter how insignificant, would help an attacker exploit their system, and was therefore possible to keep secret under the FOIA law.

      • tptacek 2 days ago
        You can just read the posts before trying to rebut the plaintiff in the case. The City of Chicago argued a bunch of stuff, but what matters is what the judges decided. Chicago's "no matter how insignificant" argument failed in Chancery Court and wasn't revived either in Appeals Court or at the Supreme Court.

        Ultimately, we lost because the Illinois Supreme Court interpreted the statute such that "file layouts" were per se exempt, regardless of how dangerous they were(n't), and then decided SQL schemas were "file layouts".

        (SQL schemas are basically the opposite of file layouts, but whatever).

        • xnorswap 2 days ago
          You shut down someone disagreeing because:

          > [...] what matters is what the judges decided

          But then say

          > SQL schemas are basically the opposite of file layouts

          Which is you disagreeing with what a judge has decided?

          It seems hypocritical to shut-down someone arguing with one aspect of the case on that basis, only to end with your own disagreement with a judge's decision.

          • tptacek 2 days ago
            No, I think you're mistaken. That the case didn't turn on how marginal a security risk was isn't a matter of opinion. That a SQL schema isn't a file layout is (though: there's clearly a right answer to that: mine).
      • fc417fc802 2 days ago
        Such a literal interpretation isn't reasonable. There are all sorts of patterns that can be indirectly leaked through supposedly unrelated data. Yet FOIA exists and is obviously intended to be useful.

        So obviously there must be some threshold for the value to an attacker. He attempted to communicate that schemas are clearly below such a threshold and they used his wording to attempt to argue the opposite.

      • numpad0 2 days ago
        > “only marginal value” to an attacker

        > “see, you yourself agree that a schema is of some value to an attacker.”

        IANAL, but it appears justice systems universally interpret this type of "technically yes, if that makes you happy, but honestly unlikely" statement as a "yes, with technical bonus", not as a "no with extra steps" at all. It has to be shortened to just "unlikely from my professional perspective" or something lawyer-approved to have the intended effect. Courts are weird.

        • tptacek 2 days ago
          To be clear: I think it was dumb of me to have written those hedges in my testimony, but they didn't really impact the case.
      • chaps 2 days ago
        Yes really. Our argument, upheld by a judge, was that there was no value to an attacker. Their point stands legally, but nothing else.

        Despite all that, Chicago still pushes back aggressively. Here's a fun one from a recent denial letter they sent for data within the same database:

            "When DOF referred to reviewing over 300 variable CANVAS pages, these are not analog sequential book style pages of data. Instead, they are 300 different webpages with unique file layouts for which there is no designated first page."
        
        This is after I requested every field reflected within the 300 different pages, because it would be unduly burdensome to go through them individually. I'm waiting for the city's response for the TOP page rather than the FIRST page. It's asinine that we have to do this in order to understand how these systems can blindly ruin the lives of many.

        They also argued the same 7(1)(g) exemption despite me being explicit about not wanting the column names. Effectively, their argument becomes that the release of any information within a database, full stop, is exempt because it could be used to figure out what data exists within that database. That's against the spirit of IL FOIA, which includes this incredibly direct statutory language:

            Sec. 1.2. Presumption. All records in the custody or possession of a public body are presumed to be open to inspection or copying. Any public body that asserts that a record is exempt from disclosure has the burden of proving by clear and convincing evidence that it is exempt.
        
        https://www.documentcloud.org/documents/25930500-foia-burden...

        https://www.documentcloud.org/documents/25930501-foia-burden...

        • tptacek 2 days ago
          Upheld by several judges, in fact. :)
      • mcphage 2 days ago
        > The City of Chicago's argument was that something of ANY value, no matter how insignificant, would help an attacker exploit their system, and was therefore possible to keep secret under the FOIA law.

        I’m glad that argument lost, since it totally subverts the purpose and intention of the FOIA. Any piece of information could be of value to some attacker, but that doesn’t outweigh the need for transparency.

  • hlieberman 2 days ago
    It’s not just the UK that has standardized on this language; the U.S. intelligence community also has a list of required terminology to use for different confidence levels and different likelihoods, and for distinguishing between them. It’s all laid out in ICD-203, publicly available at https://www.dni.gov/files/documents/ICD/ICD-203.pdf

    I’ve found it very helpful in the same vein as RFC 2119 terminology (MUST, SHOULD, MAY, etc.); when you need your meanings to be understood by a counterparty and can agree on a common language to use.
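For readers who want the actual bands: ICD-203's likelihood terms map to probability ranges, which can be sketched as a lookup table. A minimal sketch, with band boundaries transcribed from memory of the standard; verify the exact cut-points against the linked PDF before relying on them:

```python
# Approximate ICD-203 likelihood terms with their probability bands.
# NOTE: boundaries are transcribed from memory of the standard; check
# the linked PDF before relying on the exact cut-points.
ICD_203_BANDS = [
    (0.01, 0.05, "almost no chance / remote"),
    (0.05, 0.20, "very unlikely / highly improbable"),
    (0.20, 0.45, "unlikely / improbable"),
    (0.45, 0.55, "roughly even chance / roughly even odds"),
    (0.55, 0.80, "likely / probable"),
    (0.80, 0.95, "very likely / highly probable"),
    (0.95, 0.99, "almost certain / nearly certain"),
]

def likelihood_term(p: float) -> str:
    """Map a probability to its ICD-203 phrase (first matching band)."""
    for lo, hi, term in ICD_203_BANDS:
        if lo <= p <= hi:
            return term
    # ICD-203 deliberately avoids expressing 0% or 100% certainty.
    return "outside defined bands"
```

Read the other way around (phrase to band), the same table is what lets a reader decode "unlikely" as roughly 20-45% rather than guessing.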

    • bo1024 2 days ago
      Interesting. This terminology really makes no sense without more shared context, in my view. For example, I would not describe something that happens to me every month as a "remote possibility". Yet for a 3% chance event, repeated every day, monthly occurrences are what we expect. Similarly, someone who describes events as "nearly certain" would surely be embarrassed when one of the first 20 fails to happen, no?
      • randallsquared 2 days ago
        A 1-in-a-million chance seems quite remote, yet if you are rolling the dice once a millisecond...

        This applies to any repeated chance, so it probably doesn't need to be called out again when translating odds to verbal language.

        • bo1024 2 days ago
          I'm not only talking about repeated events, though. If someone told me about 20 different events that they were almost certain, and one failed to happen, I would doubt their calibration.
          • Nevermark 2 days ago
            19 out of 20 could easily mean they get 119 right out of every 120 “almost certainties” and they just rolled a 1 on a d6 for the “luck” component.

            One off, 1 out of N errors are really hard to interpret, even with clear objective standards.

            And it emphasizes why mapping regular language to objective meanings is a necessity for anything serious, but can still lead to problematic interactions.

            Probability assessments are will almost always sometimes certainly could be considered likely hard!
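The 119-out-of-120 scenario is easy to check with the complement rule. A quick sketch, using the rates hypothesized in the comment above:

```python
# If each "almost certainty" is actually right with probability
# 119/120, how often do we see at least one miss among 20 of them?
p_right = 119 / 120
p_at_least_one_miss = 1 - p_right ** 20
print(f"{p_at_least_one_miss:.1%}")  # roughly a 1-in-6 occurrence
```

So observing one miss in 20 calls is entirely consistent with a forecaster who is right 119 times out of 120; one sample tells you very little about calibration.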

            • MichaelZuo a day ago
              I think the parent was suggesting a much higher percentage for “almost certain”… like >99.9%
      • handsclean a day ago
        The terms aren’t causing this confusion, it’s that you’re applying them to less specific subjects, in particular omitting periods. “3% chance per day” makes sense, “remote chance on any particular day” makes sense, “remote chance” does not make sense, “3% chance” does not make sense. You’d also need to say “nearly certain over any 30 day period”, though if the odds are 3%/day then it’s only (just into) probable, not nearly certain—60%. It’d have to be 10%/day (not a remote chance, but highly improbable) to be nearly certain (96%) over 30 days.
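The arithmetic behind these per-period conversions is just the complement rule; a small sketch reproduces the ~60% and ~96% figures:

```python
def prob_at_least_once(p_per_day: float, days: int) -> float:
    """Probability an independent daily event fires at least once."""
    return 1 - (1 - p_per_day) ** days

# 3%/day over 30 days: probable, not nearly certain.
print(f"{prob_at_least_once(0.03, 30):.0%}")  # about 60%

# 10%/day over 30 days: nearly certain.
print(f"{prob_at_least_once(0.10, 30):.0%}")  # about 96%
```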
        • bo1024 21 hours ago
          Well, this is why I started by saying "without more shared context"
      • voidUpdate a day ago
        If something happens to you every month, that's a certainty. If it has a 3% chance to happen to you on any given day, it's a remote possibility that it will happen to you on any given day.
  • senderista 2 days ago
    I was so frustrated when I tried to get doctors to quantify their assessment of risk for a surgery my sister was about to undergo. They simply wouldn't give me a number, not even "better or worse than even odds". Finally an anesthesiologist privately told me she thought my sister had maybe a one-third chance of dying on the table and that was enough for me. I'm not sure how much fear of liability had to do with this reluctance, or if it was just a general aversion to discussing risk in quantitative terms (which isn't that hard, gamblers do it all the time!).
    • chychiu 2 days ago
      Doctor here

      1. It’s generally difficult to quantify such risks in any meaningful manner

      2. Provision of any number adds liability, and puts you in a damned-if-it-does, damned-if-it-doesn’t-work-out situation

      3. The operating surgeon is not the best to quantify these risks - the surgeon owns the operation, and the anaesthesiologist owns the patient / theatre

      4. Gamblers quantify risk because they make money from accurate assessment of risk. Doctors are in no way incentivised to do so

      5. The returned chance of 1/3 probably had an error margin of +/-33% itself

      • Jach 2 days ago
        Not a lawyer but I do wonder if refusal to provide any number also adds liability, especially if it can be demonstrated to a court later that a reasonable estimate was known or was trivial to look up, and the deciding party would not have gone through with the action that ended in harm if they had been provided said number. I'm also not seeing how giving a number and then the procedure working out results in increased risk, perhaps you can expand on that? Like, where's the standing for a lawsuit if everything turned out fine but in one case you said the base rate number for a knee replacement surgery was around 1/1000 for death at the hospital and 1/250 for all-cause death within 90 days, but in another case you refused to quantify?
      • fc417fc802 2 days ago
        > It’s generally difficult to quantify such risks in any meaningful manner

        According to the literature 33 out of 100 patients who underwent this operation in the US within the past 10 years died. 90% of those had complicating factors. You [ do / do not ] have such a factor.

        Who knows if any given layman will appreciate the particular quantification you provide but I'm fairly certain that data exists for the vast majority of serious procedures at this point.

        I've actually had this exact issue with the veterinarian. I've worked in biomed. I pulled the literature for the condition. I had lots of different numbers but I knew that I didn't have the full picture. I'm trying to quantify the possible outcomes between different options being presented to me. When I asked the specialist, who handles multiple such cases every day, I got back (approximately) "oh I couldn't say" and "it varies". The latter is obviously true but the entire attitude is just uncooperative bullshit.

        > puts you in a damned-if-does, damned-if-it-doesn’t-work-out situation

        Not really. Don't get me wrong, I understand that a litigious person could use just about anything to go after you and so I appreciate that it might be sensible to simply refuse to answer. But from an academic standpoint the future outcome of a single sample does not change the rigor of your risk assessment.

        > Doctors are in no way incentivised to do so

        Don't they use quantifications of risk to determine treatment plans to at least some extent? What's the alternative? Blindly following a flowchart? (Honest question.)

        > The returned chance of 1/3 probably had an error margin of +/-33% itself

        What do you mean by this? Surely there's some error margin on the assessment itself but I don't see how any of us commenting could have any idea what it might have been.
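For what an "error margin" on such an estimate could even mean: if the one-third figure were based on something like 33 deaths in 100 comparable cases (purely illustrative numbers, not from the thread), a standard Wilson score interval gives rough error bars on the rate:

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

lo, hi = wilson_interval(33, 100)
print(f"33/100 -> 95% CI roughly {lo:.0%} to {hi:.0%}")  # about 25% to 43%
```

Even with 100 directly comparable cases, the sampling error alone is on the order of plus-or-minus ten percentage points, before any adjustment for patient-specific factors.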

        • munificent 2 days ago
          > According to the literature 33 out of 100 patients who underwent this operation in the US within the past 10 years died. 90% of those had complicating factors. You [ do / do not ] have such a factor.

          Everyone has complicating factors. Age, gender, ethnicity, obesity, comorbidities, activity level, current infection status, health history, etc. Then you have to factor in the doctor's own previous performance statistics, plus the statistics of the anaesthesiologist, nursing staff, the hospital itself (how often do patients get MRSA, candidiasis, etc.?).

          And, of course, the more factors you take into account, the fewer relevant cases you have in the literature to rely on. If the patient is a woman, how do you correctly weight data from male patients that had the surgery? What are the error bars on your weighting process?

          It would take an actuary to chew through all the literature and get a maximally accurate estimate based on the specific data that is known for that patient at that point in time.

          • fc417fc802 2 days ago
            No one said anything about a maximally accurate estimate. This is exactly the sort of obtuse attitude I'm objecting to.

            By complicating factors I was referring to things that are known to have a notable impact on the outcome of this specific procedure. This is just summarizing what's known. It explicitly does not take into account the performance of any particular professional, team, or site.

            Something like MRSA is entirely separate. "The survival rate is 98 out of 100, but in this region of the country people recovering from this sort of thing have been exhibiting a 10% risk of MRSA. Unfortunately our facility is no exception to that."

            If the recipients of a procedure are predominately female and the patient is a male then you simply indicate that to them. "The historical rate is X out of Y, but you're a bit unusual in that only 10% of past recipients are men. I'm afraid I don't know what the implications of that fact might be."

            You provide the known facts and make clear what you don't know. No weasel words - if you don't know something then admit that you don't know it but don't use that as an excuse to hide what you do know. It's utterly unhelpful.

            • danielmarkbruce 21 hours ago
              So, while you are correct, you are missing an important piece:

              most people cannot think like this

              I'm not talking about patients, I'm talking about everyone, including doctors. They just can't think in a probabilistic sense. And you'll counter that it's just reporting facts, but they don't even know which ones to report to you, or how to report them, none of it. It just doesn't seem to fit in many people's heads.

              • fc417fc802 14 hours ago
                Fair enough. It's a depressing thought but you're probably right.
          • ChadNauseam a day ago
            this is part of the mindset among doctors that makes some people want to “do their own research” rather than trust their physician. A medical intervention has to have positive expected value for it to be a good idea, and figuring out the expected value has to involve some quantification of risks. If doctors don’t want to do that because they could get sued if they don’t give a maximally accurate estimate, and producing a maximally correct estimate would be too much work, then fine, it’s a free country and I don’t want to make doctors do anything they don’t feel like doing. But they are creating a situation where parents who want to figure out if something is a good idea have no choice but to start googling things themselves.

            I’ve undergone some surgeries that were not without risks and every time, i’ve been stonewalled by doctors when asking for basic information like “in your personal practice, what is the success rate for this surgery?”. Always something like “Oh, everyone is different, so there’s no way to give any estimates.” The only options are, either they have some estimate they think is accurate enough that they’re comfortable recommending the surgery but they won’t tell me (in which case they’re denying me useful information for their own benefit), or they have no idea and are recommending the surgery for some other reason (a very concerning possibility lol). Either way, it instantly makes our relationship adversarial to some extent, and means I need to do my own research if I want to be able to make an informed decision.

          • chipsrafferty 2 days ago
            Don't they use quantifications of risk to determine treatment plans to at least some extent?
            • fiddlerwoaroof 2 days ago
              I doubt doctors do: my guess would be most doctors follow a list of best practices devised by people like malpractice actuaries and by their sense of the outcomes from experience.
      • erikerikson a day ago
        Thanks for sharing the realities you experience. The rest of this is picayune.

        > Doctors are in no way incentivised to do so

        Personal pride, care for the patient, and avoiding the mess of a bad outcome seem like powerful incentives. That said, I assume you mean they are not given explicit bonuses for good outcomes (the best tend to attract business and the highest salaries).

      • ninalanyon a day ago
        In Norway a pregnant woman over forty is offered genetic counselling because of the risk of Downs syndrome. These risks are definitely quantifiable and no liability is generated by providing them. The counsellor (a doctor) explains the risks and the syndrome and apart from this appointment is not otherwise involved.

        This could surely be done for other situations, especially surgical procedures as the statistics should be collected and associated not only with the procedure but also the hospital and surgeon.

      • ekianjo 2 days ago
        > It’s generally difficult to quantify such risks in any meaningful manner

        It's not for lack of data, that's for sure...

      • garrickvanburen 2 days ago
        I would rather not have a surgeon considering failure rates ahead of any operation they're about to conduct.
        • kmoser 2 days ago
          On the off chance you're not being facetious: why? Isn't it part of their job description to weigh the ups and downs of any operation before conducting it? I'd imagine failure to do so would open them to liability.
          • There are 2 parts:

            1. Presumably, the surgeon has determined that this specific intervention is the best possible intervention of all the possible ones (fewest downsides, best outcome, etc). There are always alternatives - including #wontfix.

            2. Once this decision has been made, I don't want them second guessing, I want them 100% confident in the decision and their abilities. If there's any lingering doubt - then return to step 1 and re-evaluate.

    • 7402 a day ago
      I think a lot of people don't understand statistics, which may make it hard for doctors to choose how to communicate things, even if they do have important knowledge that could be helpful.

      I once asked a doctor how long a relative might have to stay in intensive care:

      A: Oh, I couldn't possibly say.

      Q: Do you think he might be home in 3 or 4 days?

      A: Oh, no, not that soon.

      Q: So it might even be as long at 3 weeks?

      A: I highly doubt it would be that long.

      Q: So a reasonable estimate might be 1-2 weeks?

      A: Oh, I couldn't possibly say.

      I started the conversation having no idea whatsoever how long it would be, but I ended up with a good feel for a time estimate along with error bars.

      • ninalanyon a day ago
        > ended up with a good feel for a time estimate along with error bars.

        You might have felt that but my impression is that the doctor in question was mostly making up something on the spot. I would guess that the doctor is simply saying that they have never encountered anyone in this situation leaving intensive care within four days and similarly no one who survived needed longer than three weeks. The last answer suggests that they have no actual statistics at all.

    • 2rsf 2 days ago
      > she thought my sister had maybe a one-third chance of dying on the table and that was enough for me

      But what was the alternative? I understand that you didn't get an answer, but the alternative of not operating could have been worse.

      • senderista 3 hours ago
        The alternative was her having to endure more arthritic joint pain in the few years she had left (she was suffering from a degenerative disease and no one expected her to live more than 5 years). We decided that was better than the risk of an invasive surgery (hip replacement) with her history of adverse reaction to anesthesia.
    • lwo32k 2 days ago
      Gamblers are a poor example. Their decisions hardly affect anyone else, or institutions, or nations.

      Increase the cost of the fallout of a decision (your relationships, your bosses job, your orgs existence, economy, national security etc etc) and the real fun starts.

      People no matter what they say about other people's risk avoidance, all start behaving the same way as the cost increases.

      This is why we end up with Trump-like characters up the hierarchy everywhere you look, because no one capable of appreciating the odds wants to be sitting in those chairs and being held responsible for all kinds of things outside their control.

      It's also the reason why we get elaborate Signalling (costumes/rituals/pageantry/ribbons and medals/imposing buildings/PR/Marketing etc) to shift focus away from quantifying anything. See Theory of the Leisure Class. Society hasn't found better ways to keep Groups together while handling complexity the group is incapable of handling. Even small groups will unravel if there is too much focus on the low odds of a solution.

  • nickm12 2 days ago
    Anyone else find the standard "probability yardstick" very misleading on the "unlikely" side? I know the whole point of the article is that English speakers can interpret these phrases differently, but calling a 1-in-3 chance "unlikely" seems off. I would shift that whole side down—30% as "a possibility", 10% as "unlikely", 5% as "highly unlikely".
    • irjustin 2 days ago
      You're right, whatever they pick will be wrong, but that's "missing the forest for the trees".

      The goal is to remove uncertainty in the language when documenting/discussing situations for the state.

      It doesn't matter that it's wrong colloquially or "feels wrong". It's that when you're reading or talking about a subject with the government, you need to use a specific definition (and thus change your mental model, because everyone else is doing the same) so that no one gets misunderstood.

      Would it be better to always use raw numbers? Honestly I don't know.

      • rendaw 2 days ago
        Or new words unrelated to vernacular? Like "This is a cat-3 risk". When repurposing words in common usage they're always going to be fighting against intuition and triggering confusion because of it.
        • fc417fc802 2 days ago
          I was thinking this while reading it as well. Why go to all this trouble when we know in advance there will still be issues? Standardize on not using casual wording for quantified risks and instead provide the actual numbers. Something like "40 (out of 100) error 10 (plus or minus)". No more ambiguity. My offhand example even abbreviates nicely as scientific notation ie 40e10. I'm sure someone who actually spent some time on this could come up with something better.
  • dejobaan 2 days ago
    That was a good read (and short, with a cool graph—I want to know who tagged "Almost No Chance" as 95% likely; a would-be Pratchett fan, perhaps). In biz, that's part of why I like to separate out goals ("we'll focus on growing traffic") and concrete objectives ("25% audience growth between now and June 1st").
    • patrickmay 2 days ago
      But is it EXACTLY a million to one chance?
      • ninalanyon a day ago
        Sounds like a quote from pTerry's Guards! Guards!

        Sergeant Colon adjusted his armor haughtily.

        “When you really need them the most,” he said, “million-to-one chances always crop up. Well-known fact.”

      • forrestthewoods 2 days ago
        I hate “one in a million” because its meaning depends on how many times you’re rolling the die!

        I’ll never forget old World of Warcraft discussions about crit probability. If a particular sequence is “one in a million” and there are 10 million players and each player encounters hundreds or thousands of sequences per day, then “one in a million” is pretty effing common!
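The scale effect is one line of arithmetic (the player count comes from the comment above; the per-day trial count is an assumption for illustration):

```python
# A "one in a million" event observed across a large player base.
p = 1e-6                          # per-trial probability
players = 10_000_000              # from the comment above
trials_per_player_per_day = 1_000 # assumed for illustration

expected_per_day = p * players * trials_per_player_per_day
print(f"expected occurrences per day: {expected_per_day:,.0f}")  # about 10,000
```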

        • pmontra 2 days ago
          One in a million is rarer than rolling 4 doubles in a row in backgammon (it's played with two 6-sided dice.) So if a backgammon app or server starts having about 10 thousand players, it's not uncommon that every single month (or day) there is such a sequence. Some players will eventually write in a review or in a support forum that the server, the bot, or the app cheats against them because of the impossible odds of what just happened. The support staff have to explain the math, with dubious results, which is ironic because every single decision in backgammon should be made with probabilities in mind.
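A quick sketch of the dice arithmetic, reading "4 doubles in a row" as any double (both dice showing the same face) on each of four consecutive rolls:

```python
# Probability of doubles on one roll of two fair six-sided dice is
# 6/36 = 1/6; four independent rolls in a row multiply out.
p_doubles = 6 / 36
p_four_in_a_row = p_doubles ** 4  # 1/1296, about 7.7e-4

# A one-in-a-million event is far rarer than that.
print(p_four_in_a_row > 1e-6)
```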
        • gpcz 2 days ago
          In functional safety, probabilities are usually clamped to an hour of use.
        • JadeNB 2 days ago
          > I hate “one in a million” because its meaning depends on how many times you’re rolling the die!

          I'd argue that it doesn't depend on that at all. That is, its meaning is the same whether you're performing the trial once, a million times, ten million times, or whatever. It's rather whether the implication is "the possibility may be disregarded" or "this should be expected to happen occasionally" that depends on how many times you're performing the trial.

    • ModernMech 2 days ago
      My feeling is it's a measure of the number of people who read the question wrong.
  • jMyles 2 days ago
    > Since then, some governments have tried to clean up the language of probability. After the Iraq War—which was influenced by misinterpreted intelligence

    While I laud the gracious application of Hanlon's Razor here, I also think that, for at least some actors, the imprecision was the feature they needed, rather than the bug they mistakenly implemented.

  • christiangenco 2 days ago
    I've had the same sort of difficulty with phrases like "most" or "almost all" or "hardly any". I wish these mapped to unambiguous numbers, like the probability yardstick referenced in this article.

    I spun up a quick survey[1] that I sent out to friends and family to try to get some numbers on these sorts of phrases. Results so far are inconclusive.

    1. https://www.vaguequantifiers.com/

    • SAI_Peregrinus 2 days ago
      "Almost all" is an interesting one, because it has a family of mathematical definitions in addition to any informal definitions. If X is a set, "almost all elements of X" means "all elements of X except those in a negligible subset of X", where "negligible" depends on context but is well-defined.

      If there's a finite subset of an infinite set, almost all members of the infinite set are not in the finite set. E.g. Almost all integers are not 5: the set of integers equal to five is finite and the set of integers not equal to five is countably infinite.

      Likewise for two infinite sets of different size: Almost all real numbers are not integers.

      Etc.

    • mannykannot 2 days ago
      The more precisely they are defined, the less frequently will you see them used correctly.
    • jbaber 2 days ago
      "Almost all" in math can mean "except at every integer or fraction" :)
      • tejtm 2 days ago
        I would expect almost NO numbers are rational (integer or fraction) with an infinite number of Reals between each.
        • noqc 2 days ago
          in between any two real numbers, there is a rational number, and vice versa.
          • concordDance 2 days ago
            And yet somehow there are infinity times more reals than rationals...

            Very hard to get your head around!

            • noqc a day ago
              A fun, immediate, horrifying consequence:

              If I remove n elements from R, the remainder has n+1 connected components.

              The complement of Z in R has |Z| connected components.

              The complement of Q in R has |R| connected components.

        • JadeNB 2 days ago
          > I would expect almost NO numbers are rational (integer or fraction) with an infinite number of Reals between each.

          You're right (technically correct, which is the best etc.)! That is why "almost all" can mean everything except rational numbers.

      • layer8 2 days ago
        The semantics are almost always reasonable: https://en.wikipedia.org/wiki/Almost_all
      • dullcrisp 2 days ago
        Sure but that’s because 100% of real numbers, by any standard measure, aren’t integers or fractions. It bothers me if it’s used to mean 95% of something though.
      • JadeNB 2 days ago
        > "Almost all" in math can mean "except at every integer or fraction" :)

        I am a mathematician, but, even so, I think that this is one of those instances where we have to admit that we have mangled everyday terminology when appropriating it, and so non-measure theoretic users should just ignore our definition. (Similarly with "group," where, at the risk of sounding tongue-in-cheek because it's so obvious, if I were trying to analyze how people usually understand its everyday meaning I wouldn't include the requirement that every element have an inverse.)

  • hunter2_ 2 days ago
    "Rare" versus "common" is an interesting one. They sound like antonyms, but I don't think the typical probabilities are really symmetrical. Maybe something like 0%-10% for rare (although some sources say 5%) and something like 40%-100% for common.
    • konstantinua00 2 days ago
      "common" has such a large spread because the meaning behind it is sort of "at least one in each sample", where that sample can be anything (graspable)

      if you're a teacher and one student per class does the same thing - it's common. Even though it's only 1/25 or 1/30 of all students

      • BobaFloutista day ago
        I think I mentally interpret "common" as "a member of a plurality category," which is to say in the same order of magnitude of commonality as the most common group at a given level of detail.

        What you're describing, I think I would call "Not uncommon." Or, to put it another way, you shouldn't be surprised for any given case to exhibit it, but you shouldn't expect it either.

    • Macha2 days ago
      Maybe it's the amount of video games I played in childhood that influenced this, but to me common and rare are just two points on a spectrum (with at least "uncommon" in between)
  • bmurray7jhu2 days ago
    Text of NIE 29-51 "Probability of an Invasion of Yugoslavia in 1951"

    Partial HTML: https://history.state.gov/historicaldocuments/frus1951v04p2/...

    Full text PDF scan: https://www.cia.gov/readingroom/docs/CIA-RDP79R01012A0007000...

  • chipsrafferty2 days ago
    Why not just actually list the number you have in mind so everyone's on the same page: "we consider it a serious possibility - about 60% - that bla bla bla"
    • andrewflnr2 days ago
      Almost no one making these statements has an actual number in mind, or they would just say it. Probably not even in intelligence, definitely not in popular usage.
      • kmoser2 days ago
        But if there's a standardized chart that maps the phrase to a number, certainly you'd expect whoever is writing the phrase to know what number it maps to. For the sake of simplicity, then, why not just use the number to avoid all doubt?
        • dragonwriter2 days ago
          > But if there's a standardized chart that maps the phrase to a number,

          The chart doesn't map each phrase to a number, in fact given that the colored areas extend beyond the marked lines and fade, it doesn't even map each to an unambiguous, bounded range of numbers.

        • andrewflnr2 days ago
          The biggest reason is probably just that your boss told you to. But also downthread we were talking about the illusion of quantitative thought that comes from using specific numbers, which would be a slightly better reason to use words instead.
        • qmr2 days ago
          I don't think there is a standardized chart. Rather the chart shows people's subjective mental mapping of the description to a probability.
          • andrewflnr2 days ago
            The chart at the bottom of the article is explicitly a standard adopted by UK intelligence. (ed: well, proposed for adoption, it's not clear from the article how far it actually got.)
      • qznc2 days ago
        Using actual numbers requires a little bit of training but not much. I believe many would benefit from doing it.
    • suddenlybananas2 days ago
      Because that's not how the mind works. We don't have conscious access to our internal credence of some event in probabilities (unclear if we even evaluate probabilities internally at all).
    • tasuki2 days ago
      Because then it doesn't happen and (dumb) people will say "see you were wrong".
  • mempko2 days ago
    It's strange to map language to probability ranges. The guidance should be to just say the probability range. No ambiguity. Clear. Actionable and also measurable.
    • throwaway815232 days ago
      It's still a subjective estimate, but Samotsvety (the forecasting group) does seem to work that way, and HPMOR suggested something similar. Basically, assign probabilities to less complex unknowns using numbers pulled out of your butt if that's all you can do. Then you can compute conditional probabilities of various more complicated events using those priors. At least then, a consistent set of numbers has carried through the calculation, even if those numbers were wrong at the outset. It's supposed to help your mental clarity. I guess you can also perturb the initial numbers to guess something like a hyperdistribution at the other end.

      I haven't tried this myself and haven't run across a situation to apply it to lately, but I thought it was interesting.
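A minimal sketch of the workflow described above: assign rough priors to simple unknowns, combine them into the probability of a compound event, then perturb the shakiest prior to see how sensitive the conclusion is. All of the numbers here are made up for illustration.

```python
def compound_probability(p_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """P(B) via the law of total probability: P(B) = P(B|A)P(A) + P(B|~A)P(~A)."""
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Gut-feel priors for two simple unknowns.
p_a = 0.3              # P(A): chance some precondition holds
p_b_given_a = 0.8      # P(B|A)
p_b_given_not_a = 0.1  # P(B|~A)

base = compound_probability(p_a, p_b_given_a, p_b_given_not_a)

# Perturb the least trustworthy input to get a crude sensitivity band.
low = compound_probability(p_a - 0.1, p_b_given_a, p_b_given_not_a)
high = compound_probability(p_a + 0.1, p_b_given_a, p_b_given_not_a)

print(f"P(B) ~ {base:.2f}, band [{low:.2f}, {high:.2f}]")
```

If the band stays narrow, the butt-numbers didn't matter much; if it swings wildly, that tells you which input is worth actually researching.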

      • Jach2 days ago
        If you don't perturb the initial numbers to see what changes to help construct a richer model (as well as one with tighter bounds on what it predicts or forbids), you're leaving out a lot of the benefits of the exercise. Sometimes the prior doesn't matter that much because you find you already have or can easily collect sufficient data to overcome various arbitrary priors. Sometimes different priors will result in surprisingly different conclusions, some of them you can even be more confident in ruling out because of absence of data in their predictions. (Mathematically, absence of evidence is evidence of absence, though of course it's just an inequality so the proof says nothing on whether it's weak or strong evidence.) And of course some priors are straight from the butt but others are more reasonably estimated even if still quite uncertain; in any case much like unit testing of boundary conditions you can still work through a handful of choices to see the effects.

        Someone mentioned fermi calculations, a related fun exercise in this sort of logic is the work on grabby aliens: https://grabbyaliens.com/

      • andrewflnr2 days ago
        > a consistent set of numbers has carried through the calculation, even if those numbers were wrong at the outset

        I kind of see how this might be useful, but what I've actually seen is an illusion of certainty from looking at numbers and thinking that means you're being quantitative instead of, you know, pulling things out of your butt. Garbage in, garbage out still applies.

        • throwaway815232 days ago
          Yes, the potential illusion is most dangerous if you show someone else the numbers and they take them seriously. If they're only for your own calculations then you can remember what they are made of.
          • plorkyeran2 days ago
            In practice people seem to be very bad at remembering that. Pretty universally people act as though doing math on made up numbers makes them less erroneous rather than more.
            • widforss2 days ago
              That's the whole point of Fermi estimates. Find a plausible number given uncertain inputs.
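A toy Fermi estimate in that spirit: multiply rough inputs together, but sample each from a wide log-uniform range so the spread of the output is visible. The quantities and ranges below are invented (classic "piano tuners in a city" style).

```python
import math
import random

def log_uniform(lo: float, hi: float) -> float:
    """Sample between lo and hi, uniform in log space (right for order-of-magnitude guesses)."""
    return math.exp(random.uniform(math.log(lo), math.log(hi)))

random.seed(0)
samples = []
for _ in range(10_000):
    households = log_uniform(1e6, 4e6)          # households in the city
    pianos_per_household = log_uniform(0.02, 0.1)
    tunings_per_year = log_uniform(0.5, 2)      # tunings per piano per year
    tunings_per_tuner = log_uniform(500, 2000)  # jobs one tuner handles per year
    samples.append(households * pianos_per_household * tunings_per_year / tunings_per_tuner)

samples.sort()
median = samples[len(samples) // 2]
print(f"median estimate: {median:.0f} tuners")
```

The point is not the median itself but the ratio between the low and high percentiles: that ratio is the honest answer a single point estimate hides.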
            • chrisweekly2 days ago
              Yeah, mistaking precision for accuracy is a common fallacy.
              • JadeNB2 days ago
                > Yeah, mistaking precision for accuracy is a common fallacy.

                I remember an appealing description of the difference being that a precise archer might sink all their arrows at the same spot on the edge of the target, whereas an accurate archer might sink all their arrows near the bull's eye without always hitting the same spot.

        • mempko21 hours ago
          Yes, despite pulling the numbers out of your butt, having them and then seeing how accurate you are helps you calibrate your butt. That way, over time, the numbers you pull out of your butt become more accurate (and possibly even more precise)
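One simple way to do that calibration check, as a sketch: record forecasts as numbers, record outcomes, and score them. The Brier score (mean squared error of probability against a 0/1 outcome) is one standard choice; the forecasts below are invented.

```python
def brier_score(forecasts, outcomes):
    """Mean of (p - outcome)^2 over paired forecasts in [0, 1] and outcomes in {0, 1}."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

forecasts = [0.9, 0.7, 0.2, 0.6, 0.1]  # your stated probabilities
outcomes  = [1,   1,   0,   0,   0]    # what actually happened

print(f"Brier score: {brier_score(forecasts, outcomes):.3f}")  # lower is better
```

Always guessing 50% scores 0.25, so anything consistently below that means the butt-numbers carry real information.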
    • Muromec2 days ago
      That's the other way around -- there was no probability range to begin with.
    • layer82 days ago
      How would you possibly measure the “Probability of an Invasion of Yugoslavia in 1951”, in March 1951?
    • csours2 days ago
      Or use a histogram.
  • SoftTalker2 days ago
    I have a habit of saying "almost definitely" which I have tried to break but I still fall back to it occasionally. And I know several people who will say something is "definitely possible" or "certainly a possibility" or something along those lines. It's all insecure language we use to avoid making a statement that might turn out to be wrong.
    • rekenaut2 days ago
      I often say "definitely possible" when I am not sure what the chance of something happening is but I ought to acknowledge that it is possible. It is definitely possible that I should choose better language to communicate this.
      • smitty1e2 days ago
        When they won't quit asking, I'm "willing to commit to a definite maybe".
  • Macha2 days ago
    Who are the people that have a small bump of believing "better than even" is 10-20%? Why?
    • tempestn2 days ago
      You also see the opposite bump for most of the negative assessments. My assumption is that they're likely reading the question backwards. ie. "how unlikely" vs "how likely" or similar.
      • vpribish2 days ago
        maybe something like dyslexia but for semantics. I did some searching and couldn't find a term for this.
  • didgetmaster2 days ago
    "The odds are more like a million to one!"

    "So...you're telling me there is a chance!"

  • ta20240528a day ago
    The actual problem is that it's not possible to assign a percentage probability - or even a vague "highly likely" - to a one-off event.

    As long as the prediction is not 0% or 100%, it's impossible to be wrong.

    So, without hesitation, I predict with 99.999999% certainty that an asteroid will hit the Eiffel Tower before the hour.

  • tempestn2 days ago
    Interesting. Two things that jumped out to me were 1) why do the regions of the standardization line not overlap or at least meet? And 2) What's up with the small but clear minority of people who took all the 'unlikely' phrasings to mean somewhere in the realm of 90 to 100%? My guess would be they're misreading the question and that is their estimate of unlikelihood?
    • pictureofabear2 days ago
      Because many people cannot or will not accept ambiguity. Charitably, I suppose this comes from a desire to logically deduce risk by multiplying the severity of the consequences by the chance that something will happen. Uncharitably, it gives decisionmakers a scapegoat should they need one.
  • voidmaina day ago
    The bonkers thing is that all of the visualizations show basically all of the terms pretty close to the middle on a log odds scale. If "highly unlikely" means 10-20%, how do you express 1 in 10,000??
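A quick sketch making that compression concrete: convert probabilities to log odds and compare the phrase ranges against "1 in 10,000". The example values are just the ones from this thread.

```python
import math

def log_odds(p: float) -> float:
    """Natural-log odds: ln(p / (1 - p))."""
    return math.log(p / (1 - p))

for label, p in [("highly unlikely (10%)", 0.10),
                 ("highly unlikely (20%)", 0.20),
                 ("even chance", 0.50),
                 ("1 in 10,000", 1e-4)]:
    print(f"{label:>22}: {log_odds(p):+.2f}")
```

All the chart's phrases land within a few units of zero, while 1 in 10,000 sits far below them, with no phrase anywhere near it.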
    • stefsa day ago
      For uncommon values you could still use exact language, i.e. just say "1 in 10,000".
    • immibisa day ago
      Realistically? That's within the margin of error of "completely impossible"
  • Zanni2 days ago
    I don't understand the point of standardizing language around specific numerical ranges when they could just use numbers.
    • inejge2 days ago
      They would have to use ranges, though, and I think that the non-numeric phrases flow better, which should aid comprehension.
  • mrkramera day ago
    Yugoslav communists were pawns of Moscow from the beginning all the way until the end of WW2. After WW2, they grew some balls and put the interests of the degenerate Yugoslavian state before the interests of the Communist International; that's what almost cost them their state. They were also traitors to their peoples, because during WW2 they didn't support national movements but wanted a cosmopolitan socialist Yugoslavia with the help of the Soviet Union.
  • immibisa day ago
    Gotta love how "almost no chance" includes probabilities less than zero.
  • a3w2 days ago
    How was this not on lesswrong.com? They are all about ]0..1[
  • photochemsyn2 days ago
    This problem crops up everywhere, especially when it's a consequential claim. E.g., when the US Department of Energy says with "low confidence" that the SARS-CoV-2 outbreak and pandemic was "most likely" the result of a laboratory leak, what number does that translate to on the certainty scale?

    Also, what likelihood can we assign to claims that the virus was deliberately modified at the furin cleavage site as part of a gain-of-function research program aimed at assessing the risks of species-jumping behavior in bat coronaviruses? This is a separate question from the lab-escape issue, which could have involved either a collected wild-type virus or one that had been experimentally modified.

    Perhaps experts in the field 'misinterpreted the evidence' back in the early months of the pandemic, much as happened with the CIA and its 'intelligence on Iraq'?

    https://interestingengineering.com/health/us-doe-says-covid-...

    • nightpool2 days ago
      I highly recommend that you read through https://www.astralcodexten.com/p/practically-a-book-review-r... and watch the underlying debate (starting here https://www.youtube.com/watch?v=Y1vaooTKHCM), it does a really good job of laying out the arguments for and against lab leak in a very thorough and evidence-based way like you're asking for here.
      • photochemsyn2 days ago
        "Viral" by Alina Chan and Matt Ridley is worth reading. But I don't think there's much doubt now that Sars-CoV2 was the result of reckless gain-of-function research conducted jointly between China's Wuhan Institute of Virology, America's Baric Lab in North Carolina, and facilitated by funding through EcoHealth Alliance, the NIH and the NIAID. Whoopsie.
        • nightpool2 days ago
          There's a lot of doubt, actually. Peter responds to this possibility in a lot of detail in the debate I linked:

              Even if WIV did try to create COVID, they couldn’t have. As Yuri said, COVID looks like BANAL-52 plus a furin cleavage site. But WIV didn’t have BANAL-52. It wasn’t discovered until after the COVID pandemic started, when scientists scoured the area for potential COVID relatives. WIV had a more distant COVID relative, RATG-13. But you can’t create COVID from RATG-13; they’re too different. You would need BANAL-52, or some as-yet-undiscovered extremely close relative. WIV had neither.
          
              Are we sure they had neither? Yes. Remember, WIV’s whole job was looking for new coronaviruses. They published lists of which ones they had found pretty regularly. They published their last list in mid-2019, just a few months before the pandemic. Although lab leak proponents claimed these lists showed weird discrepancies, this was just their inability to keep names consistent, and all the lists showed basically the same viruses (plus a few extra on the later ones, as they kept discovering more). The lists didn’t include BANAL-52 or any other suitable COVID relatives - only RATG-13, which isn’t close enough to work.
          
              Could they have been keeping their discovery of BANAL-52 secret? No. Pre-pandemic, there was nothing interesting about it; our understanding of virology wasn’t good enough to point this out as a potential pandemic candidate. WIV did its gain-of-function research openly and proudly (before the pandemic, gain-of-function wasn’t as unpopular as it is now) so it’s not like they wanted to keep it secret because they might gain-of-function it later. Their lists very clearly showed they had no virus they could create COVID from, and they had no reason to hide it if they did.
          
              COVID’s furin cleavage site is admittedly unusual. But it’s unusual in a way that looks natural rather than man-made. Labs don’t usually add furin cleavage sites through nucleotide insertions (they usually mutate what’s already there). On the other hand, viruses get weird insertions of 12+ nucleotides in nature. For example, HKU1 is another emergent Chinese coronavirus that caused a small outbreak of pneumonia in 2004. It had a 15 nucleotide insertion right next to its furin cleavage site. Later strains of COVID got further 12 - 15 nucleotide insertions. Plenty of flus have 12 to 15 nucleotide insertions compared to other earlier flu strains.
          
          ....

              COVID’s furin cleavage site is a mess. When humans are inserting furin cleavage sites into viruses for gain-of-function, the standard practice is RRKR, a very nice and simple furin cleavage site which works well. COVID uses PRRAR, a bizarre furin cleavage site which no human has ever used before, and which virologists expected to work poorly. They later found that an adjacent part of COVID’s genome twisted the protein in an unusual way that allowed PRRAR to be a viable furin cleavage site, but this discovery took a lot of computer power, and was only made after COVID became important. The Wuhan virologists supposedly doing gain-of-function research on COVID shouldn’t have known this would work. Why didn’t they just use the standard RRKR site, which would have worked better? Everyone thinks it works better! Even the virus eventually decided it worked better - sometime during the course of the pandemic, it mutated away from its weird PRRAR furin cleavage site towards a more normal form.
          
              COVID is hard to culture. If you culture it in most standard media or animals, it will quickly develop characteristic mutations. But the original Wuhan strains didn’t have these mutations. The only ways to culture it without mutations are in human airway cells, or (apparently) in live raccoon-dogs. Getting human airway cells requires a donor (ie someone who donates their body to science), and Wuhan had never done this before (it was one of the technologies only used at the superior North Carolina site).
          • photochemsyn2 days ago
            It's equally likely that the Wuhan Institute of Virology was testing constructs created in the Baric Lab in their bat and mice models, and this was initiated during the 2014-2017 ban on gain-of-function research in the USA that Fauci vocally opposed.

            The reality here is that there are thousands of mammalian viruses that don't infect humans that could be modified to infect humans via specific modifications of their target mammalian cell-surface receptor proteins, as was done in this specific case of a bat coronavirus modified at its furin cleavage site to make it human-targetable. Any modern advanced undergrad student in molecular biology could explain this to you, if you bothered to listen.

            So first, we need an acknowledgement of Covid's origins that vastly embarrasses China and the USA, and second we need a global treaty banning this generation of novel human pathogens from wild mammalian viral types... I guess I won't hold my breath.

            • fc417fc8022 days ago
              > there are thousands of mammalian viruses that don't infect humans that could be modified to infect humans via specific modifications of their target mammalian cell-surface receptor proteins

              Are you claiming that's what happened here? What virus do you propose was modified in such a manner? Where was it sourced from? Why was it not on the published lists of discovered viruses?

              > as was done in this specific case of a bat coronavirus modified at its furin cleavage site to make it human-targetable

              Did you even read the comment you're responding to? It details why the furin cleavage site does not resemble something that you would expect humans to have produced.
