I let ChatGPT analyze a decade of my Apple Watch data, then I called my doctor (www.msn.com)

243 points by zdw 12 days ago | 41 comments

chrisfosterelli 12 days ago
Health metrics are absolutely tarnished by a lack of proper context. Unsurprisingly, it turns out that you can't reliably take a concept as broad as health and reduce it to a number. We see the same arguments over and over with body fat percentages, VO2 max estimates, BMI, lactate thresholds, resting heart rate, HRV, and more. These are all useful metrics, but it's important to consider them in the proper context each of them deserves.
This article gave an LLM a bunch of health metrics and then asked it to reduce it to a single score, didn't tell us any of the actual metric values, and then compared that to a doctor's opinion. Why anyone would expect these to align is beyond my understanding.
The most obvious thing that jumps out to me is that I've noticed doctors generally, for better or worse, consider "health" much differently than the fitness community does. They have different toolsets and different goals. If this person's VO2 max estimate was under 30, that's objectively a poor VO2 max by most standards, and an LLM trained on the internet's entire repository of fitness discussion is likely going to give this person a bad score in terms of cardio fitness. But a doctor who sees a person come in who isn't complaining about anything in particular, moves around fine, doesn't have risk factors like age or family history, and has good metrics on a blood test is probably going to say they're in fine cardio health regardless of what their wearable says.
I'd go so far as to say this is probably the case for most people. Your average person is in really poor fitness-shape but just fine health-shape.
- inopinatus 12 days ago
  Many of those metrics are population or sampling measures and are confounded by many factors at an individual level. The most notorious of which is BMI; it is practically a category error to infer someone's health or risk by individual BMI, and yet doing so remains widespread amongst people that are supposed to know better.
  Instrumentation and testing become primarily useful at an individual level to explain or investigate someone's disease or disorder, or to screen for major risk factors, and the hazards and consequences of unnecessary testing outweigh the benefits in all but a few cases. For those cases, your GP and/or government will (or should) routinely screen the people at actual risk, which is why I pooped in a jar last week and mailed it.
  An athlete chasing an ever-better VO2max or FTP hasn't necessarily got it wrong, however. We can say something like, "Bjorn Daehlie’s results are explained by extraordinary VO2max", with an implication that you should go get results some other way because you're not a five-sigma outlier. But at the pointy end of elite sport, there's a clear correlation between marginal improvement of certain measures and competitive outcomes, and if you don't think the difference of 0.01sec between first and third matters then you've never stood on a podium. Or worse, next to one. When mistakes are made and performance deteriorates, it's often due to chasing the wrong metric(s) for the athlete at hand, generally a failure of coaching.
  - FeteCommuniste 12 days ago
    > The most notorious of which is BMI; it is practically a category error to infer someone's health or risk by individual BMI, and yet doing so remains widespread amongst people that are supposed to know better.
    BMI works fine for people who aren't very muscular, which is the great majority of people. Waist to height ratio might be more informative for people with higher muscle mass.
    jermaustin1 11 days ago
    As a person who has been told I'm "morbidly obese" for decades now, I will say that doctors at almost every level look at your chart not you. I've been told time and time again that until I get my weight under control, my health will suffer.
    I'm 5'8" and weigh on average 210lbs. My BMI isn't even morbidly obese, it is 31, which is just "regular" obese, but on top of that, a DEXA scan shows that I am actually only 25% body fat, with only 1lb of visceral fat.
    Doctors don't care about that. They see on the Epic chart that my BMI is > 30 and have to give me some spiel about a healthier lifestyle so they can check off a checkbox and continue to the next screen.
    sotix 11 days ago
    I'd consider 5'8 and 210lbs morbidly obese. An average male at 5'8 should generally weigh about 150lbs and no more than 164lbs.
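    The BMI figures in this exchange can be checked with the usual imperial formula, BMI = 703 × weight (lb) / height (in)², and the standard adult cutoffs: 25 is overweight, 30 is obese, and 40 is class III or "morbidly" obese. Here is a minimal Python sketch. The function names are only illustrative, and 703 is the commonly used rounded conversion factor.

```python
def bmi_imperial(weight_lb: float, height_in: float) -> float:
    """Body mass index from pounds and inches (703 converts lb/in^2 to kg/m^2)."""
    return 703 * weight_lb / height_in ** 2


def weight_at_bmi(bmi: float, height_in: float) -> float:
    """Weight in pounds that produces the given BMI at this height."""
    return bmi * height_in ** 2 / 703


def who_category(bmi: float) -> str:
    """Standard adult BMI bands."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    if bmi < 40:
        return "obese (class I/II)"
    return "obese (class III, 'morbid')"


height = 5 * 12 + 8  # 5'8" = 68 inches
bmi = bmi_imperial(210, height)
print(f"BMI at 210 lb: {bmi:.1f} -> {who_category(bmi)}")
print(f"Weight at BMI 25: {weight_at_bmi(25, height):.0f} lb")
print(f"Weight at BMI 40: {weight_at_bmi(40, height):.0f} lb")
```

    At 5'8", 210 lb works out to a BMI of about 31.9. That is obese, but well short of the class III cutoff, which starts at about 263 lb. The 164 lb figure is simply the weight at BMI 25.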
    csa 11 days ago
    > I'd consider 5'8 and 210lbs morbidly obese. An average male at 5'8 should generally weigh about 150lbs and no more than 164lbs
    You would consider incorrectly then.
    This person has ~155 pounds of lean body mass. 164 would put him at roughly bodybuilder-level body fat, which basically requires a part-time job in cooking and nutrition to maintain.
    For reference, I’m in a similar situation to this person. I’m 5’11” (180cm) and about 200 lbs (91kg) with about 170 lbs of lean body mass. My dexa scan says that I’m 15% body fat, but I get the same lectures from doctors about being obese and needing a lifestyle change, all based on BMI and (I assume) my size (I’m barrel chested). It’s completely absurd.
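    The lean-mass arithmetic in this subthread assumes that everything that isn't fat is lean mass, so body fat % = (weight − lean mass) / weight. Here is a short sketch that takes the quoted DEXA numbers at face value. Their accuracy is disputed elsewhere in the thread, and the helper names are illustrative.

```python
def lean_mass(weight_lb: float, body_fat_pct: float) -> float:
    """Fat-free mass in pounds, given total weight and body-fat percentage."""
    return weight_lb * (1 - body_fat_pct / 100)


def body_fat_pct(weight_lb: float, lean_lb: float) -> float:
    """Body-fat percentage, treating all non-lean weight as fat."""
    return 100 * (weight_lb - lean_lb) / weight_lb


# jermaustin1: 210 lb at 25% body fat (DEXA)
lean = lean_mass(210, 25)
print(f"lean mass: {lean:.1f} lb")
# Dropping to 164 lb while keeping all of that lean mass leaves very little fat
print(f"body fat at 164 lb: {body_fat_pct(164, lean):.1f}%")
# csa: 200 lb at 15% body fat
print(f"csa lean mass: {lean_mass(200, 15):.0f} lb")
```

    If ~157 lb of lean mass is held fixed, 164 lb leaves only about 4% body fat. That is the "bodybuilder level" point being made.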
    firmretention 9 days ago
    DEXA scans are notoriously inaccurate. Your DEXA scan is probably wrong, and you are fatter than you think. I've been lifting for over a decade, so I have far more muscle mass than the average person, and I am 6'1", yet I am still easily over 20% BF if I'm 200 lbs or more. Don't believe me? Try to get truly shredded. You'll see for yourself that you will have to lose far more weight than you think. Everyone is fatter and less muscular than they think they are, even if they're active. Unless of course you are a heavy steroid user, in which case you may actually be muscular enough for that to be valid. But for the average natural trainee? Nobody who's truly lean is getting an obese or morbidly obese BMI. Overweight at worst, maybe.
    BMI is definitely inaccurate for those with greater amounts of muscle mass, but not as inaccurate as many would like to believe.
    csa 9 days ago
    I didn’t want to belabor the point in my original post, but since you went there…
    The next step at the doctor is that I show them my MyFitnessPal nutrition tracking and my DEXA scan, and (at some point) take off my shirt. I ask them what exactly it is I should change. 100% of the time the answer has been something like “Oh, sorry. Please continue as you are doing.”
    They just aren’t used to seeing muscular 200 pound dudes at my height in my area at my age (btw, I’m in my 50s).
    Also, someone can work out in the gym all they want, but I think most people will struggle to lower their body fat percentage if they don’t focus on their nutrition.
    I realize that my lean body mass (both bones and muscle) is decreasing, and that the rate of decrease will be higher each year. That said, I’m doing what I can to maintain whatever muscle and bone mass I have.
    jermaustin1 11 days ago
    If I got rid of all of my fat and bones, I'd still weigh more than 150lbs. I have the most muscular 150lbs man inside of me.
    Ideal body fat percentage is 18-24% - I'm at 25% (or was in November - might be +/- 2% since then - gained a few pounds weight, but not waist size).
    So I would say I'm not morbidly obese or even regular obese based on the percentage of my body that is muscle vs fat.
    machomaster 9 days ago
    You are fat, though. For a man, the ideal fat percentage is 15-20%. 20+%, let alone 25%, is not healthy at all.
    Teever 11 days ago
    Or that guy could be a burly bricklayer / concrete worker who can casually carry hundreds of pounds of weight all day every day in brutal conditions.
    It's really hard to tell with the data provided.
    jermaustin1 11 days ago
    Burly, maybe, but I haven't done any hard labor most of my life. I ran track as a kid and kept my high metabolism (RMR: 2460kcal, TDEE: 3380kcal). Well, I lost it when my thyroid failed, but I medicated myself back to it. I eat what I want, but it's a very high lean-meat diet (lots of chicken breast and turkey because my wife likes them). I don't limit my carb intake either, as I mostly burn sugar for energy (according to my Respiratory Exchange Ratio).
    Somehow my body is just amazing at working without any help from me. I don't even exercise much. Maybe a few pushups a day, up and down my stairs at my house a couple dozen times a day, and probably 5-10k steps a day max.
    joshhart 11 days ago
    Huh. The standard in your case is to measure waist circumference if BMI is high. Did no doctor do that? As long as you are below 40” (or 37” if Asian), you are considered good to go.
    jermaustin1 11 days ago
    None ever did.
    On top of that, I'm not sure if that is a real indication of anything, either.
    The reason to do that is to get an idea of your abdominal fat (which is the more dangerous place for fat to store), but there are two types of abdominal fat, one is dangerous (visceral fat) and one is completely benign (subcutaneous fat). And a measurement around your waist won't tell you which you have.
    I personally have almost all of my fat subcutaneous, with only 1lb of visceral fat (which is right in the perfect range).
    prmoustache 11 days ago
    > Doctors don't care about that
    Literally all of them?
    coldtea 11 days ago
    When humans talk, they use generalizations (and don't need to announce them). Here it means that most doctors don't care about that.
    Follow that rule next time you read such a statement in a context that's not formal math.
    prmoustache 11 days ago
    > most
    That is not even true. We are talking anecdotal evidence here.
    coldtea 11 days ago
    Yes, humans have found that you don't need officially stamped statistics (and in many cases they're unreliable or "doctored" anyway), and that they can make general observations on their own, through something they call experience.
    And a near universal experience with doctors for anybody paying attention is that.
    One can reject it or accept it and improve upon it after checking its predictive power, or they can pause their thinking and wait for some authority to give them the official numbers on that.
    PaulDavisThe1st 11 days ago
    > When humans talk, they use generalizations
    All humans?
    Sorry :)
    coldtea 11 days ago
    Well, when humans talk, they use generalizations, which applies recursively to this statement :)
    Though, on second thought: yes, all humans, and not merely as a generalization. 100% of humans do it.
    jermaustin1 11 days ago
    I can't say literally all, but I have had to get a new GP almost every year because of health insurance changes, location changes, hospital consolidation buying my GP's practice, and multiple doctors retiring or just quitting medicine (my last GP was tired of medicine after practicing for only 3 years). Over the last 20 years, I've had almost 15 GPs across 5 states (NY, NJ, CT, TX, LA). I also have multiple autoimmune diseases, so I have had a handful of specialists of various flavors (endocrine; oncology, though not for cancer; cardiology; and urology), but I only need them occasionally.
    At almost every single start of every single appointment (including a follow-up from just a couple of days prior), they comment on my BMI. The rare times they don't are the ones I remember. At my last urology appointment, the doctor was very congenial. He didn't even go over the lab work; he just said everything was looking good, asked how I was feeling, said alright, refilled my prescriptions and left.
    Nicook 11 days ago
    I mean, those stats aren't good...
    friendzis 2 days ago
    No. BMI does not work as a diagnostic measure for the general population. The range of "normal" BMI values depends at least on genetic lineage, gender and individual developmental history. It's fine for comparing two men of Scandinavian lineage, but if you compare, e.g., a Dutch man with an African woman, oh boy, your error margins would be in the mid-to-high single digits.
    > Waist to height ratio
    Again, while not a bad metric per se, it translates poorly between cohorts.
    oarfish 11 days ago
    My understanding is that it doesn't even do that, because it produces false negatives for the so-called skinny-fat body type. These people have significant visceral fat mass, which is what we are concerned about, but not much muscle or peripheral fat mass, so BMI screens don't flag them even though they are at risk.
    inopinatus 11 days ago
    > BMI works fine
    An individual learns nothing from its calculation and it has no clinical value. I receive more constructive feedback from an auntie jabbing me in the chest and saying "you got fat".
    > the great majority of people
    There is wide morphological variety across human populations, so, no.
    tclancy 11 days ago
    I dunno, basing life decisions off a metric that has a fudge factor built into it to make the regression work feels sub-optimal to me.
    XorNot 11 days ago
    BMI underestimates in most cases, and your body fat is higher than the chart would predict.
    When people say "oh, BMI isn't accurate," it means you are more overweight than it suggests, unless you are literally an extreme bodybuilder.
    Spivak 11 days ago
    This underestimation has a name, "Normal Weight Obesity." Known by the slang "hot guy/girl fit" where the person looks like they would be physically fit because they're skinny but there's no muscle under there.
- Shank 12 days ago
  > But a doctor who sees a person come in who isn't complaining about anything in particular, moves around fine, doesn't have risk factors like age or family history, and has good metrics on a blood test is probably going to say they're in fine cardio health regardless of what their wearable says.
  This is true of many metrics and even lab results. Good doctors will counsel you and tell you that the lab results are just one metric and one input. The body acclimates to its current conditions over time, and quite often achieves homeostasis.
  My grandma was living for years with an SpO2 in the 90-95% range as measured by pulse oximetry, but this was just one metric measured with one method. It doesn't mean her blood oxygen was actually repeatedly dropping, it just meant that her body wasn't particularly suited to pulse oximetry.
  - vidarh 11 days ago
    It doesn't help when doctors are often unaware of outliers affecting the test results. E.g. I've had a number of doctors freak out over my eGFR (kidney function) test results because the default test they use is affected by body mass and diet, and made even worse by e.g. preworkout supplements with creatine. None of my doctors have been aware of this, and I've had to explain it to them.
    cthalupa 11 days ago
    I've not seen evidence that creatine actually has a significant impact on eGFR. Anecdotally, mine does not budge even on 5g a day. Meta-analyses show minimal impact, e.g. https://pmc.ncbi.nlm.nih.gov/articles/PMC12590749/
    Muscle mass obviously does, though. Cystatin C is a better marker if your body composition differs from the "average".
    vidarh 11 days ago
    I did end up taking a cystatin c test privately to be able to prove to my GP that the results he freaked out over were nonsense. I'm in the UK, and for whatever reason the NHS just doesn't typically do them for basic kidney function - presumably cost, but they were dirt cheap to do privately so...
    harvey9 11 days ago
    NICE guidelines. "Evidence on the specific eGFR equations or ethnicity adjustments seen by the committee was not from UK studies so may not be applicable to UK black, Asian and minority ethnic groups. None of the studies included children and young people. The committee was also concerned about the value of P30 as a measure of accuracy (P30 is the probability that the measured value is within 30% of the true value), the broad range of P30 values found across equations and the relative value or accuracy of ethnicity adjustments to eGFR equations in different ethnic groups. The committee agreed that adding an ethnicity adjustment to eGFR equations for different ethnicities may not be valid or accurate...."
    https://www.nice.org.uk/guidance/ng203/chapter/rationale-and...
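    The guideline defines P30 as the probability that an estimate falls within 30% of the true value. In practice that is the fraction of paired readings that land inside the ±30% band. Here is a minimal sketch. The numbers are invented purely for illustration and are not data from NICE or any study.

```python
def p30(estimated: list[float], reference: list[float]) -> float:
    """Fraction of estimates within +/-30% of the reference (true/measured) value."""
    if len(estimated) != len(reference) or not reference:
        raise ValueError("need two equal-length, non-empty lists")
    hits = sum(
        1
        for est, ref in zip(estimated, reference)
        if abs(est - ref) <= 0.30 * ref
    )
    return hits / len(reference)


# Hypothetical eGFR estimates vs. measured GFR (mL/min/1.73 m^2), illustration only
estimated = [88, 60, 45, 110, 80]
measured = [90, 90, 50, 100, 55]
print(f"P30 = {p30(estimated, measured):.0%}")
```

    A high P30 still lets every "hit" be off by almost 30%, and it says nothing about how far off the misses are. That fits the committee's doubts about P30 as an accuracy measure.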
    vidarh 11 days ago
    What does ethnicity have to do with anything?
    My creatinine levels are high because my body mass - including muscle mass - is well above average. On the basic kidney tests my GP did, my numbers indicated kidney disease. Doing a Cystatin C test showed very clearly that my numbers were firmly in the normal range.
    The page does go on to point out the muscle mass issue:
    > The committee highlighted the 2008 recommendation, which states that caution should be used when interpreting eGFR and in adults with extremes of muscle mass and on those who consume protein supplements (this was added to recommendation 1.1.1).
    Further down they do mention Cystatin C, and seem to have basically decided that a risk of false positives is acceptable because of a lower risk of false negatives. That part is interesting, and it may well be the right decision at a population level.
    But if your muscle mass is sufficiently above average, the regular kidney tests done will flag up possible kidney disease every single damn time you do one, and my experience is that UK doctors are totally oblivious to the fact that this is not necessarily cause for concern for a given patient and will often just assume a problem and it will be up to the patient to educate them.
    EDIT: What's worse, actually, is the number of times I've had doctors or nurses try to help me "game" this test by telling me to, e.g., drink more before the next test. They seem oblivious that, irrespective of precision, changing the test conditions in a way that also invalidates the test for tracking changes in eGFR is not helpful.
    cthalupa 11 days ago
    I'm not sure what point you're trying to make here. Have I missed somewhere in the discussion where eGFR equation adjustment based on ethnicity has been discussed?
    Creatinine is the standard marker used for eGFR. It is also a byproduct of muscle metabolism. People who regularly lift weights or have lifestyles that otherwise result in a higher-than-normal muscularity will almost universally have higher creatinine levels than those who don't, assuming similar baseline kidney function. It's also problematic for people with extremely low muscle mass, for the opposite reason.
    It's one of the reasons enhanced bodybuilders can get bitten by failing kidney function. They know that their eGFR is going to look worse and worse on creatinine formulas, so they ignore it, when the elevated blood pressure from all the dbol they're popping is actually killing their kidneys.
    Cystatin C is the better option for people with too much (or too little) muscle for creatinine to be accurate.
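    Here is a sketch of the race-free CKD-EPI 2021 creatinine equation that shows the muscle-mass problem concretely. It is fed two hypothetical creatinine values for the same 40-year-old man. This is an assumption about which equation a given lab uses: UK labs may report a different CKD-EPI variant. The creatinine values are invented. The point is only that creatinine raised by muscle alone can push the estimate below 60 mL/min/1.73 m², the usual threshold for CKD stage 3 when it persists.

```python
def egfr_ckd_epi_2021_cr(scr_mg_dl: float, age: float, female: bool) -> float:
    """Race-free CKD-EPI 2021 creatinine equation, in mL/min/1.73 m^2.

    scr_mg_dl: serum creatinine in mg/dL (divide umol/L by 88.4 to convert).
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = scr_mg_dl / kappa
    egfr = (
        142
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.200
        * 0.9938 ** age
    )
    return egfr * 1.012 if female else egfr


# Hypothetical 40-year-old man: same kidneys, different amounts of muscle.
for label, scr in [("average build", 1.0), ("very muscular", 1.6)]:
    value = egfr_ckd_epi_2021_cr(scr, age=40, female=False)
    flag = "below 60, flagged" if value < 60 else "60 or above"
    print(f"{label}: creatinine {scr} mg/dL -> eGFR {value:.0f} ({flag})")
```

    The equation has no input for muscle mass, so it cannot tell "more muscle" apart from "worse filtration". That is why a cystatin C based estimate is the usual cross-check.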
- colechristensen 12 days ago
  >I'd go so far as to say this is probably the case for most people. Your average person is in really poor fitness-shape but just fine health-shape.
  Modern medicine has failed to move into the era of subtlety and small problems and many people suffer as a result. Fitness nerds and general non-scientists fill the gap poorly so we get a ton of guessing and anecdotal evidence and likely a whole lot of bad advice.
  Doctors won't say there's a problem until you're SICK and usually pretty late in the process when there's not a lot of room to make improvements.
  At the same time, doctors won't do anything if you're 5% off optimal, but they'll happily give you a medicine with 10 side effects that improves one symptom that's 50% off optimal. And unless you're dying or have something really straightforward wrong with you, doctors don't do much at all besides giving you a sedative and/or a stimulant.
  Doctors don't know what to do with small problems because they're barely studied and the people who DO try to do something don't do it scientifically.
  - anon7000 12 days ago
    A worthwhile book to read on this topic is Outlive by Peter Attia (MD). The core premise is that American healthcare focuses far too much on treating problems after they’re extremely severe. It would be cheaper and healthier to invest more in conservative and preventative care, trying to prevent or minimize problems early in life before they become incredibly dangerous and expensive/difficult/impossible to treat.
    I have a close friend who works in conservative care, and it’s astonishing what they see. For example, someone went to a number of specialists and doctors about a throat condition where they really struggled to swallow. They even had to swallow a radioactive pill for some kind of imaging. That meant unnecessary exposure and an expensive process, and it ultimately went exactly nowhere.
    Meanwhile, it was a simple musculoskeletal issue which my friend was able to resolve in a single visit with absolutely no risk to the patient.
    Medical schools need to stop producing MDs who reach for pills as the first line of defense without trying to find the root cause of issues. Do you really need addictive painkillers, or would some PT, exercise, massage, etc. help resolve your pain?
  - lnsru 12 days ago
    It’s not medicine. It’s the healthcare system. Doctors aren’t paid enough to go through a complaint thoroughly and dig deeper. In Germany, health insurance pays for a 5-minute diagnosis and that’s all, and that’s from the better doctors. With an ordinary one, the diagnosis comes from a 2-minute interaction. Believing that the diagnosis is right is very naive.
  - Angostura 12 days ago
    > Doctors won't say there's a problem until you're SICK and usually pretty late in the process when there's not a lot of room to make improvements.
    As someone who is fit and active, in their 60s with zero obvious symptoms, but is nonetheless on cholesterol and blood pressure medication, I can say this isn't true (in the UK, at least).
    pjc50 12 days ago
    One of the things the NHS does surprisingly well, and is only really possible because it's a completely vertically integrated system, is population-level preventative medicine. Distributing insulin and salbutamol. Screening for various sorts of cancer. Cholesterol and BP checks. Encouraging people to stop smoking.
  - PlatoIsADisease 11 days ago
    I think one of the major problems is that biologists/scientists cannot legally treat people. Physicians take their studies and have monopolistic treatment powers over them.
    I think this creates a huge knowledge gap.
  - steveBK123 11 days ago
    It’s also cultural. Most American doctors don’t bother to tell people if they are overweight and out of shape. It’s not something their customers reward.
    thewebguyd 11 days ago
    > customers
    And there's the problem: they are "customers" who pay, whether directly, via insurance, or via government insurance, as opposed to patients in a nationalized healthcare system (and I mean healthcare, not nationalized health insurance).
    colechristensen 11 days ago
    I mean... most people already know, it's not like either of those things come as a surprise to anybody.
    machomaster 9 days ago
    Most obese people think "I am a bit on the heavy side, but I am not that fat and definitely not obese".
    People are generally in denial about their fat percentage and their muscle mass. Even somewhat healthy people (~20% fat) who calculate how much they must lose to get to a healthy 12-15% are surprised when reality shows that their calculations were 5-15kg off.
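    The calculation described here is usually done by holding lean mass constant: target weight = lean mass / (1 − target body-fat fraction). Here is a sketch with a hypothetical 90 kg person. The numbers are illustrative, not data from the thread.

```python
def target_weight(weight_kg: float, bf_now: float, bf_target: float) -> float:
    """Weight at bf_target if all weight lost is fat (lean mass held constant).

    bf_now and bf_target are fractions, e.g. 0.20 for 20%.
    """
    lean = weight_kg * (1 - bf_now)
    return lean / (1 - bf_target)


weight = 90.0  # hypothetical person who believes they are at 20% body fat
for bf in (0.15, 0.12):
    goal = target_weight(weight, 0.20, bf)
    print(f"{bf:.0%} body fat: {goal:.1f} kg (lose {weight - goal:.1f} kg)")

# Same person if the starting estimate was optimistic and they are really at 25%
print(f"if really 25% now: {target_weight(weight, 0.25, 0.15):.1f} kg")
```

    Underestimating the starting body fat by five points roughly doubles the weight that has to come off to reach 15% (about 5.3 kg vs. 10.6 kg). Losing some lean mass along the way widens the gap further. This is one way such calculations end up several kilograms off.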
    steveBK123 9 days ago
    Most people are wrong about their body type, in the wrong direction (overweight think they aren't that overweight, skinny think they need to lose weight).
    Having an objective voice from your doctor giving you annual feedback and suggestions is better than ignoring the topic entirely.
    colechristensen 9 days ago
    That's gym bro science. There's no compelling health reason to lower your fat percentage to 12-15%, and it carries as much risk as being rather obese when you account for all-cause mortality, particularly for women and older people.
  - Propelloni 12 days ago
    Maybe I'm not getting you right, but IMO it hasn't? I, as a customer/patient, just don't weekly converse with my MD about small issues, and frankly, they have better things to do, for example treating sick people.
    Instead I use the health benefits programs of my health care insurer. My insurer has an interest in prevention, so I can get consulting for free (or very low fees), and even kickbacks if I regularly participate in fitness courses and maintain my yearly check-up routine. Now, I live in Germany and it probably is different in other countries, but it just makes economic sense from the insurer's point of view so that I would be surprised if it were very different elsewhere.
- sksksk 12 days ago
  >This article gave an LLM a bunch of health metrics and then asked it to reduce it to a single score, didn't tell us any of the actual metric values, and then compared that to a doctor's opinion. Why anyone would expect these to align is beyond my understanding.
  This gets to one of LLMs' core weaknesses: they blindly respond to your requests and rarely push back against the premise.
  - next_xibalba 11 days ago
    I read somewhere that LLM chat apps are optimized to return something useful, not correct or comprehensive (where useful is defined as the user accepts it). I found this explanation to be a useful (ha!) way to explain to friends and family why they need to be skeptical of LLM outputs.
- theshrike79 12 days ago
  Measuring metrics is easy, it's the algorithm on the backend that matters.
  There's a reason why Oura rings are expensive and it's not the hardware - you can get similar stuff for 50€ on Aliexpress.
  But none of them predicted my Covid infection days in advance. Oura did.
  A device like the Apple Watch that's on you 24/7 is good with TRENDS, not absolute measurements. It can tell you if your heart rate, blood oxygen or something else is more or less than before, statistically. For absolute measurements it's OK, but not exact.
  And from that we can make educated guesses on whether a visit to a doctor is necessary.
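  The "trends, not absolute values" idea can be sketched as comparing today's reading against the wearer's own recent baseline instead of a population norm. This is a generic z-score sketch with assumed parameters: a 14-day window and a threshold of 2 standard deviations. It is not Oura's or Apple's actual algorithm, and the function name and data are hypothetical.

```python
from statistics import mean, stdev


def deviation_alert(history: list[float], today: float,
                    window: int = 14, threshold: float = 2.0) -> bool:
    """Return True if today's value is more than `threshold` standard deviations
    above the mean of the wearer's own last `window` readings."""
    recent = history[-window:]
    if len(recent) < 3:
        return False  # too little history to form a personal baseline
    spread = stdev(recent)
    if spread == 0:
        return today > recent[-1]  # flat baseline: any rise counts
    z_score = (today - mean(recent)) / spread
    return z_score > threshold


# Hypothetical two weeks of resting heart rate (bpm) for one person
resting_hr = [52, 54, 53, 51, 52, 53, 55, 52, 51, 53, 54, 52, 53, 52]
print(deviation_alert(resting_hr, 54))  # within this person's usual range
print(deviation_alert(resting_hr, 60))  # unusual for this person, though normal for many people
```

  A resting heart rate of 60 bpm would not stand out against population norms. Against this person's tight personal baseline, though, it does, which is the advantage of a device worn 24/7.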
  - smallerfish 11 days ago
    > But none of them predicted my Covid infection days in advance. Oura did.
    It actually warned you, or retrospectively looking at the metrics you could see that there was a pattern in advance of symptoms? (If the latter, same here with my Garmin watch - precipitous HRV decline in the 7 days before symptoms. But no actual warning.)
    theshrike79 11 days ago
    It actually told me, they've been doing this for a while: https://ouraring.com/blog/early-covid-symptoms/
    Of course it didn't tell me "you have COVID19-B variant C" - but it did tell me I'm probably sick and should seek care.
  - yolo3000 12 days ago
    I'm curious how the ring detected it in advance? I also discovered my Covid when I looked at my Garmin watch and my resting heart rate was 100, until then I had thought I had too much sun that day.
    theshrike79 11 days ago
    Some of the metrics were out of whack. I think my average body temp was up, along with my resting heart rate both asleep and awake.
    It somehow took all that and gave me a "you might be sick" notification.
    SirMaster 11 days ago
    How is that predicting in advance though? Sounds like it measured active symptoms like a change in body temp etc. That's not prediction, that's reaction.
    taeric 11 days ago
    I think it is fair to assume they meant before symptoms? Which, yes, your heart rate is a symptom. No, it isn't one most people consider.
    theshrike79 11 days ago
    The device detects a 0.1 degree increase in average temperature. I don’t.
    It's like your car: trouble starts with a small noise you can't hear. In time the small noise becomes a big noise, just before things break.
    If you catch it in the small noise part, you can proactively prepare.
- saghm 11 days ago
  On the other hand, if compressing to a single number is not possible, a doctor will just refuse to give a grade that way. In my experience, most doctors are very careful to avoid saying anything definitive that they're not actually sure of, even if they're reasonably confident. In large part that is because their job involves understanding how patients react to the way things are communicated to them. Being willing to confidently give a misleading answer to a bad question is itself a bad thing when it comes to health data, because regular people aren't able to (and shouldn't be expected to) figure out which inferences are feasible from a given set of health data.
- teleforce 11 days ago
  >But a doctor who sees a person come in who isn't complaining about anything in particular, moves around fine, doesn't have risk factors like age or family history, and has good metrics on a blood test is probably going to say they're in fine cardio health regardless of what their wearable says.
  The standard CVD risk models based on SCORE-2- and PREVENT-like parameters perform very poorly, according to a recently published paper on their accuracy by a Swedish team [1]. In any CVD risk stratification that feeds cardiologist review, the most important accuracy measure is sensitivity, because false negatives escape review. The sensitivities of SCORE-2 and PREVENT were 48% and 26%, respectively.
  The paper's alternative proposal increased sensitivity to 58% by using clustering instead of the conventional regression models used in the standard SCORE-2 (Europe) and PREVENT (US).
  These types of models, including the latest proposal, performed very poorly, as shown in the paper's otherwise excellent and intuitive graphical abstract [1].
  [1] Risk stratification for cardiovascular disease: a comparative analysis of cluster analysis and traditional prediction models:
  https://academic.oup.com/eurjpc/advance-article/doi/10.1093/...
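  Sensitivity here is the share of people who actually go on to have the event that the model flags: TP / (TP + FN). The sketch below uses hypothetical counts per 100 true cases, chosen only to reproduce the quoted 48%, 26% and 58% figures. They are not the paper's actual confusion matrices.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of actual cases the model flags: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


# Hypothetical: 100 people who later had a cardiovascular event
cases = 100
for model, flagged in [("SCORE-2", 48), ("PREVENT", 26), ("clustering", 58)]:
    missed = cases - flagged
    print(f"{model}: sensitivity {sensitivity(flagged, missed):.0%}, "
          f"{missed} of {cases} cases never reach cardiologist review")
```

  Read this way, a 26% sensitivity means roughly three out of four people who go on to have an event would not have been flagged for review.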
- eleveriven 12 days ago
  The problem is that the product itself invites the wrong expectation.
wawayanda 12 days ago
A year or so ago, I fed my wife's blood work results into ChatGPT and it came back with a terrifying diagnosis. Even after a lot of back and forth, it stuck to its guns. We went to a specialist who performed some additional tests. She explained that the condition cannot be diagnosed from the original blood work alone and said that my wife did not have the condition. The whole thing was a borderline traumatic ordeal that I'm still pretty pissed about.
- greenknight 12 days ago
  On the flip side, I had some pain in my chest... RUQ (right upper quadrant, for those medical folk).
  On the way to the hospital, ChatGPT was pretty confident it was an issue with my gallbladder because I'd had a fatty meal for lunch (but it was delicious).
  After an extended wait to be seen, they didn't ask about anything like that. At the end they asked if I had anything else to add, so I mentioned the ChatGPT / gallbladder theory... and I was discharged 5 minutes later with a suspected gallbladder issue, as they couldn't do anything that night.
  Over the next few weeks, I got test after test after test to try and figure out what was going on: MRI, CT, ultrasound, etc. They all came back negative for the gallbladder.
  ChatGPT was persistent. It said to get a HIDA scan, a more specialised scan. My GP was a bit reluctant but agreed. I got it and was diagnosed with a hyperkinetic gallbladder. It is still not widely recognised as a condition, though it is mostly accepted. My surgeon initially said it wasn't a thing (then, after researching it, said it is a thing)... and a gastroenterologist also said it wasn't a thing.
  I had it taken out a few weeks ago, and it was chronically inflamed, which means removal was the correct path to go down.
  It just sucks that your wife was on the other end of things.
  - tharkun__12 days ago
    This reminds me of another recent comment in some other post, about doctors not diagnosing "hard to diagnose" things.
    There are probably ("good") reasons for this. But your own persistence, and today the help of AI, can potentially help you. The problem with it is the same problem as before: "charlatans". It's just that today the charlatan and the savior are one and the same: the AI.
    I do recognize that most people probably can't tell one from the other. In both cases ;)
    You'll find this in my post history a few times now, but essentially: I was lethargic all the time and got migraine-type headaches "randomly" a lot. I often felt like I needed to puke. One time I had to stop driving because it got so bad. I suddenly couldn't tolerate alcohol anymore either.
    I went to multiple doctors and was sent to specialists, who all told me that they could maaaaaybe do test XYZ but essentially: it wasn't a thing, I was crazy.
    Through a lot of online research I "figured out" (and that's an overstatement) that it was something about the gut microbiome, something to do with histamine. I tried a bunch of things; for example, I suspected it might be DAO (diamine oxidase) insufficiency. I tried a bunch of probiotics, both the "heals all your stuff" and the "you need to take a single strain or it won't work" type, including "just take Actimel". Actimel gave me headaches! It turns out one of the (prominent) strains in there makes histamine. Guess what: alcohol, some kinds especially, contains histamine, and your "hangover" is also essentially histamine (made worse by the dehydration). And guess what else: some foods, especially some I love, contain histamine or break down into it.
    So I figured that somehow it's all about histamine and how my current gut microbiome does not deal well with excess histamine (from whatever source). None of the doctors I went to believed this to be a "thing", nor did they want to do anything about it. Then I found a probiotic that actually helped. If you really want to check what I am taking, check my history. I'm not a marketing machine. What I do believe is that one particular bacterium helped, because it's the one thing that wasn't in any of the other ones I took: Bacillus subtilis.
    A soil-based bacterium, which in olden times you'd have gotten from not-quite-clean cabbage or whatever vegetable du jour you were eating. Essentially: if your toddler stuffs his face with a handful of dirt, that's one thing they'd be getting, and it's for the better! I'm saying this because the rest of the formulation was essentially the same as the others I tried.
    I took three pills per day, breakfast, lunch and dinner. I felt like shit for two weeks, even getting headaches again. I stuck with it. After about two weeks I started feeling better. I think that's when my gut microbiome got "turned around". I was no longer lethargic and I could eat blue cheese and lasagna three days in a row with two glasses of red wine and not get a headache any longer! Those are all foods that contain or make lots of histamine. I still take one per day and I have no more issues.
    But you gotta get to this, somehow, through all of the bullshit people that try to sell you their "miracle cure" stuff. And it's just as hard as trying to suss out where the AI is bullshitting you.
    There was exactly one doctor in my life whom I would consider good in that regard. I had already figured the above out by that time, but I was doing keto, and it brought all of my blood markers except cholesterol back into the normal range. She literally "googled" with me about keto a few times, did a blood test to confirm that I was in ketosis, and in general was just awesome about this. She was notoriously difficult to book and ran later on scheduled appointments than any other doctor, but she took her time, and even that would never really have been enough to suss out the stuff I figured out through my own research, if you ask me. While doctors are the "demigods in white", I think there's just way too much stuff and way too little time for them. It's like all the bugs at your place of work, except imagine you had exactly one doctor across a multitude of companies. Of course they only figure out the "common" ones ...
    steveBK12311 days ago
    One challenge that may sound obvious: super rare stuff gets seen super rarely, even by specialists.
    In practice it means you often have to escalate from GP to local specialist to even more narrow specialist, all the way to one of the regional big-city specialists who almost exclusively get the weird cases.
    This is because every hop is an increasingly narrow area of speciality.
    Instead of just “cancer doctor”, it's the “GI cancer doctor”, then it's the “GI cancer doctor for this particular organ”, then it's “an entire department of cancer doctors who work exclusively on this organ and review the case together”, etc.
    tstrimple12 days ago
    It's horses, not zebras, until it's actually a zebra and your life depends on it. I think those sorts of guidelines are useful in the general case. But many medical issues quickly move beyond the general case and need closer examination. Not sure how you do that effectively without wasting tons of money on folks with indigestion.
    xenonite12 days ago
    Interesting to read, thank you very much. Are you still eating ketogenic? Bacillus subtilis seems to metabolize glucose, so are yours still alive? And did you try other probiotics beforehand? I have HIT (histamine intolerance) and eat a mostly carnivore diet with mostly fresh/unfermented meat.
    tharkun__11 days ago
    No, I no longer do keto. I also started keto after I had already gotten better from the probiotics, but not much. I'm not sure where you read that B. subtilis can only live off glucose. I'm having a hard time finding primary sources that actually talk about this, but handily Google's "AI mode" also "answered" my search query, and it states that it primarily thrives on glucose and sugars but can also break down and live off proteins and fats.
    FWIW, as I understand it, many probiotics aren't going to colonize on their own and "stick around" for a prolonged period of time after you stop taking them, even under good circumstances, but don't quote me on that. And in the past we would've gotten many of them one way or another through our diet as well, just naturally rather than from a probiotic.
    I tried multiple probiotics, both blends of multiple types and things like "Saccharomyces boulardii"-only preparations. I don't recall all the exact ones I tried, though.
  - tonyhart712 days ago
    after reading your comment, my perception is mixed
  - rubatuga12 days ago
    If it was inflamed would your GGT level be high?
- fn-mote12 days ago
  > I fed my wife's blood work results into chatgpt and it came back with a terrifying diagnosis
  I don't get it... a doctor ordered the blood work, right? And surely they did not have this opinion or you would have been sent to a specialist right away. In this case, the GP who ordered the blood work was the gatekeeper. Shouldn't they have been the person to deal with this inquiry in the first place?
  I would be a lot more negative about "the medical establishment" if they had been the ones who put you through the trauma. It sounds like this story is putting yourself through trauma by believing "Dr. GPT" instead of consulting a real doctor.
  I will take it as a cautionary tale, and remember it next time I feed all of my test results into an LLM.
  - kolinko12 days ago
    At least in Poland, I can almost always see my results before my doctor does - I get a notification that the labwork is ready and I can view results online.
    Also, the regular bloodwork is around $50-$100 (for noninsured or without a prescription), so many people just do this out of pocket once in a while and only bring to doctor if anything looks suspicious.
    Finally, there is EU regulation about data that applies to medical field as well - you always have the right to view all the data that any company has stored about you. Gatekeeping is forbidden by law.
  - vineyardmike12 days ago
    You don't need a doctor to order bloodwork. I get a full panel done yearly, just to establish a baseline and trend. I try not to overanalyze it, and just keep it around for a professional in case some real issue arises in the future.
    jbverschoor11 days ago
    In some countries you do. The Netherlands for example
- themafia12 days ago
  > it stuck to its guns
  It gave you a probabilistic output. There were no guns and nothing to stick to. If you had disrupted the context with enough countervailing opinion it would have "relented" simply because the conversational probabilities changed.
  - tstrimple12 days ago
    I was amused but not impressed when I was able to convince Claude Code that it was useless and absolutely not a service worth paying for. It literally apologized and recommended I ask for a refund. I mean, I still get lots of value from CC. Just that it's easy to push them into whatever corner you want.
  - nprateem12 days ago
    It's amazing this still needs to be said, especially here
    coffeefirst12 days ago
    Here, sure.
    For the general public, these tools have been advertised this way.
    So if a good subset of HN still gets fooled, the layperson is screwed.
    gizajob12 days ago
    Hmm or the layperson wouldn’t be “smart” enough to think that ChatGPT could give useful answers to complex health questions.
- SchemaLoad12 days ago
  I asked a doctor friend why it seems common for healthcare workers to keep the results sheets to themselves and just give you a good/bad summary. He told me that the average person can't properly understand the data and will freak themselves out over nothing.
  - smt8811 days ago
    I'm in the US and have never experienced anyone keeping results to themselves.
    In fact, I can now easily access even my doctor's appointment notes. I have my entire chart unless my doctor specifically writes private notes.
- worldsavior12 days ago
  I think it's your problem you got stressed from a probabilistic machine answering with what you want to hear.
- josefresco11 days ago
  I fed about 4ish years of blood tests into an AI and after some back and forth it identified a possible issue that might signal recovery. I sheepishly brought it up with my doc, who actually said it might be worth looking into. Nothing earth shattering, just another opinion.
- lugu12 days ago
  I am sorry I have to say so, but the value of LLMs is their ability to reason based on their context. Don't use them as a smart Wikipedia (without context). For your use case, provide them with several textbooks and practice handbooks along with the person's medical history. Then ask your question in a neutral way. Then ask it to verify its claims in another session and provide references.
  It is so unfortunate that a general chatbot designed to answer anything was the first use case pushed. I get it when people are pissed.
- fouc12 days ago
  > it stuck to its guns
  Everyone that encounters this needs to do a clean/fresh prompt with memory disabled to really know if the LLM is going to consistently come to the same conclusion or not.
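The fresh-session check above can be scripted. Below is a minimal Python sketch, assuming a hypothetical `ask_fresh` callable that sends one prompt to a brand-new chat with memory disabled and returns the model's answer as a string; no real vendor API is implied, and `consistency_report` is an illustrative helper name. It repeats the same prompt and reports how often the most common answer comes back.

```python
from collections import Counter
from typing import Callable


def consistency_report(ask_fresh: Callable[[str], str], prompt: str, runs: int = 10) -> dict:
    """Ask the same question `runs` times, each in a fresh session, and summarize agreement."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Each call is assumed (hypothetically) to start from an empty context,
    # so the answers behave like independent samples from the model.
    answers = [ask_fresh(prompt) for _ in range(runs)]
    counts = Counter(answers)
    modal_answer, modal_count = counts.most_common(1)[0]
    return {
        "distribution": dict(counts),
        "modal_answer": modal_answer,
        # Fraction of runs agreeing with the most common answer (1.0 = fully consistent).
        "agreement": modal_count / runs,
    }


if __name__ == "__main__":
    # Hypothetical stand-in for a model whose grade swings between F and B.
    fake_grades = iter(["F", "C", "B", "F", "F", "C", "B", "F"])
    report = consistency_report(lambda _prompt: next(fake_grades), "Grade my heart health", runs=8)
    print(report)
```

An agreement score well below 1.0 on identical input is the kind of F-to-B swing the article describes, and a sign that any single answer should be treated as one noisy sample.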
- pengaru11 days ago
  > A year or so ago, I fed my wife's blood work results into chatgpt
  Why would you consult a known bullshit generator for anything this important?
- eleveriven12 days ago
  Stories like yours are why I'm skeptical of these "health insight" products as currently shipped. Visualization, explanation, question-generation - great. Acting like an interpreter of incomplete medical data without a strong refusal mode is genuinely dangerous
- mrguyorama11 days ago
  >The whole thing was a borderline traumatic ordeal that I'm still pretty pissed about.
  Why did you do the thing people calmly explained you should not do? Why are you pissed about experiencing the obvious and known outcome?
  In medicine, even a test with "Worrying" results is rarely an actual condition requiring treatment. One reason doctors are so bad at long tail conditions is that they have been trained, both by education and literal direct experience, that chasing down test results without any symptoms is a reliable way to waste money, time, and emotions.
  It's a classic statistics 101 topic to look at screening tests and notice that the majority of "positive" outcomes are false positives.
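The statistics-101 point follows from Bayes' rule. Below is a minimal Python sketch; the prevalence, sensitivity and specificity values are illustrative assumptions, not figures for any real test.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """P(condition | positive result), computed with Bayes' rule."""
    true_pos = sensitivity * prevalence                # sick and flagged
    false_pos = (1 - specificity) * (1 - prevalence)   # healthy but flagged anyway
    flagged = true_pos + false_pos
    if flagged == 0:
        raise ValueError("test never returns a positive result")
    return true_pos / flagged


if __name__ == "__main__":
    # Illustrative screening scenario: 1% prevalence, 90% sensitivity, 90% specificity.
    ppv = positive_predictive_value(0.01, 0.90, 0.90)
    print(f"Chance a positive result is real: {ppv:.1%}")
    # A test that always says "positive" catches every true case (sensitivity 1.0)
    # but has specificity 0.0, so a positive tells you nothing beyond the prevalence.
    print(f"Always-positive test: {positive_predictive_value(0.01, 1.0, 0.0):.1%}")
```

With these assumed numbers, only about 1 in 12 positive results is real, so the large majority of positives are false, even though the test is "90% accurate" in both directions.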
- daveguy12 days ago
  Please keep telling your story. This is the kind of shit that medical science has been dealing with for at least a century. When evaluating testing procedures false positives can have serious consequences. A test that's positive every time will catch every single true positive, but it's also worthless. These LLMs don't have a goddamn clue about it. There should be consequences for these garbage fires giving medical advice.
  - maerF0x012 days ago
    Part of the issue is taking its output as a conclusion rather than as a signal / lead.
    I would never let an LLM make an amputate-or-not decision, but it could convince me to go talk with an expert who sees me in person and takes a holistic view.
- irjustin12 days ago
  Isn't it two sides to the same coin?
  Shouldn't you be happy that it's not the thing, when the signs pointed towards it being "the thing"?
  - themafia12 days ago
    You are _absolutely_ going to die in the next 30 minutes.
    When it doesn't happen will you still be happy?
    nprateem12 days ago
    Depends if I'm now broke from blowing it all on crack and hookers.
    irjustin12 days ago
    How is this apples-apples at all?
    But to answer directly... yes? yes, I am.
    [edit]
    A bit more realistic: my blood pressure monitor says my BP is 200/160. Chat says I'm about to die and to get myself to a hospital.
    I get to the hospital and they say, "Oh, your BP monitor is wrong."
    I'm happy? I would say that I am. Sure I'm annoyed at my machine, but way happier it's wrong than right.
    vineyardmike12 days ago
    This is another example of why it's still frustrating.
    "Yes, I'm happy I'm not dying" ignores having to "go to the hospital [and waste a day, maybe at some financial cost]" because a machine was wrong. That's still pretty inconvenient when the cause is a machine that was inaccurate, badly calibrated, or poorly engineered. Not dying is good, but the emotions and fear for a period of time are still bad.
    irjustin12 days ago
    Yeah I guess I just don't see eye-to-eye on this.
    I 100% understand those frustrations: that the "detectors" should've been more accurate, or the fears, battery of tests, and associated costs in time and money. But if you have the means to find out that something that could have been extremely concerning is actually "nothing wrong", isn't that worth it?
    My friend is 45, had bloody stool -> colonoscopy -> polyps removed -> benign. Isn't that way better than colon cancer?
    Maybe it's a glass half-empty-full thing.
- ltbarcly311 days ago
  It's interesting because presumably you were too ashamed to tell the doctor "we pasted stuff into chatgpt and it said it means she is sick", because if you had said that he would have looked at the bloodwork and you could have avoided going to a specialist.
  It's an interesting cognitive dissonance that you both trusted it enough to go to a specialist but not enough to admit using it.
- orionsbelt12 days ago
  > "A year or so ago"
  What model?
  Care to share the conversation? Or try again and see how the latest model does?
- jesterson12 days ago
  It never ceases to surprise me that people take word-salad output so seriously.
  And probably the same people laugh at ancient folks carefully listening to shamans.
- bigbuppo12 days ago
  Why not just ask WebMD?
- port1111 days ago
  You used a predictive/statistical proximity chatbot on a single point-in-time snapshot of her blood, and you’re pissed that the result wasn’t useful? I think any decent GP would push back, want to see trends in the data, or at least look at the broader context.
  I mean, at some point we have to admit that LLMs aren’t designed for correctness but utility.
- terribleperson12 days ago
  Do you have a custom prompt/personality set? What is it?
  - ltbarcly311 days ago
    Yea, if only he had said "make sure you are always honest" first!
    terribleperson10 days ago
    It's not that you need to ask it to be honest, it's that the defaults are kind of stupid and obnoxiously sycophantic. ChatGPT is also prone to getting stuck on particular ideas. If you're using the vanilla personalities without a custom prompt, not aware of and working against its issues, and not starting new chats occasionally you won't get good results. You'll get good-sounding garbage.
    Part of my custom prompt is ```When using factual information beyond what I provide, verify it when possible. When researching factual questions—especially by relying on papers and studies—actively look for null findings, negative results, and contradictory evidence, not just positive or confirmatory findings.```
    To me, the most interesting result of that part of my prompt is that in thinking mode, it ends up re-checking its assumptions and sources fairly often. It's not about honesty, but correctness.
    A custom prompt isn't the be-all end-all either. The right kind of questioning is important, and you also need to get a fresh context when you ask new questions or if you want to double check something.
- filoeleven11 days ago
  Gotta love the replies to this. At least more of the botheads are now acting like they're trying to ask helpful questions instead of just flat out saying "you're using it wrong."
- gizajob12 days ago
  You’re pissed about your own stupidity? In asking for deep knowledge and medical advice from a Markov chain?
freedomben12 days ago
> Despite having access to my weight, blood pressure and cholesterol, ChatGPT based much of its negative assessment on an Apple Watch measurement known as VO2 max, the maximum amount of oxygen your body can consume during exercise. Apple says it collects an “estimate” of VO2 max, but the real thing requires a treadmill and a mask. Apple says its cardio fitness measures have been validated, but independent researchers have found those estimates can run low — by an average of 13 percent.
There's plenty of blame to go around for everyone, but at least for some of it (such as the above) I think the blame more rests on Apple for falsely representing the quality of their product (and TFA seems pretty clearly to be blasting OpenAI for this, not others like Apple).
What would you expect the behavior of the AI to be? Should it always assume bad data or potentially bad data? If so, that seems like it would defeat the point of having data at all, as you could never draw any conclusions from it. Even disregarding statistical outliers, it's not at all clear what part of the data is "good" vs "unreliable", especially when the company that collected that data claims that it's good data.
- brandonb12 days ago
  FWIW, Apple has published validation data showing the Apple Watch's estimate is within 1.2 ml/kg/min of a lab-measured VO2 max.
  Behind the scenes, it's using a pretty cool algorithm that combines deep learning with physiological ODEs: https://www.empirical.health/blog/how-apple-watch-cardio-fit...
  - itchyouch12 days ago
    The trick with the vo2 max measurement on the apple watch though is that the person can not waste any time during their outdoor walk and needs to maintain a brisk pace.
    Then there are confounders like altitude and elevation gain that can sully the numbers.
    It can be pretty great, but it needs a bit of control in order to get a proper reading.
  - ignoramous12 days ago
    The paper itself: https://www.apple.com/healthcare/docs/site/Using_Apple_Watch...
    Seems like Apple's 95% accuracy estimate for VO2 max holds up.
    Thirty participants wore an Apple Watch for 5-10 days to generate a VO2 max estimate. Subsequently, they underwent a maximal exercise treadmill test in accordance with the modified Åstrand protocol. The agreement between measurements from Apple Watch and indirect calorimetry was assessed using Bland-Altman analysis, mean absolute percentage error (MAPE), and mean absolute error (MAE). Overall, Apple Watch underestimated VO2 max, with a mean difference of 6.07 mL/kg/min (95% CI 3.77–8.38). Limits of agreement indicated variability between measurement methods (lower -6.11 mL/kg/min; upper 18.26 mL/kg/min). MAPE was calculated as 13.31% (95% CI 10.01–16.61), and MAE was 6.92 mL/kg/min (95% CI 4.89–8.94). These findings indicate that Apple Watch VO2 max estimates require further refinement prior to clinical implementation. However, further consideration of Apple Watch as an alternative to conventional VO2 max prediction from submaximal exercise is warranted, given its practical utility.
    https://pmc.ncbi.nlm.nih.gov/articles/PMC12080799/
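For readers unfamiliar with the abstract's terms, here is a minimal Python sketch of Bland-Altman agreement, MAE and MAPE on paired readings. It is a textbook version, not the study's own analysis code; it uses the common bias ± 1.96 × SD convention for 95% limits of agreement, and it assumes differences are taken as lab minus watch (my reading, since the abstract says the watch underestimated). Note the distinction these numbers encode: the 95% CI quoted for the mean difference (3.77 to 8.38) describes uncertainty in the average bias, while the limits of agreement describe how far an individual reading can stray.

```python
import statistics
from typing import Sequence


def bland_altman(lab: Sequence[float], device: Sequence[float], z: float = 1.96):
    """Return (bias, (lower, upper)) with differences taken as lab minus device."""
    if len(lab) != len(device) or len(lab) < 2:
        raise ValueError("need at least two paired readings of equal length")
    diffs = [ref - dev for ref, dev in zip(lab, device)]
    bias = statistics.mean(diffs)   # positive bias => device reads low
    sd = statistics.stdev(diffs)    # sample SD of the differences
    return bias, (bias - z * sd, bias + z * sd)


def mae(lab: Sequence[float], device: Sequence[float]) -> float:
    """Mean absolute error, in the readings' own units (mL/kg/min here)."""
    return statistics.mean(abs(ref - dev) for ref, dev in zip(lab, device))


def mape(lab: Sequence[float], device: Sequence[float]) -> float:
    """Mean absolute percentage error, relative to the lab value."""
    return 100 * statistics.mean(abs(ref - dev) / ref for ref, dev in zip(lab, device))


def sd_from_limits(lower: float, upper: float, z: float = 1.96) -> float:
    """Back out the SD of the differences from published limits of agreement."""
    return (upper - lower) / (2 * z)


if __name__ == "__main__":
    # Published limits from the abstract above (assumed lab minus watch, mL/kg/min).
    lower, upper = -6.11, 18.26
    print("implied SD of differences:", sd_from_limits(lower, upper))
    print("implied mean difference:", (lower + upper) / 2)
```

Applied to the published limits, the implied SD of the differences is about 6.2 mL/kg/min, and the midpoint of the limits is 6.075, matching the reported mean difference of 6.07 within rounding. So the abstract's figures are internally consistent, and an individual watch reading can be off by well over 10 mL/kg/min.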
    mr_toad11 days ago
    That’s saying that they’re 95% confident that the mean measurement is lower than the treadmill estimate, not that the watch is 95% accurate. In other words they’re confident that the watch underestimates VO2 max.
    adastra2211 days ago
    That is an extraordinarily small sample size.
- aeonfox12 days ago
  > I think the blame more rests on Apple for falsely representing the quality of their product
  There was plenty of other concerning stuff in that article. And from a quick read, it wasn't suggested or implied that the VO2 max issue was the deciding factor for the original F score the author received. The article did suggest many times over that ChatGPT is really not equipped for the task of health diagnosis.
  > There was another problem I discovered over time: When I tried asking the same heart longevity-grade question again, suddenly my score went up to a C. I asked again and again, watching the score swing between an F and a B.
  - brandonb12 days ago
    The lack of self-consistency does seem like a sign of a deeper issue with reliability. In most fields of machine learning robustness to noise is something you need to "bake in" (often through data augmentation using knowledge of the domain) rather than get for free in training.
  - freedomben11 days ago
    > There was plenty of other concerning stuff in that article.
    Yeah for sure, I probably didn't make it clear enough but I do fault OpenAI for this as much as or maybe more than Apple. I didn't think that needed to be stressed since the article is already blasting them for it and I don't disagree with most of that criticism of OpenAI.
- AndrewKemendo12 days ago
  > Should it always assume bad data or potentially bad data? If so, that seems like it would defeat the point of having data at all as you could never draw any conclusions from it.
  Yes. You, and every other reasoning system, should always challenge the data and assume it’s biased at a minimum.
  This is better described as “critical thinking” in its formal form.
  You could also call it skepticism.
  That impossibility of drawing conclusions assumes there’s a correct answer and is called the “problem of induction.” I promise you a machine is better at avoiding it than a human.
  Many people freeze up or fail with too much data - put someone with no experience in front of 500 ppl to give a speech if you want to watch this live.
  - freedomben11 days ago
    I mostly agree with you, but I think it's important to consider what you're doing with the data. If we're doing rigorous science, or making life-or-death decisions on it, I would 100% agree. But if we're an AI chatbot trying to offer some insight, with a big disclaimer that "these results might be wrong, talk to your doctor", then I think that's quite overkill. The end result would be no (potential) insight at all and no chance of ever improving, since we'll likely never get to a point where we could fully trust the data. Not even the best medical labs are always perfect.
- miltonlost12 days ago
  > What would you expect the behavior of the AI to be? Should it always assume bad data or potentially bad data? If so, that seems like it would defeat the point of having data at all as you could never draw any conclusions from it.
  Well, I would expect the AI to provide the same response as a real doctor did from the same information. As the article showed, the doctors were able to do that.
  I also would expect the AI to provide the same answer every time to the same data, unlike what it did (going from F to B over multiple attempts in the article).
  OpenAI is entirely to blame here when they are putting out faulty products (hallucinations, even on accurate data, are their fault).
  - jdub12 days ago
    Why do you have those expectations?
- jayd1612 days ago
  Well if it doesn't know the quality of the data and especially if it would be dangerous to guess then it should probably say it doesn't have an answer.
  - freedomben11 days ago
    I don't disagree, but that reinforces my point above I think. If AI has to assume the data is of poor quality, then there's no point in even trying to analyze it. The options are basically:
    1. Trust the source of the data to be honest about its quality
    Or
    2. Distrust the source
    Approach number 2 basically means we can never do any analysis on it.
    Personally I'd rather have a product that might be wrong than none at all, but that's a personal preference.
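One possible middle ground between options 1 and 2 is to use the data, but only as far as its published error bounds allow. Below is a minimal Python sketch with hypothetical helper names (`AgreementStats`, `plausible_lab_range`), assuming the Bland-Altman figures from the validation study quoted elsewhere in this thread (mean difference 6.07 mL/kg/min, limits of agreement -6.11 to 18.26, read as lab minus watch) transfer to an individual user, which a 30-person study cannot guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgreementStats:
    """Published device-vs-lab agreement, as differences (lab minus device)."""
    mean_diff: float
    lower_loa: float
    upper_loa: float


def plausible_lab_range(reading: float, stats: AgreementStats) -> tuple[float, float, float]:
    """Return (bias-corrected estimate, low, high) for one person's lab value.

    low/high come from the ~95% limits of agreement, so they describe where an
    individual's true value is likely to fall, not just the average error.
    """
    return (
        reading + stats.mean_diff,
        reading + stats.lower_loa,
        reading + stats.upper_loa,
    )


# Assumed figures from the Apple Watch validation abstract quoted in this thread (mL/kg/min).
APPLE_WATCH_VO2 = AgreementStats(mean_diff=6.07, lower_loa=-6.11, upper_loa=18.26)

if __name__ == "__main__":
    point, low, high = plausible_lab_range(30.0, APPLE_WATCH_VO2)
    print(f"watch says 30 -> lab value plausibly {low:.1f} to {high:.1f} (best guess {point:.1f})")
```

Under these assumptions, a watch reading of 30 maps to a lab value of roughly 24 to 48 mL/kg/min with a best guess near 36. That range is wide enough that neither a failing grade nor a clean bill of health follows from the reading alone, which is a more honest output than either blind trust or refusing to analyze.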
- hmokiguess12 days ago
  I have been sitting and waiting for the day these trackers get exposed as just another health fad that is optimized to deliver shareholder value and not serious enough for medical grade applications
  - NoPicklez12 days ago
    I don't see how they are considered a health fad, they're extremely useful and accurate enough. There are plenty of studies and real world data showing Garmin VO2Max readings being accurate to 1-2 points different to a real world test.
    There is this constant debate about how accurately VO2 max is measured, and it's highly dependent on actually doing exercise so your watch can determine your VO2 max. But yes, if you want a lab/medically precise measure, you need to do a test that measures your actual oxygen uptake.
cthalupa11 days ago
I'll preface this with I generally trust doctors. I think on the whole they are well positioned to provide massive benefit to their patients.
I will also preface this with saying I do not think any LLM is better than the average doctor and that you are far better served going to your doctor than asking ChatGPT what your health is like on any factor.
But I'll also say that the quality of doctors varies massively, and that a good amount of doctors learn what they learn in school and do not keep up with the latest advances in research, particularly those that have broad spectrums such as GPs. LLMs that search scientific literature, etc., might point you in the direction of this research that the doctors are not aware of. Or hallucinate you into having some random disease that impacts 3 out of every million people and send you down a rabbithole for months.
Unfortunately, it's difficult to resolve this without extremely good insurance or money to burn. The depth you get and the level of information that a good preventative care cardiologist has is just miles ahead of where your average family medicine practitioner is. Statins are an excellent example: new prescriptions for atorvastatin are still insanely high despite it being a fairly poor choice compared to rosuvastatin or pitavastatin for a good chunk of the people on it. Family practitioners are often behind on the latest recommendations from the NLA and AHA, etc.
There's a world where LLMs or similar can empower everyday people to talk to their doctor about their options and where they stand on health, where they don't have to hope their doc is familiar with where the science has shifted over the past 5-10 years, or cough up the money for someone who specializes in it. But that's not the world of today.
In the meantime, I do think people should be comfortable being their own advocates with their doctors. I'm lucky enough that my primary care doc is open to reading the studies I send over to him and working with me, or at least patient enough to humor me. It's let me get on medications that treat my symptoms without side effects and has improved my quality of life (and hopefully my life/healthspan). There have also been things I've misinterpreted; I don't pick a fight with him if we come to opposite conclusions. He's shown good faith in agreeing with me where it makes sense to me, and pushed back where it hasn't, and I acknowledge he's the expert.
- port1111 days ago
  I interviewed for Ada, whose ML diagnostic tool had shown itself more accurate at diagnosis than a panel of doctors. It was specifically trained on case data, IIRC, and doctors were paid to help improve the results.
  I wonder what it’s like now. Any time I use it for a diagnosis I get outlandish results, and then I’ll head to my GP and turns out it was something rather simple.
- biophysboy11 days ago
  I think the fairest test is: what is the best and fastest way to reduce medical uncertainty? For rare ailments with a single cause and exclusive symptoms that can be accurately described in simple language (no medical jargon), it's possible that an LLM is better than a doctor.
  For more ambiguous situations where you need actual tests, I am skeptical of using LLMs.
cameldrv12 days ago
I dunno, if the Apple Watch said he had a vo2max of 30, that probably means he can’t run a mile in less than 12 minutes or so. He’s probably not at all healthy…
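The mile-time intuition can be roughly checked with the ACSM metabolic equation for running, VO2 ≈ 0.2 × speed + 0.9 × speed × grade + 3.5 (speed in m/min, grade as a fraction, VO2 in mL/kg/min). Below is a rough Python sketch, assuming flat ground, even pacing, and that the equation (designed for steady-state running) still holds near maximal effort. It ignores anaerobic contribution, so treat it as an order-of-magnitude check, not a prediction.

```python
MILE_M = 1609.344  # metres in one mile


def acsm_running_vo2(speed_m_per_min: float, grade: float = 0.0) -> float:
    """ACSM running equation: oxygen cost in mL/kg/min.

    grade is the fractional incline (0.05 = 5%); 3.5 is the resting component.
    """
    return 0.2 * speed_m_per_min + 0.9 * speed_m_per_min * grade + 3.5


def vo2_needed_for_mile(minutes: float) -> float:
    """Oxygen cost of holding an even pace over a flat mile run in `minutes`."""
    return acsm_running_vo2(MILE_M / minutes)


def fastest_even_paced_mile(vo2max: float) -> float:
    """Invert the flat-ground equation: mile time at which oxygen cost equals VO2 max."""
    if vo2max <= 3.5:
        raise ValueError("VO2 max must exceed the 3.5 mL/kg/min resting component")
    speed = (vo2max - 3.5) / 0.2
    return MILE_M / speed


if __name__ == "__main__":
    print(f"12:00 mile costs about {vo2_needed_for_mile(12):.1f} mL/kg/min")
    print(f"VO2 max of 30 caps an even-paced mile at about {fastest_even_paced_mile(30):.1f} min")
```

Both directions land in the same place: an even 12-minute mile costs about 30.3 mL/kg/min, and a VO2 max of 30 corresponds to a mile of just over 12 minutes, consistent with the estimate above.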
- smcl12 days ago
  Apple Watch is pretty poor at estimating VO2 max, and it seems to be more correlated with how often you record exercises with the watch than with your actual health. For example, I watched mine climb slowly as I prepared for my football season (beyond 50); then, after the season started, I ended up playing and training just as frequently but without wearing the watch. After a few weeks (of me training and playing hard), my next run recorded a sharp decline in VO2 max (43-44ish iirc). When I started wearing it during training (you're not permitted to during matches), it recorded a slow return to condition, without any changes to my routine.
  That said if it's showing someone as having 30 I don't imagine they're going to be in spectacular condition
  - port1111 days ago
    I really don’t know whether to trust that specific measurement. When I was a very active runner and doing intervals to improve per-km time, my VO2max went from 38 to 42. I decided to do a professional VO2max test and got a 46.
    Now, 2 years later, I don’t run due to injury and a kid, and it’s resting at 34. For reference, when I went to the gym almost everyday and ran once or twice a week, the value was 32.
    I don’t get much utility out of it, even looking at the trends. Not sure what Apple is doing behind the scenes to get the score.
    smcl11 days ago
    Yeah so I know it's meant to be an estimate, but my experience of it is kinda fucky. I would really love to swap watches with an Olympic athlete (idk if they'd bother with an Apple Watch but bear with me!) and run 10k to see what the VO2 max reading for that exercise was. As I said, I think to me it's some estimate that heavily involves some "average of last N readings from the Apple VO2 max calc" function so even if you time travelled and gave it to Eilish McColgan or Mo Farah they'd be like "ehhh you had quite a good run, fatty - you jumped from 44.3 to 45"
    I'm not that bothered of course. For me it's just a fun metric I can attempt to optimise when training.
    port1110 days ago
    That experiment might be unfruitful because I assume Apple’s algorithm was not trained on outliers. Very capable athletes might see similarly silly data because they don’t fit well into the bell curve. Maybe.
  - eleveriven12 days ago
    This is really more of an "outdoor run while wearing the watch" proxy than a true fitness measure.
- Someone12 days ago
  > he had a vo2max of 30, that probably means he can’t run a mile in less than 12 minutes or so. He’s probably not at all healthy…
  Health and fitness correlate but are different things. VO2max is more about fitness than about health.
  Also, looking at https://en.wikipedia.org/wiki/VO2_max#Reference_values, 30 is about average for men in their 40s/50s, which, from a quick Google, I estimate is the author’s age range.
  - FeteCommuniste12 days ago
    > Also, looking at https://en.wikipedia.org/wiki/VO2_max#Reference_values, 30 is about average for men in their 40s/50s, which, from a quick Google, I estimate is the author’s age range.
    And the average man in his 40s or 50s is in... not especially good aerobic shape.
  - netdevphoenix12 days ago
    Fitness correlates with health though. Just because you don't have any conditions does not mean that you are healthy. And inability to meet certain fitness tests is correlated with lower health.
  - danielmarkbruce11 days ago
    This is a silly take. VO2 max is one of the strongest predictors of all cause mortality. Various large scale studies have shown it to be true.
- akshivb11 days ago
  I had a "below average" VO2 max score based on my Apple Watch measurements. It was ~40 mL/kg/min; in the span of about a month it jumped to 53 mL/kg/min, which is "high" for my age group. So what happened? I started running instead of cycling as my primary form of cardio.
  My hypothesis is that the Apple Watch estimates higher if you are running rather than pedaling. I definitely don't think my cardiovascular fitness went from poor to great in a month. It seems more likely that it was underestimating before and is perhaps overestimating now.
  - mdtancsa11 days ago
    After a long injury, I got back to slowly running on the treadmill/bike/elliptical at the gym. IIRC, my Garmin qualified its VO2 max results by saying I needed to run outside for some period of time to get a more accurate measurement. I guess something about the running metrics it collects gives a smaller error range.
  - wincy11 days ago
    Yeah I just ignore it, when I was biking 40+ miles a week this summer it says my VO2 max was 18, which is just absurd. Maybe because my arm is really hairy I don’t know.
- dgxyz12 days ago
  If the Apple Watch said anything about that, it's probably wrong. It can't accurately measure VO2 max.
  Incidentally, I got rid of mine recently. It is bliss not having one.
  Also, VO2 max is a crappy measure of fitness. I apparently had "average" VO2 max after a treadmill test, yet I can hike 50km with a 2km elevation gain in one go and not die. People I know with a higher VO2 max dropped out.
  - evandijk7012 days ago
    During a 50 km hike you are not anywhere close to your VO2 max, so it makes sense that the VO2 max is not predictive for that distance.
  - lurking_swe11 days ago
    You’re not wrong. However - the Health app on the iphone (where you can view your health data) makes this VERY clear. Most people just don’t read.
    I’ll quote:
    “This is a measurement of your VO2 max, which is the maximum amount of oxygen your body can consume during exercise. Also called cardiorespiratory fitness, this is a useful measurement for everyone from the very fit to those managing illness.
    A higher VO2 max indicates a higher level of cardio fitness and endurance.
    Measuring VO2 max requires a physical test and special equipment. You can also get an estimated VO2 max with heart rate and motion data from a fitness tracker. Apple Watch can record an estimated VO2 max when you do a brisk walk, hike, or run outdoors.
    VO2 max is classified for users 20 and older. Most people can improve their VO2 max with more intense and more frequent cardiovascular exercise. Certain conditions or medications that limit your heart rate may cause an overestimation of your VO2 max. VO2 max is not validated for pregnant users. You can indicate you're taking certain medications or add a current pregnancy in Health Details.”
  - bwv84812 days ago
    > hike 50km with a 2km elevation gain in one go and not die.
    And thru-hikers can do this for days. It’s more related to fatigue resistance, mitochondrial density, and walking efficiency. But VO2 max still matters in high-intensity sports, you can’t ignore it when you’re pedaling a bike at high Zone 4 in a race.
  - danielmarkbruce11 days ago
    vo2 max is one of the strongest predictors of all cause mortality.
- mr_toad11 days ago
  Compared to the average patient a typical GP sees, someone who can actually run a mile is probably doing pretty well.
  - smt8811 days ago
    This is certainly true in the US, but I don't think it's universal at all
sinuhe6912 days ago
My general take on any AI/ML in medicine is that without proper clinical validation, they are not worth trying. Also, AI Snake Oil is worth reading.
- rubatuga12 days ago
  Clinical validation, proper calibration, ethnic and community and population variants, questioning technique and more ...
- joelthelion12 days ago
  Exactly. There's a lot of potential, but it needs to be done right, otherwise it is worse than useless.
seemaze12 days ago
I can't wait until it starts recommending signing me up for an OpenAI personalized multi-vitamin® subscription
- meindnoch12 days ago
  "You're absolutely right! I was mistaken about mercury and lead being essential minerals, and adding them to your supplements. Sorry about that!"
  - seemaze9 days ago
    I was casually browsing for a health monitor when I came across Ultrahuman Blood Vision - one of the key features being AI Clinical Summary with Supplement Report... it seems my sarcasm was late to the party.
elzbardico12 days ago
LLMs are not a mythical universal machine learning model that you can feed any input and have it magically do the same thing a specialized ML model could do.
You can't feed an LLM years of time-series meteorological data, and expect it to work as a specialized weather model, you can't feed it years of medical time-series and expect it to work as a model specifically trained, and validated on this specific kind of data.
An LLM generates a stream of tokens. If you feed it a giant set of CSVs and it was not RL'd to do something useful with them, it will just try to make whatever sense of them it can and generate something that most probably has no strong numerical relationship to your data. It will simulate an analysis; it won't actually do one.
You may have a giant context window, but attention is sparse; the attention mechanism doesn't see your whole data at the same time. It can do some simple comparisons, like figuring out that if I say my current blood pressure is 210/180, I should call an ER immediately. But once I send it a time series of my twice-a-day blood pressure measurements for the last 10 years, it can't make any real sense of it.
Indeed, it would have been better for the author to ask the LLM to generate a Python notebook to do some data analysis, then run the notebook and share the results with the doctor.
- rfw30012 days ago
  This is true as a technical matter, but this isn't a technical blog post! It's a consumer review, and when companies ship consumer products, the people who use them can't be expected to understand failure modes that are not clearly communicated to them. If OpenAI wants regular people to dump their data into ChatGPT for Health, the onus is on them to make it reliable.
  - themafia12 days ago
    > the onus is on them to make it reliable.
    That is not a plausible outcome given the current technology or any of OpenAI's demonstrated capabilities.
    "If Bob's Hacksaw Surgery Center wants to stay in business they have to stop killing patients!"
    Perhaps we should just stop him before it goes too far?
    vineyardmike12 days ago
    > That is not a plausible outcome given the current technology or any of OpenAI's demonstrated capabilities
    OpenAI has said that medical advice is one of the biggest use cases they see from users. It should be assumed they're investigating how to build out this product capability.
    Google has LLMs fine tuned on medical data. I have a friend who works at a top-tier US medical research university, and the university is regularly working with ML research labs to generate doctor-annotated training data. OpenAI absolutely could be involved in creating such a product using this sort of source.
    You can feed an LLM text, pictures, videos, audio, etc - why not train a model to accept medical-time-series data as another modality? Obviously this could have a negative performance impact on a coding model, but could potentially be valuable for a consumer-oriented chat bot. Or, of course, they could create a dedicated model and tool-call that model.
    elzbardico12 days ago
    They are going to do the same thing they do with code.
    They are going to hire armies of developing world workers to massage those models on post-training to have some acceptable behaviors, and they will create the appropriate agents with the appropriate tools to have something that will simulate the real thing in a most plausible way.
    Problem is, RLVR is cheap with code, but it can get very expensive with human physiology.
- Deklomalo11 days ago
  You state a lot of things without testing it first?
  An LLM has structures in its latent space that allow it to do basic math, and it has also seen enough data that it probably has structures for detecting basic trends.
  An LLM doesn't just generate a stream of tokens. It generates an embedding and searches/does something in its latent space, then returns tokens.
  And you don't know at all what LLM interfaces do in the background. Gemini creates sub-agents. There could easily already be a 'trend detector'.
  I even did a test: I generated random data with a trend and fed it to ChatGPT. The output was very coherent and correct.
  - elzbardico11 days ago
    That's not how it works.
    Deklomalo11 days ago
    What exactly?
    Here is the paper where I read about it: https://arxiv.org/html/2601.04480v1
- protocolture12 days ago
  This LLM is advertising itself in a medical capacity. You aren't wrong, but the customer has been fed the wrong set of expectations. It's the fault of the tool's marketing.
tiffanyh11 days ago
What's the feedback loop here for ChatGPT?
For it to get better, it needs to know outcomes of its diagnosis.
Are people just typing back to ChatGPT saying "you're wrong / you're right"?
francisofascii11 days ago
> There were big swings in my resting heart rate whenever I got a new Apple Watch, suggesting the devices may not have been tracking the same way.
First of all, wrist-based HR measurements are not reliable. If you feed ChatGPT a ton of HR data that is just plain wrong, expect a bad result. Everyone who wants to track HR reliably should invest in a chest strap. The VO2 max calculation is heavily based on your pace at a given heart rate, and it makes some generalizations about your running biomechanics. For example, if your "real" lab-tested VO2 max stays constant but you improve your biomechanics/running efficiency, you can run faster at the same effort, and your Apple Watch will increase your VO2 max number.
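To illustrate the pace-at-heart-rate point, here is a rough Python sketch of a generic submaximal estimate. Apple's actual algorithm is not public, so this is an assumption: it uses the ACSM metabolic equation for running (which bakes in an average running economy) and extrapolates linearly from resting heart rate up to maximum heart rate. The function names are hypothetical.

```python
REST_VO2 = 3.5  # mL/kg/min, the conventional resting value (1 MET)


def acsm_running_vo2(speed_m_per_min, grade=0.0):
    """Oxygen cost of running from the ACSM metabolic equation (mL/kg/min)."""
    return 0.2 * speed_m_per_min + 0.9 * speed_m_per_min * grade + REST_VO2


def estimate_vo2max(speed_m_per_min, hr, hr_rest, hr_max, grade=0.0):
    """Extrapolate one steady submaximal effort linearly up to max heart rate."""
    vo2 = acsm_running_vo2(speed_m_per_min, grade)
    # Assume VO2 rises linearly with heart rate from rest to the observed effort.
    slope = (vo2 - REST_VO2) / (hr - hr_rest)
    return REST_VO2 + slope * (hr_max - hr_rest)


if __name__ == "__main__":
    # Same 160 bpm effort (resting 60, max 190), two different paces.
    print(estimate_vo2max(200, 160, 60, 190))
    print(estimate_vo2max(220, 160, 60, 190))
```

With these inputs, 200 m/min gives roughly 55.5 and 220 m/min roughly 60.7 mL/kg/min. Because the equation assumes a fixed oxygen cost per meter, any gain in running efficiency shows up as a higher VO2 max estimate, even if the lab value would not change.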
- AlanYx11 days ago
  In this case the article's guess is probably accurate. Apple did change how they measure RHR in WatchOS 11.2. If the author was using an Apple Watch that doesn't support 11.2 and then switched to one that does, a swing was very likely.
siliconc0w12 days ago
The problem is that false positives can be incredibly expensive in money, time, pain, and anxiety. Most people cannot afford (and healthcare system cannot handle) thousands of dollars in tests to disprove every AI hunch. And tests are rarely consequence free. This is effectively a negative externality of these AI health products and society is picking up the tab.
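A quick base-rate calculation shows why false positives pile up when a test is run on a group where the condition is rare. The sensitivity, specificity, and prevalence values below are hypothetical, chosen only to illustrate the arithmetic.

```python
def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Expected true and false positives when screening a whole group."""
    sick = population * prevalence
    healthy = population - sick
    true_pos = sick * sensitivity  # sick people the test catches
    false_pos = healthy * (1 - specificity)  # healthy people flagged anyway
    return true_pos, false_pos


def positive_predictive_value(true_pos, false_pos):
    """Share of positive results that are real."""
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical: the same 90%-sensitive, 99%-specific test in two groups.
    for label, prevalence in [("rare", 0.0005), ("common", 0.10)]:
        tp, fp = screening_outcomes(100_000, prevalence, 0.90, 0.99)
        ppv = positive_predictive_value(tp, fp)
        print(f"{label}: {tp:.0f} true vs {fp:.0f} false positives, PPV {ppv:.0%}")
```

With these made-up numbers, fewer than 1 in 20 positives in the low-prevalence group is real, even though the test itself is unchanged; each of the rest is a potential round of follow-up testing.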
- mr_toad11 days ago
  This is why certain types of cancer tests are usually only performed on people over a certain age. If you test young people the false positives outnumber the true positives.
alpineman12 days ago
My wife is a doctor and there is a general trend at the moment of everyone thinking their intelligence in one area (say programming) carries over into other areas such as medicine, particularly with new tools such as ChatGPT.
Imagine if as a dev someone came to you and told you everything that is wrong with your tech stack because they copy pasted some console errors into ChatGPT. There's a reason doctors need to spend almost a decade in training to parse this kind of info. If you do the above then please do it with respect for their profession.
- FeteCommuniste12 days ago
  > My wife is a doctor and there is a general trend at the moment of everyone thinking their intelligence in one area (say programming) carries over into other areas such as medicine, particularly with new tools such as ChatGPT.
  My wife is a lawyer and sees the same thing at her job. People "writing" briefs or doing legal "research" with GPT and then insisting that their document must be right because the magic AI box produced it.
- tripledry12 days ago
  I'm reminded of an effect called Gell-Mann Amnesia.
  When reading news stories on topics you know well, you notice inaccuracies or poor reporting - but then immediately forget that lesson when reading the next article on a topic you are not familiar with.
  It's very similar to what happens with AI.
- mr_toad11 days ago
  > general trend at the moment
  “A little knowledge is a dangerous thing” is not new, it’s a quote/observation that goes back hundreds of years.
  > Imagine if as a dev someone came to you and told you everything that is wrong with your tech stack because they copy pasted some console errors into ChatGPT.
  You mean the PHB? They don’t need ChatGPT for that, they can cite Gartner.
dfajgljsldkjag12 days ago
The author is a healthy person but the computer program still gave him a failing grade of F. It is irresponsible for these companies to release broken tools that can cause so much fear in real people. They are treating serious medical advice like it is just a video game or a toy. Real users should not be the ones testing these dangerous products.
- nomel12 days ago
  > It is irresponsible for these companies
  I would claim that ignoring the "ChatGPT is AI and can make mistakes. Check important info." text, right under where they type their query in the client, is clearly more irresponsible.
  I think that a disclaimer like that is the most useful and reasonable approach for AI.
  "Here's a tool, and it's sometimes wrong." means the public can have access to LLMs and AI. The alternative, that you seem to be suggesting (correct me if I'm wrong), means the public can't have access to an LLM until they are near perfect, which means the public can't ever have access to an LLM, or any AI.
  What do you see as a reasonable approach to letting the public access these imperfect models? Training? Popups/agreement after every question "I understand this might be BS"? What's the threshold for quality of information where it's no longer considered "broken"? Is that threshold as good as or better than humans/news orgs/doctors/etc?
  - tomgp11 days ago
    The issue is that whilst the warning exists and is there front and centre, the marketing around ChatGPT etc - which is absolutely deafening in volume and enthusiasm - is that they're PHD level experts and can do anything.
    This marketing obscures what the software is _actually_ good at and gives users a poor mental model of what's going on under the hood. Dumping years' worth of undifferentiated health data into a generic ChatGPT chat window seems like a fundamental misunderstanding of the strengths of large language models.
    A reasonable approach would be to try to explain what kind of tasks these models do well at and what kind of situations they behave poorly in.
  - ytoawwhra9212 days ago
    Why are you assuming that the general public ought to have access to imperfect tools?
    I live in a place where getting a blood test requires a referral from a doctor, who is also required to discuss the results with you.
    nomel12 days ago
    > Why are you assuming that the general public ought to have access to imperfect tools?
    Could you tell me which source of information you see as "perfect" (or acceptable), as a good example of a threshold for what you think the public should and should not have access to?
    Also, what if a tool still provides value to the user in some contexts but not in others (for example, when the tool is used wrong)?
    From the "tool" perspective, I've personally never seen a perfect tool. Do you have an example?
    > I live in a place where getting a blood test requires a referral from a doctor, who is also required to discuss the results with you.
    I don't see how this is relevant. In the above article, the user went to their doctor for advice and a referral. But in the US (and many European countries) blood tests aren't restricted and can be had from private labs out of pocket, since they're just measurements of things that exist in your blood, and not allowing you to know what's inside of you would be considered government overreach/a privacy violation. Medical interpretations/advice from the measurements is what's restricted, in most places.
    ytoawwhra9212 days ago
    > Could you tell me which source of information you see as "perfect" (or acceptable), as a good example of a threshold for what you think the public should and should not have access to?
    I know it when I see it.
    > I don't see how this is relevant.
    It's relevant because blood testing is an imperfect tool. Laypeople lack the knowledge/experience to identify imperfections and are likely to take results at face value. Like the author of the article did when ChatGPT gave them an F for their cardiac health.
    > Medical interpretations/advice from the measurements is what's restricted, in most places.
    Do you agree with that restriction?
    nomel12 days ago
    > I know it when I see it.
    This isn't a reasonable answer. No action can be taken and no conclusion/thought can be made from it.
    > Do you agree with that restriction?
    People should be able to perform and be informed about their own blood measurements, and possibly bring something up with their doctors outside of routine exams (which they may not even be insured for in the US). I think the restriction on medical advice/conclusion, that results in treatment, is very good, otherwise you end up with "Wow, look at these results! you'll have to buy my snake oil or you'll die!".
    I don't believe in reducing society to a level that completely protects the most stupid of us.
    ytoawwhra9212 days ago
    > This isn't a reasonable answer.
    Sure it is. The world runs on human judgement. If you want me to rephrase I could say that the threshold for imperfection should reflect contemporary community standards, but Stewart's words are catchier.
    > I think the restriction on medical advice/conclusion, that results in treatment, is very good, otherwise you end up with "Wow, look at these results! you'll have to buy my snake oil or you'll die!".
    Some people would describe this as an infringement on their free speech and bodily autonomy.
    Which is to say that I think you and I agree that people in general need the government to apply some degree of restriction to medicine, we just disagree about where the line is.
    But I think if I asked you to describe to me exactly where the line is you'd ultimately end up at some incarnation of "I know it when I see it".
    Which is fine. Even good, I think.
    > I don't believe in reducing society to a level that completely protects the most stupid of us.
    This seems at odds with what you said above. A non-stupid person would seek multiple consistent opinions before accepting medical treatment, after all.
    nomel12 days ago
    > I know it when I see it.
    What's the most complex (in an information rich way) tool that you have seen?
    cthalupa11 days ago
    > I live in a place where getting a blood test requires a referral from a doctor,
    To me, this is horrific. I am the advocate for my own health. I trust my doctor - he's a great guy. I have spoken to him extensively around a variety of health matters and I greatly trust his opinion.
    But I also recognize that he has many other patients and by necessity has to work within the general lines of probability. There is no way for him to know every confounding and contributing factor of my health, no matter how diligent I am in filling out my chart.
    I get my own bloodwork done regularly. This has let me make significant changes in my life to improve health markers. I can also get a much broader spectrum of tests done than the standard panel. This has directly led to productive conversations with my doctor!
    And from a more philosophical standpoint, this is about understanding my own body. The source of the data is me. Why should this be gatekept behind a physician referral? I find it insane to think that I could be in a position where I am not allowed to find out the cholesterol serum levels in my blood unless a doctor OKs it! What the fuck?
    kolinko12 days ago
    > I live in a place where getting a blood test requires a referral from a doctor, who is also required to discuss the results with you.
    You’re saying it like it’s a good thing.
  - coffeefirst12 days ago
    Oh I have a plan for this.
    Allow it to answer general questions about health, medicine and science.
    It can’t practice medicine, it can only be a talking encyclopedia that tells you how the heart works and how certain biomarkers are used. Analyzing your specific case or data is off limits.
    And then when the author asks his question, it says it’s not designed to do that.
  - throwaway29012 days ago
    > "ChatGPT is AI and can make mistakes. Check important info."
    Is the same thing that can be said about any human
    > "Doctor is human and can make mistakes"
    Therefore that disclaimer really isn't sufficient to make clear that it is wrong in different ways from, and worse than, a human.
  - zdragnar12 days ago
    > Popups/agreement after every question "I understand this might be BS"?
    Considering the number of people who take LLM responses as authoritative Truth, that wouldn't be the worst thing in the world.
  - anon700012 days ago
    The problem is that AI companies are selling, advertising, and shipping AI as a tool that works most of the time for what you ask it to do. That’s deeply misleading.
    The product itself is telling you in plain English that it’s ABSOLUTELY CERTAIN about its answer… even when you challenge it and try to rebut it. And the text of the product itself is much more prominent than the little asterisk “oh no, it’s actually lying because the LLM can never be that certain.” That’s clearly not a responsible product.
    I opened the ChatGPT app right now and there is literally nothing about double checking results. It just says “ask anything,” in no uncertain terms, with no fine print.
    Here’s a recent ad from OpenAI: https://youtu.be/uZ_BMwB647A, and I quote “Using ChatGPT allowed us to really feel like we have the facts and our doctor is giving us his expertise, his experience, his gut instinct” related to a severe health question.
    And another recent ad related to analyzing medical scans: “What’s wonderful about ChatGPT is that it can be that cumulative source of information, so that we can make the best choices.” (https://youtu.be/rXuKh4e6gw4)
    And yet another recent ad, where lots of users are using ChatGPT to get authoritative answers to health questions. They even say you can take a picture of a meal before you eat and after you eat, and have it generate the amount of calories you ate! Just based on the difference between the pictures! How has that been tested and verified? (https://youtu.be/305lqu-fmbg)
    Now, some of the ads have users talking to their doctors, which is great.
    But they are clearly marketing ChatGPT as the tool to use if you want to arrive at the truth. No asterisks. No “but sometimes it’s wrong and you won’t be able to tell.” There’s nothing to misunderstand about these ads: OpenAI is telling you that ChatGPT is trustworthy.
    So I reject the premise that it’s the user’s fault for not using enough caution with these tools. OpenAI is practically begging you to jump in and use it for personal, life or death type decisions, and does very little to help you understand when it may be wrong.
- dylan60412 days ago
  Which LLM should the LLM turn to, to ask whether what the user is asking is safe for the first LLM to answer?
- elzbardico12 days ago
  Well, what could we expect? It is a fucking Large Language Model. You're feeding it a very long multi-variable time series; it can't make any sense of it, but it is going to generate text.
  If you are lucky, maybe it was fine-tuned to see a long comma-delimited sequence of values as a table and then emit a series of tool calls to generate some deterministic code that calculates a set of descriptive statistics, which will then be close in the latent space to some hopefully current medical literature, and it will generate something that makes sense and is not absurdly wrong.
  It is a fucking LLM, it is not 2001's HAL.
- eleveriven12 days ago
  And real users shouldn't be the ones discovering these edge cases through fear
brandonb12 days ago
We trained a foundation model specifically for wearable data: https://www.empirical.health/blog/wearable-foundation-model-...
The basic idea was to adapt JEPA (Yann LeCun's Joint-Embedding Predictive Architecture) to multivariate time series, in order to learn a latent space of human health from purely unlabeled data. Then, we tested the model using supervised fine-tuning and evaluation on a bunch of downstream tasks, such as predicting a diagnosis of hypertension (~87% accuracy). In theory, this model could also be aligned to the latent space of an LLM, similar to how CLIP aligns a vision model with a text model.
IMO, this shows that accuracy in consumer health will require specialized models alongside standard LLMs.
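For readers unfamiliar with JEPA, the core idea is to hide part of the input and predict the embedding of the hidden part from the visible part, rather than reconstructing raw values. The toy Python sketch below shows only that objective on a multichannel series; it is not Empirical Health's model. Linear encoders, mean pooling, the patch size, and the absence of a training loop or of positional information for the predictor are all simplifying assumptions, and every name here is hypothetical.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def to_patches(series, patch_len):
    """Cut a series (one list of channel values per step) into flattened patches."""
    return [
        [value for step in series[i:i + patch_len] for value in step]
        for i in range(0, len(series) - patch_len + 1, patch_len)
    ]


def jepa_loss(context_W, target_W, predictor_W, series, patch_len, target_idx):
    """Predict the embedding of one masked patch from the visible patches."""
    patches = to_patches(series, patch_len)
    visible = [p for i, p in enumerate(patches) if i != target_idx]
    # Context encoder: embed each visible patch, then mean-pool (a simplification).
    embeddings = [matvec(context_W, p) for p in visible]
    z_context = [sum(dim) / len(embeddings) for dim in zip(*embeddings)]
    # Target encoder embeds the hidden patch; real training stops gradients here.
    z_target = matvec(target_W, patches[target_idx])
    z_pred = matvec(predictor_W, z_context)
    # The error is measured between embeddings, not raw sensor readings.
    return sum((p - t) ** 2 for p, t in zip(z_pred, z_target)) / len(z_target)


def ema_update(target_W, context_W, tau=0.99):
    """Move the target encoder slowly toward the context encoder."""
    return [[tau * t + (1 - tau) * c for t, c in zip(t_row, c_row)]
            for t_row, c_row in zip(target_W, context_W)]


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    # 8 steps x 2 channels (say, normalized heart rate and step count).
    series = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(8)]
    patch_len, channels, dim = 2, 2, 3
    context_W = random_matrix(rng, dim, patch_len * channels)
    target_W = [row[:] for row in context_W]
    predictor_W = random_matrix(rng, dim, dim)
    print(jepa_loss(context_W, target_W, predictor_W, series, patch_len, target_idx=3))
    target_W = ema_update(target_W, context_W)
```

In a real setup the context encoder and predictor would be trained by gradient descent on this loss, with the target encoder updated only through the moving average, which is one of the mechanisms JEPA-style methods use to avoid a collapsed, all-constant embedding.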
daft_pink12 days ago
the problem with ai is that it isn’t good at recognizing red flags in data. i used it to find red flags in a financial report and it finds red flags in virtually every financial report it lays eyes on.
uriegas11 days ago
There are some research projects out there that use LLMs for health diagnostics. Here's one: https://cs.stanford.edu/people/jure/pubs/med-pmlr23.pdf
They usually require more data. It is not a great idea to diagnose anything with so little information. But in general I am optimistic about the use of LLMs in health.
gizajob12 days ago
Hard to tell who is stupider, the writer or ChatGPT.
- astura12 days ago
  The writer makes money from pumping out shitty click bait articles.
  - FeteCommuniste11 days ago
    I don't see how it's "shitty." It portrays a usage of ChatGPT that I imagine is becoming pretty typical. People are treating "AI" as an oracle. The situation isn't helped by corporate heads and LLM boosters blathering on about how AI is soon going to replace most of the workforce, boost productivity by a gazillion percent, and cure cancer.
jablongo11 days ago
There needs to be more documentation about what info was provided to the LLM and in which format before we decide that LLMs are necessarily bad at this. That said, you would expect the offering from a $500bn company to be more robust and better tested than this, assuming this is reported accurately.
hasbot11 days ago
Hmm, sure it's maybe wrong now, but in several years, it could be correct. So maybe I should wear a device now so when it does become correct and I'm even older, AI might be useful.
I'm definitely not going with Apple. Are there any minimally obtrusive trackers that provide downloadable data?
spicyusername11 days ago
So we're feeding bad data into a system known for making answers up and expecting... what exactly, lol
evolighting12 days ago
Health data, medical records, even research data, is very scarce in the public domain. This is not just due to so-called privacy concerns, but because such data could have generated “value” (and been sold at a good price) long before the emergence of large language models.
- ThundeChile12 days ago
  I think it's quite alarming that people don't even think about privacy when sending their health data to corporations that make a large percentage of their revenue selling the data onwards (or using it for things you didn't intend).
ge9611 days ago
I think I found the proper sleep amount for me (sleep deprived). It has me feeling agitated/motivated. It's around 5 hours. What I need is something like Apple Watch to detect when I've actually fallen asleep then set the alarm for that long.
rurban11 days ago
I've used the free Huawei Health app for about 2 years, and it has been pretty good so far. The sensors suck, of course, but they're better than nothing. I had a special watch to test my high blood pressure, but even it never matched my dedicated blood pressure device.
zombot11 days ago
Giving your health data to an AI is sick. Unfortunately no doctor can cure you of that.
zhisme11 days ago
Check out iatrogenesis. There's no need to rely on Apple Watch data to become a drug-addicted guy treating diseases that never existed. That's not the metric you want to use to decide whether you need meds and medical help at all.
Aachen12 days ago
> I let ChatGPT analyze a decade of my Apple Watch data, then I called my doctor
... and you won't believe what happened next!
Can we do away with the clickbait from MSN? The article is about LLMs misdiagnosing cardiovascular status when given fitness tracker data
- g947o12 days ago
  I have www.msn.com added to the blocklist of my router's adblock settings so that I can stop seeing nonsense in Skype/Windows. It worked for a while.
  Of course, the real solution is to stop using Microsoft products, which I did.
  - Aachen11 days ago
    Frankly the main thing I take issue with is dumping these garbage titles onto HN. They can do what they want on their site
djoldman12 days ago
I'm less interested in what "grade" the AI gave and much more interested in what therapy or remedy it would have suggested. That's curiously lacking here.
danielmarkbruce11 days ago
vo2 max is one of the strongest predictors of all cause mortality. It's been reproduced across several large scale studies. I'm on the side of ChatGPT on this one. I'd guess the writer of this article is leaving something out.
A family member recently passed away from a rare, clinically diagnosed disease. ChatGPT knew what it was a couple months before the relevant specialists diagnosed it.
jdub12 days ago
Why do people even begin to believe that a large language model can usefully understand and interpret health data?
Sure, LLM companies and proponents bear responsibility for the positioning of LLM tools, and particularly their presentation as chat bots.
But from a systems point of view, it's hard to ignore the inequity and inconvenience of the US health system driving people to unrealistic alternatives.
(I wonder if anyone's gathering comparable stats on "Doctor LLM" interactions in different countries... there were some interesting ones that showed how "Doctor Google" was more of a problem in the US than elsewhere.)
gizmodo5912 days ago
For every sensational article about AI being useless, there are plenty of examples where using ChatGPT to find out what else could be happening, and then having a conversation with a doctor, has helped; I know of many such cases anecdotally, and there are many such reports online as well.
At the end of the day, it’s yet another tool that people can use to help their lives. They have to use their brain. The culture of seeing the doctor as a god doesn’t hold up anymore. So many people have had bad experiences, when the entire health care industry, at least in the US, is primarily a business rather than a way of helping society get healthy.
CqtGLRGcukpy12 days ago
Original article can be read at https://www.washingtonpost.com/technology/2026/01/26/chatgpt....
Paywall-free version at https://archive.ph/k4Rxt
stego-tech12 days ago
This is not remotely surprising.
Look, AI Healthbros, I'll tell you quite clearly what I want from your statistical pattern analyzers, and you don't even have to pay me for the idea (though I wouldn't say no to a home or Enterprise IT gig at your startup):
I want an AI/ML tool to not merely analyze my medical info (ON DEVICE, no cloud sharing kthx), but also extrapolate patterns involving weather, location, screen time, and other "non-health" data.
Do I record taking tylenol when the barometric pressure drops? Start alerting me ahead of time so I can try to avoid a headache.
Does my screen time correlate to immediately decreased sleep scores? Send me a push notification or webhook I can act upon/script off of, like locking me out of my device for the night or dimming my lights.
Am I recording higher-intensity workouts in colder temperatures or inclement weather? Start tracking those metrics and maybe keep better track of balance readings during those events for improved mobility issue detection.
Got an app where I track cannabis use or alcohol consumption? Tie that to my mental health journal or biological readings to identify red flags or concerns about misuse.
Stop trying to replace people like my medical care team, and instead equip them with better insights and datasets they can more quickly act upon. "Subject has been reporting more negative moods in his mental health journal, an uptick in alcohol consumption above his baseline, and inconsistent cannabis use compared to prior patterns" equips the care team with a quick, verifiable blurb from larger datasets that can accelerate care and improve patient outcomes - without the hallucinations of generative AI.
6stringmerc11 days ago
In my view, the people, no matter their walk of life, education level, or societal class, who ask “AI” systems mental or physical health questions are modern-day incarnations of customers who went to palm readers or tarot card sessions, or who relied on televangelists for hope and meaning.
I strongly dislike the author conflating HIPAA with PHI but this is a losing battle for me. And clearly editors don’t spot it, neither do AI systems - where is Clippy?! It simply serves as an indicator the author is a pretty ignorant medical consumer in the US, and this case study is stunning. Some people really should not be allowed to engage with magic.
blef11 days ago
The title would have been even better as: "I had ChatGPT analyze a decade of my Apple Watch data, then it called my doctor"
eleveriven12 days ago
Right now this looks less like "AI for healthcare" and more like a very polished way to scare (or falsely reassure) people
anonzzzies12 days ago
Apple Watch told me, based on VO2 max, that I'm almost dead, all the time. I went to the doctor, did a real test, and it was complete nonsense. I had the watch replaced 3 times but got the same results, so I returned it and will not try again. Scaring people with stuff you cannot actually shut off (at least you couldn't before) is not great.
elzbardico12 days ago
A simple understanding of transformers should be enough to make someone see that using an LLM to analyze multi-variate time series data is a really stupid endeavor.
- nprateem12 days ago
  It should be obvious to even the most dim-witted idiot with a PhD in statistics and AI
  - elzbardico12 days ago
    You only need this if you are a researcher. Undergraduate knowledge of Calculus and Linear Algebra is more than enough to have quite a good understanding of ML in general, and LLMs in particular.
    Maybe a very small bit of Information Theory (a couple of Shannon's papers are enough) and some classic books on Natural Language Processing from the late 90s and early 2000s, so you have an idea of what Language Models are outside the modern Deep Learning-driven approach.
creatonez12 days ago
ChatGPT Health is a completely reckless and dangerous product; they should be sued into oblivion for even naming it "health".
- orionsbelt12 days ago
  ChatGPT has done more for my health than any doctor. Truly.
  - haldujai12 days ago
    How so?
    theshrike7912 days ago
    ChatGPT will actually look at your whole medical history, listen to you, think and check multiple different options before making a decision. You can spend hours chatting with it back and forth.
    An average human doctor has maybe 15 minutes allotted to get to know you, analyse, and determine a course of action, which is usually "take some ibuprofen and let's see if it goes away". Then you go again in two weeks with the same thing, it's a different doctor, and the context has been reset unless you do an info dump from the previous visits and try not to forget anything.
    And if you infodump too much or use actual medical diagnosis terms, the Dr gets defensive because you're stepping on THEIR area of expertise and will start pushing back even on the obvious just because they can.
    haldujai10 days ago
    I wonder if in your case (which is very common) the issue is a mismatch between expectations and reality. The medical system as we know it is not designed for someone to listen to you and do a back-and-forth for hours. If we did that, we would only treat 2-4 patients a day. It’s also not particularly helpful.
    Time spent in a medical encounter is tied to patient satisfaction, but there is a rapid drop-off in clinical benefit, especially nowadays, when investigations are more important than the physical exam in most cases and more important than the history in a substantial portion.
    15 minutes is what we book as follow-ups or minor assessments in US+Canada, usually sufficient for most things. New consults or complex patients are 30-60 minutes.
    Infodumping is not particularly helpful. Doctors are trained to use a combination of open and closed questions to guide the encounter based on their thinking and understanding of medicine. What matters is relevant past medical history, as not every symptom or past disease is necessarily useful in assessing what’s wrong today.
    eur0pa11 days ago
    An LLM neither "listens" nor "thinks"
    theshrike7911 days ago
    For the sake of fluid writing, I did use anthropomorphic verbs.
    What would you prefer instead?
    creatonez11 days ago
    > has maybe 15 minutes allotted to getting to know you [...] Then you go again in two weeks with the same thing, it's a different doctor and the context has been reset
    This is not how doctors work in most of the world. Not having an actual primary care physician that is able to keep track of each patient over multiple years means they are skipping out on one of their most important duties. You should advocate for a better standard of care rather than resorting to hallucinating chatbots.
    theshrike7911 days ago
    All of the country of Finland works like that.
    Nobody sees the same doctor twice except in very rare cases, usually when the doctor is a specialist with no alternative.
Barathkanna12 days ago
TLDR: AI didn’t diagnose anything, it turned years of messy health data into clear trends. That helped the author ask better questions and have a more useful conversation with their doctor, which is the real value here.
maxdo12 days ago
Typical Western coverage: “How dare they call me unhealthy.” In reality, the doctor said it needs further investigation and that some data isn’t great. They didn’t say “unhealthy”; they said “needs more investigation.” What’s wrong with that? Is the real issue just a bruised Western ego?
- smileysteve11 days ago
  Alt: typical Western coverage. It has completely ignored other journalists' reporting on the plight fitness bands have caused for doctors: "am I getting a cold? My watch/ring says I'm getting a cold, give me antibiotics now."
  Or how VO2 max is hard to measure, or how not wearing a wearable (or wearing it loose) changes results. Instead it skips straight to: I gave an LLM a range to rate without really giving it context about what I want the range to represent or how the data was gathered.
  TL;DR: the author bought everything, read nothing, complained to an expensive professional, and now hopes that we read his article.