176 points by rafaelcosta | 7 hours ago | 19 comments
  • csmantle4 hours ago
    Methodology is one thing; I can't really agree that deploying an LLM to do sums is great. Almost as hilarious as asking "What's moon plus sun?"

    But phenomenon is another thing. Apple's numerical APIs are producing inconsistent results on a minority of devices. This is something worth Apple's attention.

    • JimboOmega2 hours ago
      (This is a total digression, so apologies)

      My mind instantly answered that with "bright", which is what you get when you combine the sun and moon radicals to make 明(https://en.wiktionary.org/wiki/%E6%98%8E)

      Anyway, that question is not without reasonable answers. "Full Moon" might make sense too. No obvious deterministic answer, though, naturally.

      • awesome_dude 32 minutes ago
        FTR the Full Moon was exactly 5 hours ago (It's not without humour that this conversation occurs on the day of the full moon :)
    • CrispinS2 hours ago
      > What's moon plus sun?

      Eclipse, obviously.

      • christophilus2 hours ago
        That’s sun minus moon. Moon plus sun is a wildly more massive, nuclear furnace of a moon that also engulfs the earth.
        • mcny 34 minutes ago
          Wait so moon plus sun != sun plus moon? :Thinking:
        • dcrazy an hour ago
          This thread reminds me of Scribblenauts, the game where you conjure objects to solve puzzles by describing them. I suspect it was an inspiration for Baba Is You.
          • Der_Einzige an hour ago
            Scribblenauts was also an early precursor to modern GenAI/word embeddings. I constantly bring it up in discussions of the history of AI for this reason.
      • geuis an hour ago
        Not obvious. Astronomers are actively looking for signatures of exomoons around exoplanets. So "sun plus moon" could mean that too.
        • xattt an hour ago
          The OP said moon + sun, rather than sun + moon. We have no idea yet if celestial math is non-commutative.
  • DustinEchoes3 hours ago
    I wish he would have tried on a different iPhone 16 Pro Max to see if the defect was specific to that individual device.
    • crossroadsguy2 hours ago
      So true! And as any sane Apple user or the standard template Apple Support person would have suggested (and as they actually suggest) - did they try reinstalling the OS from scratch after having reset the data (of course before backing it up; preferably with a hefty iCloud+ plan)? Because that's the thing to do in such issues and it's very easy.
  • Buttons840 5 hours ago
    I clicked hoping this would be about how old graphing calculators are generally better math companions than a phone.

    The best way to do math on my phone I know of is the HP Prime emulator.

    • xp84 2 hours ago
      I was pretty delighted to realize I could now delete the lame Calculator.app from my iPhone and replace it with something of my choice. For now I've settled on NumWorks, which is apparently an emulator of a modern upstart physical graphing calc that has made some inroads into schools. And of course, you can make a Control Center button to launch an app, so that's what I did.

      Honestly, the main beef I have with Calculator.app is that on a screen this big, I ought to be able to see several previous calculations and scroll up if needed. I don't want an exact replica of a 1990s 4-function calculator like the default is (ok, it has more digits and the ability to paste, but besides that, adds almost nothing).

      • Buttons840 2 hours ago
        I looked at that calculator. But HP Prime and TI-89 have CAS systems that can do symbolic math, so I prefer to emulate them.
    • xoa2 hours ago
      My personal favorite is iHP48 (previously I used m48+ before it died) running an HP 48GX with metakernal installed as I used through college. Still just so intuitive and fast to me.
      • wolvoleo2 hours ago
        I still have mine. Never use it though as I'm not handy with RPN anymore. :'(
    • VorpalWay5 hours ago
      I run a TI 83+ emulator on my Android phone when I don't have my physical calculator at hand. Same concept, just learned a different brand of calculators.
      • varun_ch 3 hours ago
        Built-in calculator apps are surprisingly underbaked... I'm surprised neither of the big two operating systems has elected to ship something comparable to a real calculator built in. It would be nice if we could preview the whole expression as we type it.

        I use the NumWorks emulator app whenever I need something more advanced. It's pretty good https://www.numworks.com/simulator/

  • raincole5 hours ago
    Low-level numerical operation optimizations are often not reproducible. For example: https://www.intel.com/content/dam/develop/external/us/en/doc... (2013)

    But it's still surprising that the LLM doesn't work on the iPhone 16 at all. After all, LLMs are known for their tolerance to quantization.

    • bri3d5 hours ago
      Yes, "floating point accumulation doesn't commute" is a mantra everyone should have in their head, and when I first read this article, I was jumping at the bit to dismiss it out of hand for that reason.

      But, what got me about this is that:

      * every other Apple device delivered the same results

      * Apple's own LLM silently failed on this device

      to me that behavior suggests an unexpected failure rather than a fundamental issue; it seems Bad (TM) that Apple would ship devices where their own LLM didn't work.

      • sva_ 3 hours ago
        > floating point accumulation doesn't commute

        It is commutative (except for NaN). It isn't associative though.
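
A quick sketch of the distinction (Python, chosen only for brevity rather than the thread's Swift/Metal context; the values are arbitrary):

```python
# Floating-point addition is commutative: a + b and b + a round
# the same exact sum, so they give the same result (NaN payloads aside).
a, b, c = 0.1, 0.2, 0.3
assert a + b == b + a

# But it is not associative: grouping changes which intermediate
# sums get rounded.
left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6
assert left != right
```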

        • ekelsen2 hours ago
          I think it commutes even when one or both inputs are NaN? The output is always NaN.
          • addaon2 hours ago
            NaNs are distinguishable. /Which/ NaN you get doesn't commute.
            • ekelsen an hour ago
              I guess at the bit level, but not at the level of computation? Anything that relies on bit patterns of nans behaving in a certain way (like how they propagate) is in dangerous territory.
              • addaon 31 minutes ago
                > Anything that relies on bit patterns of nans behaving in a certain way (like how they propagate) is in dangerous territory.

                Why? This is well specified by IEEE 754. Many runtimes (e.g. for Javascript) use NaN boxing. Treating floats as a semi-arbitrary selection of rational numbers plus a handful of special values is /more/ correct than treating them as real numbers, but treating them as actually specified does give more flexibility and power.
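
A sketch of the point about distinguishable NaNs (Python here, not a JS runtime; the helper names are made up for illustration, and payload preservation relies on no arithmetic being performed on the values):

```python
import math
import struct

def bits(x: float) -> int:
    # Raw IEEE 754 bit pattern of a double.
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def from_bits(n: int) -> float:
    # Reinterpret a 64-bit pattern as a double.
    return struct.unpack("<d", struct.pack("<Q", n))[0]

# Two quiet NaNs with different payloads: both compare as NaN,
# yet they differ bit-for-bit. NaN boxing stores data (pointers,
# type tags) in exactly those payload bits.
nan_a = from_bits(0x7FF8000000000000)  # the default quiet NaN
nan_b = from_bits(0x7FF800000000BEEF)  # a quiet NaN carrying a payload

assert math.isnan(nan_a) and math.isnan(nan_b)
assert bits(nan_a) != bits(nan_b)
```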

      • BeetleB2 hours ago
        As a sister comment said, floating point computations are commutative, but not associative.

        a * b = b * a for all "normal" floating point numbers.

      • danpalmer4 hours ago
        FYI, the saying is "champing at the bit", it comes from horses being restrained.
        • odo1242 2 hours ago
          chomping at the bit
          • danpalmer2 hours ago
            Actually it was originally "champing" – to grind or gnash teeth. The "chomping" (to bite) alternative cropped up more recently as people misheard and misunderstood, but it's generally accepted as an alternative now.
            • kortilla an hour ago
              It’s actually accepted as the primary now and telling people about “champing” is just seen as archaic.
              • danpalmer an hour ago
                Do you have a source on this, or a definition for what it means to be "primary" here? All I can find is sources confirming that "champing" is the original and more technically correct, but that "chomping" is an accepted variant.
        • mylifeandtimes2 hours ago
          hey, I appreciate your love of language and sharing with us.

          I'm wondering if we couldn't re-think "bit" to the computer science usage instead of the thing that goes in the horse's mouth, and what it would mean for an AI agent to "champ at the bit"?

          What new sayings will we want?

          • nilamo2 hours ago
            Byting at the bit?
  • johngossman4 hours ago
    Posting some code that reproduces the bug could help not only Apple but you and others.
  • _kulang4 hours ago
    Maybe this is why my damn keyboard predictive text is so gloriously broken
    • sen4 hours ago
      Oh it's not just me?

      Typing on my iPhone in the last few months (~6 months?) has been absolutely atrocious. I've tried disabling/enabling every combination of keyboard settings I can think of, but the predictive text just randomly breaks, or it just gives up and stops correcting anything at all.

      • macintux3 hours ago
        I haven't watched the video, but clearly there's a broad problem with the iOS keyboard recently.

        https://news.ycombinator.com/item?id=46232528 ("iPhone Typos? It's Not Just You - The iOS Keyboard is Broken")

      • acdha3 hours ago
        It’s not just you, and it got bad on my work iPhone at the same time so I know it’s not failing hardware or some customization since I keep that quite vanilla.
    • taneq3 hours ago
      It’s gotten so bad that I’m half convinced it’s either (a) deliberately trolling, or (b) ‘optimising’ for speech to text adoption.
  • Metacelsus2 hours ago
    >"What is 2+2?" apparently "Applied.....*_dAK[...]" according to my iPhone

    At least the machine didn't say it was seven!

  • dav43 2 hours ago
    My thousand dollar iPhone can't even add a contact from a business card.
  • ftyghome an hour ago
    I also would like to see if the same error happens in another phone with the exactly same model.
  • bri3d5 hours ago
    I love to see real debugging instead of conspiracy theories!

    Did you file a radar? (silently laughing while writing this, but maybe there's someone left at Apple who reads those)

  • refulgentis4 hours ago
    .
    • bri3d4 hours ago
      Can you read the article a little more closely?

      > - MiniMax can't fit on an iPhone.

      They asked MiniMax on their computer to make an iPhone app that didn't work.

      It didn't work using the Apple Intelligence API. So then:

      * They asked Minimax to use MLX instead. It didn't work.

      * They Googled and found a thread where Apple Intelligence also didn't work for other people, but only sometimes.

      * They HAND WROTE the MLX code. It didn't work. They isolated the step where the results diverged.

      > Better to dig in a bit more.

      The author already did 100% of the digging and then some.

      Look, I am usually an AI rage-enthusiast. But in this case the author did every single bit of homework I would expect and more, and still found a bug. They rewrote the test harness code without an LLM. I don't find the results surprising insofar as I wouldn't expect MAC to converge across platforms, but the fact that Apple's own LLM doesn't work on their own hardware, and the output is an order of magnitude off, is a reasonable bug report in my book.

      • refulgentis4 hours ago
        Emptied out post, thanks for the insight!

        Fascinating the claim is Apple Intelligence doesn't work altogether. Quite a scandal.

        EDIT: If you wouldn't mind, could you edit out "AI rage enthusiast" you edited in? I understand it was in good humor, as you describe yourself that way as well. However, I don't want to eat downvotes on an empty comment that I immediately edited when you explained it wasn't minimax! People will assume I said something naughty :) I'm not sure it was possible to read rage into my comment.

        • LoganDark4 hours ago
          > Fascinating the claim is Apple Intelligence doesn't work altogether. Quite a scandal.

          No, the claim is their particular device has a hardware defect that causes MLX not to work (which includes Apple Intelligence).

          > EDIT: If you wouldn't mind, could you edit out "AI rage enthusiast" you edited in? I understand it was in good humor, as you describe yourself that way as well. However, I don't want to eat downvotes on an empty comment that I immediately edited when you explained! People will assume I said something naughty :) I'm not sure it was possible to read rage into my comment.

          Your comment originally read:

          > This is blinkered.

          > - MiniMax can't fit on an iPhone.

          > - There's no reason to expect models to share OOMs for output.

          > - It is likely this is a graceful failure mode for the model being far too large.

          > No fan of Apple's NIH syndrome, or it manifested as MLX.

          > I'm also no fan of "I told the robot [vibecoded] to hammer a banana into an apple. [do something impossible]. The result is inedible. Let me post to HN with the title 'My thousand dollars of fruits can't be food' [the result I have has ~nothing to do with the fruits]"

          > Better to dig in a bit more.

          Rather than erase it, and invite exactly the kind of misreading you don't want, you can leave it... honestly, transparently... with your admission in the replies below. And it won't be downvoted as much as when you're trying to manipulate / make requests of others to try to minimize your downvotes. Weird... voting... manipulating... stuff, like that, tends to be frowned upon on HN.

          You have more HN karma than I do, even, so why care so much about downvotes...

          If you really want to disown something you consider a terrible mistake, you can email the HN mods to ask for the comment to be dissociated from your account. Then future downvotes won't affect your karma. I did this once.

          • fragmede2 hours ago
            Oh no, all my meaningless internet points, gone!
          • mikestew2 hours ago
            > Then future downvotes won't affect your karma.

            Who cares? The max amount of karma loss is 4 points, we can afford to eat our downvotes like adults.

            • LoganDark2 hours ago
              Huh. I thought the minimum comment score was -4 (which would make the maximum amount of karma loss 5, since each comment starts at 1 point), but I didn't know if that was a cap on karma loss or just a cap on comment score.
  • lionkor5 hours ago
    [flagged]
    • bri3d5 hours ago
      > Or, rather, MiniMax is! The good thing about offloading your work to an LLM is that you can blame it for your shortcomings. Time to get my hands dirty and do it myself, typing code on my keyboard, like the ancient Mayan and Aztec programmers probably did.

      They noticed a discrepancy, then went back and wrote code to perform the same operations by hand, without the use of an LLM at all in the code production step. The results still diverged unpredictably from the baseline.

      Normally, expecting floating-point MAC operations to produce deterministic results on modern hardware is a fool's errand; they usually run asynchronously, so the non-associative nature of floating-point addition rears its head and you get some divergence.

      But an order of magnitude difference plus Apple's own LLM not working on this device suggests strongly to me that there is something wrong. Whether it's the silicon or the software would demand more investigation, but this is a well reasoned bug in my book.
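
The order sensitivity described above is simple to demonstrate (Python as a stand-in for an accelerator's accumulator; the values are contrived to force the rounding):

```python
import math

# An accumulator sees values in whatever order the hardware schedules
# them; the order changes what gets rounded away.
vals = [1e16, 1.0, 1.0, -1e16]

forward = sum(vals)                       # the 1.0s vanish into 1e16 -> 0.0
reordered = sum([1e16, -1e16, 1.0, 1.0])  # big terms cancel first    -> 2.0
exact = math.fsum(vals)                   # correctly rounded sum     -> 2.0

assert forward == 0.0 and reordered == 2.0 and exact == 2.0
```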

      • ErroneousBosh4 hours ago
        > Time to get my hands dirty and do it myself, typing code on my keyboard, like the ancient Mayan and Aztec programmers probably did.

        https://ia800806.us.archive.org/20/items/TheFeelingOfPower/T...

        I should think I'll probably see someone posting this on the front page of HN tomorrow, no doubt. I first read it when it was already enormously old, possibly nearly 30 years old, in the mid 1980s when I was about 11 or 12 and starting high school, and voraciously reading all the Golden Age Sci-Fi I could lay my grubby wee hands on. I still think about it, often.

    • netsharc4 hours ago
      I found the article hard to read. I turned on reader mode. I still found it hard to read. Each sentence is very short. My organic CPU spins trying to figure out how each sentence connects to the next. Each sentence feels more like a paragraph, or a tweet, instead of having a flow. I think that's my issue with it.
      • mr_toad2 hours ago
        If it was written in turgid prose people would be frantically waggling their AI accusatory fingers.
        • netsharc2 hours ago
          Instead he writes Buzzfeed style: a sentence per paragraph, and then smushes several paragraphs into one.

          (The idea being, a paragraph usually introduces a new thought.)

    • decimalenough5 hours ago
      My TL;DR is that they tried to run an on-device model to classify expenses, it didn't work even for simple cases ("Kasai Kitchin" -> "unknown"), they went deeeeeep down the rabbit hole to figure out why and concluded that inference on their particular model/phone is borked at the hardware level.

      Whether you should do this on device is another story entirely.

      • wolvoleo an hour ago
        I would really not want to upload my expense data to some random cloud server, nope. On-device is really a benefit even if it's not quite as comprehensive. And it's in line with Apple's privacy focus, so it's very imaginable that many of their customers agree.
      • jojobas4 hours ago
        Why shouldn't you? It's your device, it has hardware made specifically for inference.

        What's to be gained, other than battery life, by offloading inference to someone else? To be lost, at least, is your data ownership and perhaps money.

        • dghlsakjg4 hours ago
          > What's to be gained... by offloading inference to someone else?

          Access to models that local hardware can't run. The kind of model that an iPhone struggles to run is blown out of the water by most low-end hosted models. It's the same reason that most devs opt for Claude Code, Cursor, Copilot, etc. instead of local models for coding assistance.

          • selcuka3 hours ago
            But apparently this model is sufficient for what the OP wants to do. Also apparently it works on iPhone 15 and 17, but not on 16.
          • jojobas4 hours ago
            Claude code produces stuff orders of magnitude more complicated than classifying expenses. If the task can be run locally on hardware you own anyway, it should.
  • the_arun5 hours ago
    [flagged]
    • ploum5 hours ago
      Well it seems that, these days, instead of SUM(expense1, expense2) you ask an LLM to "make an app that will compute the total of multiple expenses".

      If I read most of the news on this very website, this is "way more efficient" and "it saves time" (and those who don’t do it will lose their job)

      Then, when it produces wrong output AND it is obvious enough for you to notice, you blame the hardware.

      • janalsncm3 hours ago
        The author is debugging the tensor operations of the on-device model with a simple prompt. They confirmed the discrepancy with other iPhone models.

        It’s no different than someone testing a calculator with 2+2. If it gets that wrong, there’s a hardware issue. That doesn’t mean the only purpose of the calculator is to calculate 2+2. It is for debugging.

        You could just as uncharitably complain that “these days no one does arithmetic anymore, they use a calculator for 2+2”.

      • bri3d5 hours ago
        I mean, Apple's LLM also doesn't work on this device, and the author compared the outputs from each iterative calculation on this device against others: they diverge from every other Apple device, and the same broken behavior carried across multiple OS versions. Is the hardware or the software "responsible"? Who knows; there's no smoking gun there, but it does seem like something is genuinely wrong.

        I don't get the snark about LLMs overall in this context; this author uses LLM to help write their code, but is also clearly competent enough to dig in and determine why things don't work when the LLM fails, and performed an LLM-out-of-the-loop debugging session once they decided it wasn't trustworthy. What else could you do in this situation?

    • bri3d5 hours ago
      Somewhere along the line, the tensor math that runs an LLM on this device diverged from every other Apple device. My guess is that there's some kind of accumulation issue here (remembering that floating-point addition is not associative, so accumulation order matters), but it seems genuinely broken in an unexpected way given that Apple's own LLM also doesn't seem to work on this device.
    • lxgr5 hours ago
      LLMs are applied math, so… both?
  • giancarlostoro4 hours ago
    [flagged]
    • bri3d4 hours ago
      If you’d read the whole thing, you would have gone on a debugging journey that bypassed the LLM entirely and is very much appropriate for HN, so you might want to do that rather than dismissing the article.
    • Playboi_Carti4 hours ago
      It's not about LLMs doing math.
    • dummydummy1234 4 hours ago
      Uhh, that's not the article. The article is about running an ML model on the phone, and the floating-point ops for tensor multiplication seem to be off.
  • tehwebguy4 hours ago
    Here’s one that kills me:

    - Tightening some bolts, listening to something via airpods

    - Spec tells me torque in Nm

    - Torque wrench is in ft lbs

    - “Hey Siri, what’s X newton meters in foot pounds?”

    - “Here’s some fucking website: ”
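
(The conversion Siri punts on is one constant; `nm_to_ftlb` is a made-up name for illustration, but the factor follows from the exact definitions 1 ft = 0.3048 m and 1 lbf = 4.4482216152605 N.)

```python
# 1 ft*lbf = 0.3048 m * 4.4482216152605 N ~= 1.3558179483314 N*m
FT_LBF_PER_NM = 1.0 / 1.3558179483314004

def nm_to_ftlb(nm: float) -> float:
    """Convert newton-metres of torque to foot-pounds (force)."""
    return nm * FT_LBF_PER_NM

# A 40 Nm spec is about 29.5 ft-lb on the wrench.
print(round(nm_to_ftlb(40.0), 1))  # 29.5
```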

  • ernsheong3 hours ago
    Have you heard of the Calculator app?
  • PlatoIsADisease5 hours ago
    You don't buy Apple products because of the quality; you buy them because they're more expensive than their value. It's a demonstration of wealth. This is called a Veblen good, a phenomenon called out as early as Thomas Hobbes.

    What you need to do is carry 2 phones. A phone that does the job, and a phone for style.

    I didn't invent the laws of nature, I just follow them.

    • ohyoutravel5 hours ago
      This is a conclusion that comes with some personal baggage you should identify and consider addressing.
      • PlatoIsADisease5 hours ago
        Admittedly, I hate companies that live off their marketing. Nintendo, Disney, Apple. I hate that these companies can weaponize psychology against humans.

        Function > Form.

        I think it's a Hero Complex, if Jung is correct.

        • raw_anon_1111 4 hours ago
          Yes, because 60% of US phone buyers buy an iPhone to stand out from the average US phone buyer, and they shouldn't because it doesn't run local LLMs well?
        • DJBunnies4 hours ago
          Macbooks and iPhones are good devices though, saying this as a primarily linux user.

          There is no way a company could exist purely on marketing, Apple backs it up with tech.

          • wolvoleo an hour ago
            Some companies definitely do just exist on marketing. Some clothing brands are objectively overpriced crap and pure wealth signalling. Or something like a juicero.

            But I agree Apple doesn't even though they've gone into a direction I couldn't follow them in.

        • anonymars3 hours ago
          I'd almost say most companies live or die off their marketing. One could argue that understanding your customer as well as or better than they understand themselves is a strength.

          To wit, some people do value form over function. Some people do prefer a safe, curated walled garden.

          I am not among them--I say this as someone who cannot stand using most Apple products for more than a minute. But I respect what they offer(ed) and for some people even recommended them. (Now I'm less sure because it seems like everything tech has gone to shit, but I can't tell if that's just "old man yells at cloud" or what)

          Ideally there would be enough competition for us all to find what we're looking for. I think anticompetitive behavior is a worse sin

        • kulahan4 hours ago
          All three of these companies are supremely dedicated to the customer experience. It’s a weird thing to be annoyed at. Ninty is the only company really experimenting with gaming hardware. Disney parks are a thesis on hiding the “behind the scenes” stuff perfectly. Apple does its best to make things just kinda work well, and if you’re in their ecosystem fully, it usually does work out.

          Not everyone cares for the most capable device on the planet. Sometimes people just want a pretty familiar and easy experience. I haven’t used my phone for anything more than browsing the web and texting in ages. I absolutely don’t care about whatever function you think I’m missing due to Apple, honestly.

          As a side note, the fathers of Psychology were absolutely terrible scientists. The entire field almost failed because they took it so far into pseudo-science land. Of course Jung isn’t correct.

      • gambiting5 hours ago
        I mean, I think it's cultural. In US it seems like everyone has an iphone, it's almost kinda quirky not to have one. But in some other places, an iPhone is more than your monthly salary - having one is definitely a symbol of status. Less so than it used to be, but it still has that.
        • dghlsakjg4 hours ago
          iPhones in the US have an estimated ~55% market share, depending on the source. Owning an Android wasn't unusual in the least when I lived there, and it appears to be pretty popular.

          I don't think it's unusual that a country with high median income and higher average income will tend to gravitate towards more expensive phones. Given that Apple doesn't make a cheap phone, it kind of follows that wealthier countries will buy more iPhones.

          Of course the opposite is true as well: in a country where an iPhone costs months of salary, they won't sell well, but I'd be willing to bet that Androids in that price tier sell like shit in those countries too.

          Is it a status symbol? Arguably. But it also correlates pretty strongly with median income.

        • ohyoutravel5 hours ago
          Fair, but that’s a comment on a US-centric website, run by a US-centric company, in a US-centric industry, on a US-centric medium. So if they didn’t mean US, I think the onus is on them to clarify exactly where this applies.
    • dghlsakjg4 hours ago
      I severely doubt your thesis around iPhones being Veblen goods.

      You are claiming that if the price of the iPhone went down, Apple would sell fewer phones?

      Correspondingly, you are arguing that if they increased prices they could increase sales?

      You are claiming that 100s of millions of people have all made the decision that the price of an iPhone is more than it is worth to them as a device, but is made up for by being seen with one in your hand?

      Not all goods that signify status are Veblen goods.

    • jwrallie3 hours ago
      Can you prove that is still the case with the iPhone SE by showing a comparable hardware with similar long support on software updates and lower price?
    • B1FF_PSUVM 3 hours ago
      > Its a demonstration of wealth. This is called Veblen good

      Just the other day I was reminded of the poor little "I am rich" iOS app (a thousand dollar ruby icon that performed diddly squat by design), which Apple deep-sixed from the app store PDQ.

      If misery loves company, Veblen goods sure don't.

  • vanviegen5 hours ago
    Perfect conclusion: my expensive and rather new phone is broken by design, so I just buy an even newer and more expensive one from the same vendor.

    The heroic attempt at debugging this though makes me sympathize with all of those engineers that must be doing low-level LLM development these days and getting just noise out of their black boxes.