67 points by dkechag | 7 hours ago | 12 comments
  • Aurornis7 hours ago
    GeekBench probably made the right choice to optimize for more realistic real-world workloads than for the more specific workloads that benefit from really high core counts. GeekBench is supposed to be a proxy for common use case performance.

    High core count CPUs are only useful for specific workloads and should not be purchased as general purpose fast CPUs. Unless you’re doing specific tasks that scale by core count, a CPU with fewer cores and higher single threaded throughput would be faster for normal use cases.

    The callout of poor journalism at Tom’s Hardware isn’t anything new. They have a couple of staff members posting clickbait all the time. Sometimes the links don’t even work, or they make completely wrong claims. This is par for the course for the site now.

    To be fair, the Tom’s Hardware article did call out these points and limitations itself, so this Slashdot critique is basically repeating the content of the Tom’s Hardware article, just more critically: https://www.tomshardware.com/pc-components/cpus/apples-18-co...

    • jltsiren5 hours ago
      Geekbench 6 Multi-Core is fundamentally a single-task benchmark. It measures performance in workloads where the user is not running anything significant in the background. If you are a developer who wants to keep using the computer while compiling a large project in the background, Geekbench results are not particularly informative for you.

      I've personally found that Apple's Pro/Max chips already have too many CPU cores for Geekbench.

    • barrkel6 hours ago
      As the owner of a 96-core 9995WX: nobody is buying one for desktop PC software, much less laptop-level software.

      To justify the investment you need to have tasks that scale out, or loads of heterogeneous tasks to support concurrently.

      • ozfive6 hours ago
        What tasks are you running on your 96 core 9995wx?
        • cozzyd3 hours ago
          Make -j97 presumably. Or MPI jobs.
        • whateverboat6 hours ago
          LLVM developer compiling the full LLVM stack every 10 minutes.
      • jeffbee6 hours ago
        Right, this is a car-priced CPU and the only rational reason to have one is that you can exploit it for profit. One pretty great reason would be giving it to your expensive software developers so they don't sit there waiting on compilers.
    • embedding-shape5 hours ago
      Buried in the middle of that article:

      > Furthermore, many of the suite’s multi-threaded subtests scale efficiently only to roughly 8 – 32 threads, which leaves much of such CPUs' parallel capacity idle, but which creates an almost perfect environment for Apple's CPUs that feature a relatively modest number of cores

      That really invalidates the entire comparison; if they had any integrity they'd have canned the article.

      • wmf5 hours ago
        AMD has 16 cores, Apple has 18, Qualcomm has 18, Nvidia N1X has 20, and Intel has 24. All else being equal you actually want as few cores as you can get away with because that's less likely to be limited by Amdahl's Law. Arguably Intel/Nvidia CPUs are poorly designed and benchmarks have no obligation to accommodate them.

        (I'm not counting high-end workstation/server CPUs because, as others in this thread have explained, Geekbench isn't intended for them.)
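        The Amdahl's Law point can be made concrete with a rough sketch; the 90% parallel fraction below is an illustrative assumption, not a figure from the thread:

```python
# Amdahl's Law: speedup on n cores for a workload whose parallel fraction is p.
def amdahl_speedup(p: float, n: int) -> float:
    return 1 / ((1 - p) + p / n)

p = 0.90  # assumed parallel fraction, purely illustrative
for n in (16, 18, 20, 24):
    print(n, round(amdahl_speedup(p, n), 2))
# 16 -> 6.4, 18 -> 6.67, 20 -> 6.9, 24 -> 7.27
```

        Even at 90% parallel, 50% more cores (16 to 24) buys under 15% more speedup, which is the sense in which fewer, faster cores tend to win.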

    • pram5 hours ago
      GeekBench being questionable aside, these results have me stoked to see what an M5 Ultra looks like performance-wise!
    • dkechag5 hours ago
      Not sure you opened the blog post. The scaling is atrocious, even for tasks that should be extremely parallelizable. The Geekbench "Text Processing" benchmark supposedly processes 190 markdown files, and yet it tops out at just 1.34x the single-thread performance with 4 cores, and it drops with more cores! I admit my expertise is algorithms & optimization, so I may get more easily incensed by inept developers, but this is crazy... It is not realistic in any way, unless we assume the "real world" is just JS beginners scribbling code for a website...
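      Plugging those numbers into Amdahl's Law shows how serialized the subtest must be; a minimal sketch (the 1.34x-on-4-cores figure is from the measurement above, the rearrangement is the standard formula):

```python
# Amdahl's Law: speedup(n) = 1 / (s + (1 - s) / n), where s is the serial fraction.
# Rearranged to infer s from a measured speedup on n cores:
def implied_serial_fraction(speedup: float, cores: int) -> float:
    return (cores / speedup - 1) / (cores - 1)

s = implied_serial_fraction(1.34, 4)  # Text Processing: 1.34x on 4 cores
print(f"implied serial fraction: {s:.0%}")  # about 66%: two thirds of the work is serialized
```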
    • refulgentis7 hours ago
      I think this actually concedes the main criticism.

      If Geekbench 6 multicore is primarily a proxy for “common use case performance” rather than for workloads that actually use lots of cores, then it shouldn’t be treated as a general multicore CPU benchmark, and it definitely shouldn’t be the basis for sweeping 18-core vs 96-core conclusions.

      That may be a perfectly valid design choice. But then the honest takeaway is: GB6 multicore measures a particular class of lightly/moderately threaded shared-task workloads, not broad multicore capability.

      The criticism isn’t “every workload should scale linearly to 96 cores.” It’s that a benchmark labeled “multicore” is being used as if it were a general multicore proxy when some of its workloads stop scaling very early, including ones that sound naturally parallelizable.

      • wtallis6 hours ago
        Geekbench 6 isn't really marketed as a one-size-fits-all benchmark. It's specifically aimed at consumer hardware. The first paragraph on geekbench.com reads:

        > Geekbench 6 is a cross-platform benchmark that measures your system's performance with the press of a button. How will your mobile device or desktop computer perform when push comes to crunch? How will it compare to the newest devices on the market? Find out today with Geekbench 6.

        And further down,

        > Includes updated CPU workloads and new Compute workloads that model real-world tasks and applications. Geekbench is a benchmark that reflects what actual users face on their mobile devices and personal computers.

        • refulgentis6 hours ago
          The problem is, in practice, despite nonspecific marketing language, people do use the multicore benchmark to measure multicore performance. Including for things like Threadripper, which is not exactly an exotic science project CPU or non-personal or non-desktop.
          • wtallis6 hours ago
            > Including for things like Threadripper, which is not exactly an exotic science project CPU or non-personal or non-desktop.

            We're talking about a CPU with a list price over $10000.

            Geekbench 6 is a bad test to use to assess the suitability of a 96-core Threadripper for the kinds of use cases where buying a 96-core Threadripper might make sense. But Geekbench 6 does a very good job of illustrating the point that buying a 96-core Threadripper would be a stupid waste of money for a personal desktop and the typical use cases of a personal desktop.

            • refulgentis6 hours ago
              Holy hell. Lol. I did not realize how generous $PREVIOUS_EMPLOYER was.
      • Aurornis6 hours ago
        > then it shouldn’t be treated as a general multicore CPU benchmark,

        It is a general multi core benchmark for its target audience.

        It’s not marketed as “the multi core scaling benchmark”. Geekbench is advertised as a benchmark suite and it has options to run everything limited to a single core or to let it use as many cores as it can.

        96-core CPUs are not its target audience.

    • FpUser6 hours ago
      >"High core count CPUs are only useful for specific workloads and should not be purchased as general purpose fast CPUs. Unless you’re doing specific tasks that scale by core count, a CPU with fewer cores and higher single threaded throughput would be faster for normal use cases."

      I design multithreaded backends that benefit from as many cores as possible while not being champions at single-core tasks. I think this is a very common use case.

      • Aurornis5 hours ago
        Maybe I’m misunderstanding what you’re saying, but designing multithreaded backends is not a very common use case.

        Most computer use cases don’t involve software development at all.

        • FpUser2 hours ago
          Running those backends is very common. Just not in one's house / apartment
  • wtallis7 hours ago
    You're seriously posting to HN a link to your Slashdot post linking to your year-old blog post complaining about Geekbench 6's multi-threaded test without ever mentioning Amdahl's Law?

    Pretending that everything a CPU does is an embarrassingly parallel problem is heinous benchmarking malpractice. Yes, Geekbench 6 has its flaws and limitations. All benchmarks do. Geekbench 6 has valid uses, and its limitations are defensible in the context of using it to measure what it is intended to measure. The scalability limitations it illustrates are real problems that affect real workloads and use cases. Calling it "broken" because it doesn't produce the kind of scores a marketing department would want to see from a 96-core CPU reflects more poorly on you than it does on Geekbench 6.

    • ThrowawayR24 hours ago
      Possibly the underlying reason for the indirection is because that particular domain is banned on HN; check https://news.ycombinator.com/from?site=dev.to with showdead turned on.
      • dkechag3 hours ago
        Yeah, that's the main reason. Not sure why the ban; Medium etc. are much worse...

        The second reason is I've never had a Slashdot submission accepted, and after posting this the page suggested I share it to increase the chances of the editors picking it up. I don't really use social media, so I thought hey, why not HN.

    • dkechag4 hours ago
      From your post I can tell you did not read the "year-old blog post". It starts with the scaling, but goes further and explains a lot. I am a software engineer specializing in algorithms and optimization, Amdahl's law is part of the usual training I give our junior developers. It has nothing to do with Geekbench 6 being a surprisingly bad benchmark, especially for big CPUs.
      • omikun3 hours ago
        If you’re in the market for a 96 core cpu and you’re using Geekbench to guide your purchasing decision, the fault lies with you.
    • refulgentis7 hours ago
      Amdahl’s Law is descriptive, not exculpatory.

      It explains why a workload with a large serial/contended fraction won’t scale.

      It does not prove that the workload’s serial fraction is representative of the category it claims to stand in for.

      So when a benchmark’s “text processing” test over ~190 files barely gets past ~1.3x on 8 cores, that’s not some profound demonstration that CPUs can’t parallelize text work. It’s mostly a demonstration that this benchmark’s implementation has a very large serial bottleneck.

      That would be fine if people treated GB6 multicore as a narrow benchmark of specific shared-task client workloads. The problem is that it is labelled as a general multicore CPU metric, and is used as such, including for 18-core vs 96-core comparisons. That’s the misuse being criticized.

      TL;DR: Amdahl’s Law explains the ceiling; it does not justify treating an avoidably low ceiling as a general measure of multicore CPU capability.
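      That ceiling can be made explicit with a rough sketch, assuming only the ~1.3x-on-8-cores figure above:

```python
# Amdahl's Law: speedup on n cores given a serial fraction s.
def amdahl_speedup(s: float, n: int) -> float:
    return 1 / (s + (1 - s) / n)

# ~1.3x on 8 cores implies s = (8/1.3 - 1) / 7, about 0.74
s = (8 / 1.3 - 1) / 7
for n in (8, 32, 96):
    print(n, round(amdahl_speedup(s, n), 2))
# 8 -> 1.3, 32 -> 1.34, 96 -> 1.35; the asymptote is 1/s, about 1.36x, regardless of core count.
```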

      EDIT: Also, submitter, I'm not sure why parent is upset that you submitted. Thanks for sharing. I've been wondering for years why GeekBench was obviously broken on multicore. (comes up a lot in Apple discussions, as you know)

  • grg07 hours ago
    The real meat from the article: https://dev.to/dkechag/how-geekbench-6-multicore-is-broken-b...

    First plot really says it all.

    • wtallis7 hours ago
      • wmf7 hours ago
        The article is probably right about text processing though. It sounds like they took an inherently parallel task with no communication and (accidentally?) crippled it.
        • wtallis6 hours ago
          I'm not sure what's going on with that subtest, and the lack of scaling is certainly egregious. But we've all encountered tasks that in theory could scale much better but in practice have been implemented in a more or less serial fashion. That kind of thing probably isn't a good choice for a multi-core test suite, but on the other hand: given that Geekbench has both multi-core and single-core scores for the same subtests (though with different problem sizes), it would be unrealistic if all the subtests were highly scalable. Encountering bad scalability is a frequent, everyday part of using computers.
  • thot_experiment7 hours ago
    "When you measure, include the measurer" - MC Hammer
    • Finnucane7 hours ago
      When you are MC Hammer, everything is an MC nail.
  • SOTGO7 hours ago
    Anyone who treats Geekbench as a meaningful benchmark (i.e. without a huge disclaimer or other more meaningful datapoints) is not to be trusted. You can only really trust it for inter-generational comparisons within a single architecture.
    • osti4 hours ago
      Not true. Geekbench, especially the single-threaded benchmark, is probably the best we've got: it has a bunch of workloads, unlike many other benchmarks such as Cinebench. And they publish all the results on their website, so you can dig into each individual workload and find the ones that apply to you.

      And as the other poster mentioned, it correlates well with SPEC, so it's basically an easily accessible SPEC. These days the only benchmark I use to quickly judge a CPU is Geekbench.

    • skavi7 hours ago
      Since Geekbench 5, the single threaded benchmark scores have aligned pretty well with those from the industry standard SPEC benchmark.
  • eointierney6 hours ago
    What in the name of goodness are you doing to your poor computers?

    Have you not heard of C-x M-c M-butterfly?

  • doener6 hours ago
    TIL: Slashdot still exists. And it looks exactly as horrible as 20 years ago.
    • Razengan6 hours ago
      I mean so does HN
      • glimshe6 hours ago
        HN is minimalist, not horrible. And the content is good!
        • Razengan3 minutes ago
          All that VC money couldn't buy an accessibility expert apparently
        • leptons4 hours ago
          Well this particular comment thread isn't "good". If you think so, then maybe you should try reddit.
  • ddtaylor7 hours ago
    The strategy is to make outlandish claims and then have people "engaging" to "disprove" all of the claims. This strategy works as long as people are too apathetic and/or stupid to hold liars accountable. It works currently because journalism has significantly less value than tabloid drama to many people, some of which are just narrative shopping for a fun curated list of ideas (not facts) that fit their personalized echo chamber.
  • yieldcrv6 hours ago
    I'm really confused by the self-aggrandizing here, muddying up the discussion

    how good is the M5 Max in comparison to a 96-core threadripper? what's the tl;dr, where are the broader assortments of benchmarks

    I just want to see some bargraphs that say "lower is better" or "higher is better"

  • phoronixrly7 hours ago
    [flagged]
    • JoshTriplett7 hours ago
      Phoronix is terrible in terms of clickbait and deliberate ragebait, and its comment section is a toxic cesspool, but its benchmarks generally seem sound. What issues have you observed with their benchmark suite?
      • wtallis7 hours ago
        Given the username of the account you're replying to, and the implausibility of a Phoronix reader being unaware of Tom's Hardware, I think you may have been baited by a troll.
        • phoronixrly5 hours ago
          I just didn't know their methodology was worse than Phoronix's.
      • 2OEH8eoCRo06 hours ago
        What clickbait and ragebait? Example?
        • JoshTriplett4 hours ago
          Some examples:

          > sudo-rs Breaks Historical Norms With Now Enabling Password Feedback By Default

          "Let's tell you exactly how to feel about this, commenters"

          > Linux 7.0 Officially Concluding The Rust Experiment

          "...by declaring it successful, but if we said that in the headline it wouldn't be clickbait"

          (In fairness, LWN did this too, but by accident rather than to provoke clicks.)

          > GNOME Mutter Now "Completely Drops The Whole X11 Backend"

          > systemd 260 Dropping System V Service Script Support

          "And has dozens of other features and improvements, but let's cherry-pick the one we want you to yell about in our comments section."

          > The Linux Kernel Looks To 'Bite The Bullet' In Enabling Microsoft C Extensions

          "Oh no, look at the poor kernel developers reluctantly dealing with Microsoft extensions...that they deliberately sought out and used because they prefer them over standard C, not being forced by any external factor."

          They know their comments section, they know what gets them posting, and they optimize for provoking comment.

        • phoronixrly5 hours ago
          Off the top of my head -- any systemd-related article.
      • phoronixrly5 hours ago
        Where the definition of 'benchmarks' is actually slapping an OS onto the hardware, then running the so-called Phoronix Test Suite, promptly followed by an apples-to-oranges comparison...

        My favourites were the comparisons of FreeBSD and Linux coming to the conclusion that FreeBSD is slower. Until you look under the hood and see that both are tested in a configuration with a desktop environment.

        Or the good old ZFS tests that were coming up with nonsensical results because of gross misconfiguration and/or total lack of understanding how the FS works...

        But hey, the click/ragebait is on point in both these cases!

  • pengaru6 hours ago
    People already "destroy" the many-core Threadrippers with gaming-oriented Ryzens on appropriately suited workloads; this is clickbait.