High core count CPUs are only useful for specific workloads and should not be purchased as general-purpose fast CPUs. Unless you're doing specific tasks that scale with core count, a CPU with fewer cores and higher single-threaded throughput will be faster for normal use cases.
The callout against the poor journalism at Tom's Hardware isn't anything new. They have a couple of staff members posting clickbait all the time. Sometimes the links don't even work, or the articles make completely wrong claims. This is par for the course for the site now.
To be fair, the Tom's Hardware article did call out these points and limitations itself, so this Slashdot critique is basically repeating the content of the Tom's Hardware article, just more critically: https://www.tomshardware.com/pc-components/cpus/apples-18-co...
I've personally found that Apple's Pro/Max chips already have too many CPU cores for Geekbench.
To justify the investment you need to have tasks that scale out, or loads of heterogeneous tasks to support concurrently.
> Furthermore, many of the suite’s multi-threaded subtests scale efficiently only to roughly 8 – 32 threads, which leaves much of such CPUs' parallel capacity idle, but which creates an almost perfect environment for Apple's CPUs that feature a relatively modest number of cores
That really invalidates the entire comparison, and they should have canned the article if they had any integrity.
(I'm not counting high-end workstation/server CPUs because, as others in this thread have explained, Geekbench isn't intended for them.)
If Geekbench 6 multicore is primarily a proxy for “common use case performance” rather than for workloads that actually use lots of cores, then it shouldn’t be treated as a general multicore CPU benchmark, and it definitely shouldn’t be the basis for sweeping 18-core vs 96-core conclusions.
That may be a perfectly valid design choice. But then the honest takeaway is: GB6 multicore measures a particular class of lightly/moderately threaded shared-task workloads, not broad multicore capability.
The criticism isn’t “every workload should scale linearly to 96 cores.” It’s that a benchmark labeled “multicore” is being used as if it were a general multicore proxy when some of its workloads stop scaling very early, including ones that sound naturally parallelizable.
> Geekbench 6 is a cross-platform benchmark that measures your system's performance with the press of a button. How will your mobile device or desktop computer perform when push comes to crunch? How will it compare to the newest devices on the market? Find out today with Geekbench 6.
And further down,
> Includes updated CPU workloads and new Compute workloads that model real-world tasks and applications. Geekbench is a benchmark that reflects what actual users face on their mobile devices and personal computers.
We're talking about a CPU with a list price over $10,000.
Geekbench 6 is a bad test to use to assess the suitability of a 96-core Threadripper for the kinds of use cases where buying a 96-core Threadripper might make sense. But Geekbench 6 does a very good job of illustrating the point that buying a 96-core Threadripper would be a stupid waste of money for a personal desktop and the typical use cases of a personal desktop.
It is a general multi core benchmark for its target audience.
It’s not marketed as “the multi core scaling benchmark”. Geekbench is advertised as a benchmark suite and it has options to run everything limited to a single core or to let it use as many cores as it can.
96-core CPUs are not its target audience.
I design multithreaded backends that benefit from as many cores as possible while not being champions at single-core tasks. I think this is a very common use case.
Pretending that everything a CPU does is an embarrassingly parallel problem is heinous benchmarking malpractice. Yes, Geekbench 6 has its flaws and limitations. All benchmarks do. Geekbench 6 has valid uses, and its limitations are defensible in the context of using it to measure what it is intended to measure. The scalability limitations it illustrates are real problems that affect real workloads and use cases. Calling it "broken" because it doesn't produce the kind of scores a marketing department would want to see from a 96-core CPU reflects more poorly on you than it does on Geekbench 6.
The second reason is that I've never had a Slashdot submission accepted. After posting this, the page suggested I share it to increase the chances of the editors picking it up, but I don't really use social media, so I thought, hey, why not HN.
It explains why a workload with a large serial/contended fraction won’t scale.
It does not prove that the workload’s serial fraction is representative of the category it claims to stand in for.
So when a benchmark’s “text processing” test over ~190 files barely gets past ~1.3x on 8 cores, that’s not some profound demonstration that CPUs can’t parallelize text work. It’s mostly a demonstration that this benchmark’s implementation has a very large serial bottleneck.
That would be fine if people treated GB6 multicore as a narrow benchmark of specific shared-task client workloads. The problem is that it is labelled as a general multicore CPU metric, and is used as such, including for 18-core vs 96-core comparisons. That’s the misuse being criticized.
TL;DR: Amdahl’s Law explains the ceiling; it does not justify treating an avoidably low ceiling as a general measure of multicore CPU capability.
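The numbers above can be checked directly. A minimal sketch (plain Amdahl's Law algebra, using the ~1.3x-on-8-cores figure cited for the text processing subtest; the function names are my own):

```python
def amdahl_speedup(p, n):
    """Amdahl's Law: speedup on n cores when fraction p of the work parallelizes."""
    return 1.0 / ((1.0 - p) + p / n)

def implied_parallel_fraction(s, n):
    """Invert Amdahl's Law: the parallel fraction p implied by speedup s on n cores."""
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n)

p = implied_parallel_fraction(1.3, 8)  # ~0.26: only about a quarter of the work parallelizes
ceiling = 1.0 / (1.0 - p)              # ~1.36x: the limit even with infinitely many cores
```

In other words, a 1.3x speedup on 8 cores implies roughly a 74% serial fraction, with a hard ceiling of about 1.36x no matter how many cores you throw at it. That's why it says nothing useful about an 18-core vs a 96-core part.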
EDIT: Also, submitter, I'm not sure why parent is upset that you submitted. Thanks for sharing. I've been wondering for years why Geekbench was obviously broken on multicore. (It comes up a lot in Apple discussions, as you know.)
First plot really says it all.
And like the other poster mentioned, it correlates well with SPEC, so it's basically an easily accessible SPEC. These days the only benchmark I use to quickly judge a CPU is Geekbench.
Have you not heard of C-x M-c M-butterfly?
how good is the M5 Max compared to a 96-core Threadripper? what's the tl;dr? where's the broader assortment of benchmarks?
I just want to see some bargraphs that say "lower is better" or "higher is better"
> sudo-rs Breaks Historical Norms With Now Enabling Password Feedback By Default
"Let's tell you exactly how to feel about this, commenters"
> Linux 7.0 Officially Concluding The Rust Experiment
"...by declaring it successful, but if we said that in the headline it wouldn't be clickbait"
(In fairness, LWN did this too, but by accident rather than to provoke clicks.)
> GNOME Mutter Now "Completely Drops The Whole X11 Backend"
> systemd 260 Dropping System V Service Script Support
"And has dozens of other features and improvements, but let's cherry-pick the one we want you to yell about in our comments section."
> The Linux Kernel Looks To 'Bite The Bullet' In Enabling Microsoft C Extensions
"Oh no, look at the poor kernel developers reluctantly dealing with Microsoft extensions...that they deliberately sought out and used because they prefer them over standard C, not being forced by any external factor."
They know their comments section, they know what gets people posting, and they optimize for provoking comments.
My favourites were the FreeBSD-vs-Linux comparisons that concluded FreeBSD is slower, until you look under the hood and see that both were tested in a configuration with a desktop environment running.
Or the good old ZFS tests that produced nonsensical results because of gross misconfiguration and/or a total lack of understanding of how the filesystem works...
But hey, the click/ragebait is on point in both these cases!