The author should read https://webkit.org/blog/6161/locking-in-webkit/ so that they understand what they are talking about.
WebKit does it right in the sense that:
- It as an optimal amount of spinning
- Threads wait (instead of spinning) if the lock is not available immediately-ish
And we know that the algorithms are optimal based on rigorous experiments.
The only time I've manually written my own spin lock was when I had to coordinate between two different threads, one of which was running 16-bit code, so using any library was out of the question, and even relying on syscalls was sketchy because making sure the 16-bit code is in the right state to call a syscall itself is tricky. Although in this case, since I didn't need to care about things like fairness (only two threads are involved), the spinlock core ended up being simple:
"thunk_spin:",
"xchg cx, es:[{in_rv}]",
"test cx, cx",
"jnz thunk_has_data",
"pause",
"jmp thunk_spin",
"thunk_has_data:",The following is an interesting talk where the author used a custom spinlock to significantly speed up a real-time physics solver.
Dennis Gustafsson – Parallelizing the physics solver – BSC 2025 https://www.youtube.com/watch?v=Kvsvd67XUKw
rdtsc may execute out of order, so sometimes an lfence (previously cpuid) can be used and there is also rdtscp
See https://github.com/torvalds/linux/blob/master/arch/x86/inclu...
And just because rdtsc is constant doesn't mean the processor clock will be constant that could be fluctuating.
Stealing is an example of an unfairness which can significantly improve overall performance.
I suppose someplace someone is running an embedded system without an OS on such a processor - but I'd expect they are still using extra cores and so have all of the above tricks someplace.
It's a weak signal but the reasoning is sound.
Code has a minimum complexity to solve the problem