3 points by chrka 7 hours ago | 2 comments

anticleiades 7 hours ago
Branchless programming is a fascinating area. You used -O3, so the compiler may also be vectorizing parts of the code. I'm curious how much AVX/SIMD contributes to the speed-up (i.e., how much speed-up avoiding branches "alone" yields).
- chkas 7 hours ago
  You can take a look at this - it's fast even without vector operations, as long as you avoid the branches that are often predicted incorrectly.
  https://easylang.online/blog/branchless
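  To illustrate what avoiding a poorly predicted branch can look like, here is a minimal sketch of a two-element compare-exchange in C. It is not the code from the linked post: the names `sort2_branchy` and `sort2_branchless` are hypothetical, and the mask trick is just one common way to do it. No vector instructions are involved.

```c
#include <stdint.h>

/* Branchy version: the swap is guarded by a data-dependent `if`.
   On random input this condition is true about half the time,
   so the branch predictor is often wrong. */
static void sort2_branchy(int32_t *a, int32_t *b) {
    if (*a > *b) {
        int32_t t = *a;
        *a = *b;
        *b = t;
    }
}

/* Branchless version (illustrative): turn the comparison into a mask
   and use XOR to swap conditionally, with no jump in the source. */
static void sort2_branchless(int32_t *a, int32_t *b) {
    int32_t x = *a, y = *b;
    int32_t mask = -(int32_t)(x > y);  /* all ones if x > y, else 0 */
    int32_t diff = (x ^ y) & mask;     /* x ^ y if a swap is needed, else 0 */
    *a = x ^ diff;                     /* becomes y when swapping */
    *b = y ^ diff;                     /* becomes x when swapping */
}
```

  Both functions leave the smaller value in `*a` and the larger in `*b`. An optimizing compiler may already turn the branchy version into a conditional move, so any difference has to be measured rather than assumed.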
jjgreen 7 hours ago
In line 423 of the optimised code there's a typo: "sort2(e,i)" should be "sort2(i,e)"
- chkas 7 hours ago
  That should give the same result.