3 points by chrka 7 hours ago | 2 comments
  • anticleiades 7 hours ago
    Branchless programming is a fascinating area. You compiled with -O3, so the compiler may also be vectorizing parts of the code. I am curious how much AVX/SIMD contributes to the speed-up (i.e., how much speed-up avoiding branches "alone" yields).
  • jjgreen 7 hours ago
    In line 423 of the optimised code there's a typo: "sort2(e,i)" should be "sort2(i,e)"
    • chkas 7 hours ago
      That should give the same result.
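
For readers following the thread, here is a minimal sketch of what a branch-free two-element sort can look like in C. It is not the code from the post: the name `sort2` is borrowed from the comments above, while the pointer-based signature, the `int32_t` element type, and the XOR-mask formulation are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper (not the post's implementation):
 * orders *a and *b so that *a <= *b, with no if statement. */
static inline void sort2(int32_t *a, int32_t *b) {
    int32_t x = *a;
    int32_t y = *b;

    /* (x > y) is 0 or 1; negating it gives 0 or all-ones (-1). */
    int32_t mask = -(int32_t)(x > y);

    /* t is x ^ y when a swap is needed, 0 otherwise. */
    int32_t t = (x ^ y) & mask;

    /* XOR with t swaps the values; XOR with 0 leaves them unchanged. */
    *a = x ^ t;
    *b = y ^ t;
}
```

On the vectorization question raised above: one possible way to separate the two effects with GCC is to build once with `-O3` and once with `-O3 -fno-tree-vectorize` and compare the timings. Whether the difference is significant depends on the code and the target, so treat this as a starting point for measurement rather than an answer.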