42 pointsby Fake4d6 days ago21 comments
  • venning3 days ago
    It may be worth pointing out: these are equivalent comparisons when testing for even numbers but cannot be extrapolated to testing for odd numbers. The reason being that a negative odd number modulus 2 is -1, not 1.

    So `n % 2 == 1` should probably [1] be replaced with `n % 2 != 0`.

    While this may be obvious with experience, if the code says `n % 2 == 0`, then a future developer who is trying to reverse the operation for some reason must know that they need to change the equality operator not the right operand. Whereas, with `n % 1 == 0`, they can change either safely and get the same result.

    This feels problematic because the business logic that necessitated the change may be "do this when odd" and it may feel incorrect to implement "don't do this when even".

    I really disfavor writing code that could be easily misinterpreted and modified in future by less-experienced developers; or maybe just someone (me) who's tired or rushing. For that reason, and the performance one, I try to stick to the bitwise operator.

    [1] Of course, if for some reason you wanted to test for only positive odd numbers, you could use `n % 2 == 1`, but please write a comment noting that you're being clever.

    • userbinator3 days ago
      I really disfavor writing code that could be easily misinterpreted and modified in future by less-experienced developers

      That's their problem. Otherwise you're just contributing to the decline.

    • notfish3 days ago
      In what languages is n % 2 -1 for negative odd numbers?

      Edit: apparently JS, java, and C all do this. That’s horrifying

      • jagged-chisel3 days ago
        Horrifying? It’s mathematically correct.
        • seritools2 days ago
          it's a semantics problem, not a maths problem - modulus and remainder are not the same operation. This easily trips up people since `%` is often called "modulo", yet is implemented as remainder operation in many languages

          https://stackoverflow.com/questions/13683563/whats-the-diffe...

        • michael19992 days ago
          It's actually really awkward. Math usually considers (-7 mod 5) === (2 mod 5). But in C, (-7 % 5 != 2 % 5).
        • affinepplan2 days ago
          wrong. it's not any more correct than 1. that's the key part of an "equivalence" class is that the elements are "equivalent"
    • neuroelectron3 days ago
      You can just not( iseven() )
  • tux36 days ago
    Optimizing compilers have been able to recognize pretty complicated patterns for many years.

    For instance if you're making a loop to count the bits that are set in a number, the compiler can recognize the entire loop and turn it into a single popcnt instruction (e.g. https://lemire.me/blog/2016/05/23/the-surprising-cleverness-... )

    • eichin3 days ago
      They've been able to recognize this pattern since the 1960s (popcount is a very historically special case and not really a sign of complexity, since traditionally it was "if you write this exact code from the documentation, you'll get the machine instruction" and didn't imply any more general cleverness.)
      • cperciva3 days ago
        popcount is a very historically special case

        To elaborate a bit on the specialness of popcount: It is a generally accepted belief in the computer architecture community that several systems included a popcount iNstruction Solely due to A request from a single "very good customer".

      • LegionMammal9783 days ago
        Yeah, once I tried writing out a dumber popcount loop (checking each bit) for better readability, and was annoyed to find that GCC and Clang didn't recognize it. I ended up looking into the source of both compilers to find that they only transform the more 'idiomatic' version.
    • dietr1ch3 days ago
      I feel that the compiler is doing too much work here. I know they are thinking about special cases on generated code, but at some point it feels that it just adds compile time for no good reason.

      Look at this --beauty-- eww, thing, should compilers really spend time trying to figure out how to optimise insane code?

          def is_even(n):
            return str(n)[len(str(n))-1] in [str(2*n) for n in range(5)]
      • ajross3 days ago
        These optimizations are very useful. Consider the only slightly less contrived case where you want to mod an index by the size of an array. And the compiler expands the inline function around a context where the array is a fixed power of two size at compile time. Poof, no division/modulus needed, magically. Lots and lots of code looks like this: general algorithms expressed in simple implementation that has a faster implementation in the specific instance that gets generated.
      • refulgentis3 days ago
        Maybe one day there will be compilers that can choose what to optimize based on their aesthetic judgement of the code.

        I could see that as a novel feedback mechanism for software engineers.

        As it stands, I'm glad they design optimizations abstractly, even if that means code I don't like gets the benefits

        • dietr1ch3 days ago
          It's not about aesthetics, but about the sort of hit-rate of the optimisations as if they need to be too smart to figure things out, then it also means that they'd more rarely be used and necessary.
          • refulgentis3 days ago
            I'm not quite sure what you're visualizing for compilers, if I understand correctly, what I'd say is:

            tl;dr: there are general optimizations for "this function in a for loop is a constant expression, we dont need to call it 500 times"

            or

            "this obscure combination of asm instructions is optimal on pentium iii 350 mhz dual core"

            not "we need to turn this unholy CS101 student spaghetti code where they do a 500 branch-if into a for loop"

            comment over here is attempting to communicate that as well https://news.ycombinator.com/item?id=42705758

            I've never, ever, heard the idea that compilers are burdened by the workload of maintaining thousands of type-specific optimizations for hilariously bad code, until today. I've been here since 2009, so it is puzzling to me to see it referred to off hand, in a "this is water" manner https://en.wikipedia.org/wiki/This_Is_Water

            • dietr1ch3 days ago
              > I've never, ever, heard the idea that compilers are burdened by the workload of maintaining thousands of type-specific optimizations for hilariously bad code, until today.

              I've heard tons of people complain about slow compilers, so even if compiler devs find it easy architect their compilers to do multiple kinds of optimisations there's a cost to it that devs running the compilers pay.

              Also, if you think about it, optimising code has to follow diminishing returns, so at some point we are putting too much CPU time into little to no gains, and it's also possible to get slower code with more optimisations if they interact poorly, or at least not better code even if spending more CPU time. This is why there's -O3 in gcc and it's not the default, there's a cost to it that's likely not worth paying.

              • refulgentis3 days ago
                > I've heard tons of people complain about slow compilers,

                A slow compiler does not imply the compiler is slow because there's thousands of bespoke optimizations for nonsense code being ran

                > Also, if you think about it, optimising code has to follow diminishing returns,

                Nope, trivially. Though, I'm always eager for a Fermat-style marvelous proof that may have been too big for the initial margin you had. :)

                Take a classic case of a buggy compiler generating O(n²) temporary copies due to missed alias analysis. One optimization pass to fix that analysis transforms it to O(n).

                > at some point we are putting too much CPU time into little to no gains

                It is theoretically possible to design a compiler such that it spends much more time looking for optimizations that the total sum of looking is greater than the program it is optimizing's runtime.

                For example, an optimizer that is a while loop that checks if the function returns 42, but the function returns 43.

                I'm not sure what light that sheds.

                I'm not sure that implies that compilers have tons of bespoke optimizations for hand-transforming specific instances of absurd string code.

                If they do, I would be additionally surprised because I have never observed that. What I have observed is compilers, universally, optimize code structures of a certain general form

                > This is why there's -O3 in gcc and it's not the default, there's a cost to it that's likely not worth paying.

                The existence of an argument with a higher processing level than default does not imply the compiler is slow because there's thousands of bespoke optimizations for nonsense code being ran. (n.b. -O3 is understood, in practice, to be risky because it might be too aggressive, not that it might not be worth it)

      • matt_daemon3 days ago
        Or even:

            def is_even(n):
                return str(n)[-1] in "02468"
  • dansalvato3 days ago
    The interesting thing about testing values (like testing whether a number is even) is that at the assembly level, the CPU sets flags when the arithmetic happens, rather than needing a separate "compare" instruction.

    gcc likes to use `and edi,1` (logical AND between 32-bit edi register and 1). Meanwhile, clang uses `test dil,1` which is similar, except the result isn't stored back in the register, which isn't relevant in my test case (it could be relevant if you want to return an integer value based on the results of the test).

    After the logical AND happens, the CPU's ZF (zero) flag is set if the result is zero, and cleared if the result is not zero. You'd then use `jne` (jump if not equal) or maybe `cmovne` (conditional move - move register if not equal). Note again that there is no explicit comparison instruction. If you don't use O3, the compiler does produce an explicit `cmp` instruction, but it's redundant.

    Now, the question is: Which is more efficient, gcc's `and edi,1` or clang's `test dil,1`? The `dil` register was added for x64; it's the same register as `edi` but only the lower 8 bits. I figured `dil` would be more efficient for this reason, because the `1` operand is implied to be 8 bits and not 32 bits. However, `and edi,1` encodes to 3 bytes while `test dil,1` encodes to 4 bytes. I guess the `and` instruction lets you specify the bit size of the operand regardless of the register size.

    There is one more option, which neither compiler used: `shr edi,1` will perform a right shift on EDI, which sets the CF (carry) flag if a 1 is shifted out. That instruction only encodes to 2 bytes, so size-wise it's the most efficient.

    The right-shift option fascinates me, because I don't think there's really a C representation of "get the bit that was right-shifted out". Both gcc and clang compile `(i >> 1) << 1 == i` the same as `i & 1 == 0` and `i % 2 == 0`.

    Which of the above is most efficient on CPU cycles? Who knows, there are too many layers of abstraction nowadays to have a definitive answer without benchmarking for a specific use case.

    I code a lot of Motorola 68000 assembly. On m68k, shifting right by 1 and performing a logical AND both take 8 CPU cycles. But the right-shift is 2 bytes smaller, because it doesn't need an extra 16 bits for the operand. That makes a difference on Amiga, because (other than size) the DMA might be shared with other chips, so you're saving yourself a memory read that could stall the CPU while it's waiting its turn. Therefore, at least on m68k, shifting right is the fastest way to test if a value is even.

    • userbinator3 days ago
      That instruction only encodes to 2 bytes, so size-wise it's the most efficient.

      In isolation it's the smallest, but it's no longer the smallest if you consider that the value, which in this example is the loop counter, needs to be preserved, meaning you'll need at least 2 bytes for another mov to make a copy. With test, the value doesn't get modified.

      • dansalvato3 days ago
        That is true, I deliberately set up an isolated scenario to do these fun theoretical tests. It actually took some effort to stop the compiler from being too smart, because it would want to transform the result into a return value, or even into a pointer offset, to avoid branching.
    • amiga3863 days ago
      > On m68k, shifting right by 1 and performing a logical AND both take 8 CPU cycles. But the right-shift is 2 bytes smaller

      There's also BTST #0,xx but it wastefully needs an extra 16 bits say which bit to test (even though the bit can only be from 0-31)

      > That makes a difference on Amiga, because (other than size) the DMA might be shared with other chips, so you're saving yourself a memory read that could stall the CPU while it's waiting its turn.

      That's a load-bearing "could". If the 68000 has to read/write chip RAM, it gets the even cycles while the custom chips get odd cycles, so it doesn't even notice (unless you're doing something that steals even cycles from the CPU, e.g. the blitter is active and you set BLTPRI, or you have 5+ bitplanes in lowres or 3+ bitplanes in highres)

      • dansalvato3 days ago
        > There's also BTST #0,xx but it wastefully needs an extra 16 bits say which bit to test (even though the bit can only be from 0-31)

        That reminds me, it's theoretically fastest to do `and d1,d0` e.g. in a loop if d1 is pre-loaded with the value (4 cycles and 1 read). `btst d1,d0` is 6 cycles and 1 read.

        > the blitter is active and you set BLTPRI

        I thought BLTPRI enabled meant the blitter takes every even DMA cycle it needs, and when disabled it gives the CPU 1 in every 4 even DMA cycles. But yes, I'm splitting hairs a bit when it comes to DMA performance because I code game/demo stuff targeting stock A500, meaning one of those cases (blitter running or 5+ bitplanes enabled) is very likely to be true.

        • amiga3863 days ago
          > it's theoretically fastest to do `and d1,d0` e.g. in a loop

          That's true, although I'd add that ASR/AND are destructive while BTST would be nondestructive, but we're pretty far down a chain of hypotheticals at this point (why would someone even need to test evenness in a loop, when they could unroll the loop to doing 2/4/6/8 items at a time with even/odd behaviour baked in)

          > I thought BLTPRI enabled meant the blitter takes every even DMA cycle it needs, and when disabled it gives the CPU 1 in every 4 even DMA cycles

          Yes, that is true: https://amigadev.elowar.com/read/ADCD_2.1/Hardware_Manual_gu... "If given the chance, the blitter would steal every available Chip memory cycle [...] If DMAF_BLITHOG is a 1, the blitter will keep the bus for every available Chip memory cycle [...] If DMAF_BLITHOG is a 0, the DMA manager will monitor the 68000 cycle requests. If the 68000 is unsatisfied for three consecutive memory cycles, the blitter will release the bus for one cycle."

          > one of those cases is very likely to be true

          It blew my mind when I realised this is probably why Workbench is 4 colours by default. If it were 8, an unexpanded Amiga would seem a lot slower to application/productivity users.

  • Arcuru6 days ago
    > Much better :) But what about C? Let’s try it:

    > I tried both versions (modulo 2 and bitwise AND) and got the same result. I think the optimizer recognizes modulo 2 and converts it to bitwise AND.

    Yes, even without specifying optimizations - https://godbolt.org/z/9se9c6qKT

    You can see that the output of the compiler is identical whether you use `i%2 == 0` or `(i&1) == 0`. The bitwise AND is instruction 12 in the output.

    Using -O3 like in the post actually compiles to SIMD instructions on x86-64 - https://godbolt.org/z/dWbcK947G

    • ryan-c3 days ago
      With i < 71, the compiler will just turn it into a constant value of 36. Switches to SIMD at 72, idk why.
    • thaumasiotes3 days ago
      That's how you check modulus for powers of 2. 2 is a power of 2. This barely even qualifies as an "optimization".
  • WesolyKubeczek6 days ago
    > I think the optimizer recognizes modulo 2 and converts it to bitwise AND.

    A quick check in the compiler explorer (godbolt.org) confirms that this is indeed true for GCC on x86_64 and aarch64, but not for clang on the same (clang does optimize it with -O3).

  • nikolay3 days ago
    "Premature optimization is the root of all evil." -- Donald Knuth [0]

    [0]: https://www.youtube.com/watch?v=74RdET79q40

    • osigurdson3 days ago
      The real ROAE is reasoning by unexamined phrase.
  • Rendello3 days ago
    Why write redundant code? I just depend on an external is-even library, which depends on an is-odd library ;)

    (https://news.ycombinator.com/item?id=38791094)

  • the_real_cher6 days ago
    I like the algorithms that are the opposite of this where they try to find the slowest possible way to determine if a number is even.
    • mtmail6 days ago
      https://github.com/blackburn32/serverlessIsEven "A serverless implementation of isEven. Now you can know if your numbers are even, even at mass scale."
      • TZubiri3 days ago
        How about sending a packet back and forth to a server in another continent n times, and if it stops coming back, it was odd.
        • MathMonkeyMan3 days ago
          Better to use TCP, but I like your approach.
          • belter3 days ago
            - Attempt to factor your integer n into primes...

            - Once you have the complete prime factorization, check whether 2 is among its prime factors...

            - If 2 is a factor, it’s even; if not, odd.

            • TZubiri2 days ago
              It's just 2 lines of code, therefore it's fast.

              Also I only use the step over command in the debugger

    • crazydoggers3 days ago
      There’s no upper limit, so there’d have to be some set of rules like no sleep(n).

      Also given the halting problem, you could write an algorithm that would be impossible to determine if it loops forever.

      • plagiarist3 days ago
        If the results of TREE(n) had specific properties for even n, you could easily check by first calculating TREE(n) and then looking for those properties in the results. Might need a bignum library.
    • 6 days ago
      undefined
    • 6 days ago
      undefined
  • BeefWellington3 days ago
    I did some testing and even in python both approaches are nearly identical in speed:

        def isEven_modulus(num):
            return num % 2 == 0
    
        def isEven_bit(num):
            return (num & 1) == 0
    
        import random, time
    
        testSet = random.choices(range(0, 100), k=10)
        iterations = range(1000)
    
        print("These are our test numbers: ", testSet)
    
        mod_start = time.perf_counter_ns()
        for z in iterations:
            for n in testSet:
                isEven_modulus(n)
                
        mod_end = time.perf_counter_ns()
        for z in iterations:
            for n in testSet:
                isEven_bit(n)
    
        bit_end = time.perf_counter_ns()
    
        print("Modulus method: ", mod_end - mod_start, "ns")
        print("Bitwise method: ", bit_end - mod_end, "ns")
    
    
    There's some variance run to run but for the most part they're close enough to not matter. I do see a very small difference generally in favour of bitwise, but we're talking about a 60000ns (0.06ms) difference occasionally on 1000 runs (or about 60ns per run). Unlikely that this will be a significant bottleneck for anyone.

    An example:

        These are our test numbers:  [45, 88, 55, 52, 40, 70, 62, 47, 78, 30]
        Modulus method:  757341 ns
        Bitwise method:  698872 ns
    
    Possibly just a well understood and well-optimized problem.
  • ChrisMarshallNY3 days ago
    When I wrote in ASM, I always used to look at the LSB (Least Significant Bit). If it was zero, then the number was even.

    Things are probably different, these days, so maybe that isn’t effective.

    • ChrisMarshallNY3 days ago
      Just to add some closure. This is how I'd do it in Swift:

          extension FixedWidthInteger { var isEven: Bool { 0 == 1 & self } }
  • maplet6 days ago
    You can actually generalize i % 2 == i & 1 to larger powers of 2.

    Using C Syntax,

    i % (1 << n) == i & ((1 << n) - 1)

    holds for unsigned integer n and (1 << n) is 2 to the power of n.

    • tptacek3 days ago
      ... or just let the compiler do that for you.
  • furyofantares6 days ago
    > I think the optimizer recognizes modulo 2 and converts it to bitwise AND.

    You don't have to guess, you could turn or O3 or look at the disassembly.

    • lcampbell6 days ago
      Interestingly, not only do both versions emit the same assembly, but clang both autovectorizes and unrolls the loop:

      https://godbolt.org/z/qxsKWfz9s

    • 6 days ago
      undefined
    • caspper696 days ago
      gcc does it even when it's not optimizing
  • lifthrasiir3 days ago
    I'm quite surprised that it wasn't recognized by scalar evolution, a common optimization pass to detect induction variables and their relations to other variables. Of course that requires the compiler to reason about `i % 2 == 0` or `(i & 1) == 0` first, but modern compilers do have tons of patterns recognized by that pass...
  • taneq3 days ago
    Does the C compiler optimise out the branch from the if() statement?

    I'd write it more like this:

        int main(int argc, char **argv) {
          int total = 0;
          for (int i=2147483647; i; --i) {
            total += i & 1;
          }
          printf("%d\n", total);
          return 0;
        }
    • AdieuToLogic3 days ago
      > Does the C compiler optimise out the branch from the if() statement?

      In a function as simple as this, the existence of a branch may be as fast or faster than a version without as the CPU has the opportunity to eliminate register/memory modification via branch prediction.

      So even if a compiler does not optimize out this particular if construct, there is a good chance the CPU will.

    • userbinator3 days ago
      You inverted the condition and the loop doesn't go to 0, so that's not functionslly the same code.
  • 6 days ago
    undefined
  • osigurdson3 days ago
    Just test if last bit == 0
  • JJOmeo3 days ago
    Wait until you hear what C compilers try to do if you divide by a constant. Hint: they don't need to use a division instruction in most cases.
  • userbinator3 days ago
    I'm nowhere near an APL-level programmer, but that C example already looks ridiculously verbose to me; the body of the loop could be simply written branchlessly as:

        total += !(i&1);
    
    ...and since there's another comment here about Asm, I'd compile the above as (assume edx is i and total in eax, high 24 bits of ebx precleared):

        test dl, 1
        setnz bl
        add eax, ebx
  • satisfice6 days ago
    That’s not “even” wrong!
  • b5na day ago
    Now try `!(n & 1)`.
  • deathanatos3 days ago
    In Javascript, NaN (not a number) is a number:

      >> typeof NaN
      <- "number"
    
    Let's see then:

      >> (NaN % 2) == 0
      <- false
    
    So clearly NaN is odd. /s

    (And if you're thinking "you gotta equals harder":

      >> (NaN % 2) === 0
      <- false
    
    Nope, still odd.

    Both of the infinities are also odd by the same logic, too, if you were curious.

    null and false are even. true is odd. [] is even, [0] is even, [1] is odd.)