3 points by RetroTechie 5 hours ago | 2 comments
  • gblargg 5 hours ago
    I always liked rlwimi on PowerPC. It rotates the source n bits, then writes any contiguous section of bits over the corresponding bits in the destination register. This allows copying any bitfield from any position in one register into another. Basically either of these:

      out = (out & ~mask) | (in << shift & mask)
      out = (out & ~mask) | (in >> shift & mask)
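
A rough C model of the rotate-then-insert-under-mask behaviour described above, for anyone who wants to play with it. The helper names (`rotl32`, `ppc_mask`) are made up for this sketch, and it assumes IBM bit numbering for MB/ME, where bit 0 is the most significant bit:

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n is taken modulo 32). */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Mask with ones from bit mb through bit me, IBM numbering (bit 0 = MSB).
   If mb > me the run of ones wraps around the ends of the word. */
uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;         /* ones in bits mb..31 */
    uint32_t to_me   = 0xFFFFFFFFu << (31 - me);  /* ones in bits 0..me  */
    return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* Model of "rlwimi rA, rS, SH, MB, ME": returns the new value of rA. */
uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}
```

For instance, `rlwimi(out, in, 8, 16, 23)` copies the low byte of `in` over bits 15:8 of `out` and leaves the rest of `out` alone.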
    
    Z80's EXX to swap with the shadow registers was interesting (meant for fast interrupt response so you didn't have to save registers to memory).
    • pulvinar 3 hours ago
      rlwimi was a nice one, especially for emulators.

      And it also had eieio, Enforce In-Order Execution of I/O.

    • brucehoult 3 hours ago
      > rlwimi / rlwinm

      Definitely a nice and pretty much pioneering feature on PowerPC in 1994 (and I guess RS/6000 before that, but I never used one).

      Today's Arm64 BFM does both those jobs in one, minus the ability to create a split mask via rotating, but plus adding a choice of sign or zero extension to extracted fields (including extracted to the same place they already were, for pure sign/zero extension). As a result it's got about 100 aliases.
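
A hedged C sketch of what the three most common aliases compute, written as plain functions rather than anything resembling the real encoding. The function names simply reuse the alias mnemonics, `low_mask` is a helper invented here, and it assumes 1 <= width and lsb + width <= 64:

```c
#include <stdint.h>

/* Ones in the low `width` bits (1 <= width <= 64). */
uint64_t low_mask(unsigned width)
{
    return width >= 64 ? ~(uint64_t)0 : (((uint64_t)1 << width) - 1);
}

/* ubfx: extract a field and zero-extend it. */
uint64_t ubfx(uint64_t xn, unsigned lsb, unsigned width)
{
    return (xn >> lsb) & low_mask(width);
}

/* sbfx: extract a field and sign-extend it from its top bit. */
int64_t sbfx(uint64_t xn, unsigned lsb, unsigned width)
{
    uint64_t field = (xn >> lsb) & low_mask(width);
    uint64_t sign  = (uint64_t)1 << (width - 1);
    return (int64_t)((field ^ sign) - sign);  /* classic xor/subtract sign extension */
}

/* bfi: insert the low `width` bits of xn into xd at bit `lsb`. */
uint64_t bfi(uint64_t xd, uint64_t xn, unsigned lsb, unsigned width)
{
    uint64_t m = low_mask(width) << lsb;
    return (xd & ~m) | ((xn << lsb) & m);
}
```

With `lsb` set to 0, `sbfx(x, 0, 8)` is just a sign extension of the low byte, which is the "same place they already were" case.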

      It would be nice to have these in RISC-V but they seriously violate the quite strict "Stanford Standard RISC" 2R1W principle that keeps the RISC-V integer pipeline simple (smaller, faster, cheaper).

      When working in the "B" extension working group I suggested adopting the M88000 bitfield instructions which follow the 2R1W principle. Someone had an objection to encoding both field width and offset into a single constant (or `Rs2`), though I think it's well worth it. M88k as a 32 bit ISA used 5 bits for each, but 6 bits for each for RV64 fits RISC-V's 12 bit immediates perfectly.

      - ext / extu: Extract signed or unsigned bit field from a register. You specify offset (starting bit position) and width. The extracted field is right-justified (shifted to the low bits) in the destination, with sign-extension or zero-extension.

      - mak: Make (insert) a bit field. Takes a value, shifts it left by the offset, and inserts it into the destination while clearing the target field first (or combining in specific ways).

      - set: Set (force to 1) a contiguous bit field in a register.

      - clr: Clear (force to 0) a contiguous bit field in a register.

      All take `Rd`, `Rs1` and a field size:offset as either a literal or as `Rs2`.
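
To make the list above concrete, here is a minimal C model of `ext`, `extu`, `set` and `clr` on 32-bit values. The `bf_` names and `field_mask` helper are made up for the sketch, width and offset are passed as separate arguments rather than packed into one constant, and it assumes 1 <= width and offset + width <= 32:

```c
#include <stdint.h>

/* Ones in bits [offset + width - 1 : offset]. */
uint32_t field_mask(unsigned offset, unsigned width)
{
    uint32_t low = width >= 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return low << offset;
}

/* extu: extract the field, right-justified, zero-extended. */
uint32_t bf_extu(uint32_t rs1, unsigned offset, unsigned width)
{
    return (rs1 & field_mask(offset, width)) >> offset;
}

/* ext: extract the field, right-justified, sign-extended. */
int32_t bf_ext(uint32_t rs1, unsigned offset, unsigned width)
{
    uint32_t field = bf_extu(rs1, offset, width);
    uint32_t sign  = 1u << (width - 1);
    return (int32_t)((field ^ sign) - sign);
}

/* set: force every bit of the field to 1. */
uint32_t bf_set(uint32_t rs1, unsigned offset, unsigned width)
{
    return rs1 | field_mask(offset, width);
}

/* clr: force every bit of the field to 0. */
uint32_t bf_clr(uint32_t rs1, unsigned offset, unsigned width)
{
    return rs1 & ~field_mask(offset, width);
}
```

Each function reads one register value plus a field specification and produces one result, which is what lets them fit the 2R1W shape.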

      Unfortunately, the R-type `mak` violates 2R1W because the `Rd` is also a source, which complicates OoO implementations, making them 3R1W. RISC-V could use an alternative formulation in which `mak` (or some other name) masks off the source field and shifts it into place, and then the insert is completed using `clr` and `or`.
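
The alternative formulation can be sketched in C as three steps, none of which needs the destination as a third input. All names here are hypothetical, and it assumes 32-bit values with 1 <= width and offset + width <= 32:

```c
#include <stdint.h>

/* Modified mak: keep the low `width` bits of the source and shift them to
   `offset`; every other result bit is 0. The destination is not read. */
uint32_t mak_field(uint32_t rs1, unsigned offset, unsigned width)
{
    uint32_t low = width >= 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (rs1 & low) << offset;
}

/* clr: zero the target field, leaving all other bits untouched. */
uint32_t clr_field(uint32_t rs1, unsigned offset, unsigned width)
{
    uint32_t low = width >= 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return rs1 & ~(low << offset);
}

/* Full insert = mak + clr + or. */
uint32_t insert_field(uint32_t dst, uint32_t src, unsigned offset, unsigned width)
{
    uint32_t prepared = mak_field(src, offset, width);
    uint32_t cleared  = clr_field(dst, offset, width);
    return cleared | prepared;
}
```

The cost is two extra instructions per insert, but `mak_field` and `clr_field` are independent of each other, so a wide core could issue them in the same cycle.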

      On the other hand the forms with 12 bit literals are expensive in encoding space, but even including just the `Rs2` versions would be great, especially as often several instructions in a row can use the same field specification, which fits `addi Rd,zero,imm12` (aka `li`) perfectly.

      On the gripping hand, while the immediate version of `mak` violates RISC-V convention by making the `Rd` also a source, any real pipeline is going to have fields for all of `Rd`, `Rs1`, `Rs2`, and `imm32` so only the decoder is affected.

      Also, `ext` / `extu` are not needed as a pair of C-extension shifts do the same job with the same code size, and can be decoded into a single µop on a higher end CPU if desired.
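
The shift-pair trick, sketched in C for 64-bit registers: shift left so the field's top bit lands in bit 63, then shift right by 64 minus the width, logically for `extu` or arithmetically for `ext`. The function names are invented here, it assumes 1 <= width and offset + width <= 64, and the signed version relies on `>>` of a negative value being an arithmetic shift, which C leaves implementation-defined (it holds on mainstream compilers):

```c
#include <stdint.h>

/* Zero-extending extract: slli then srli. */
uint64_t extu_by_shifts(uint64_t x, unsigned offset, unsigned width)
{
    return (x << (64 - offset - width)) >> (64 - width);
}

/* Sign-extending extract: slli then srai (assumes arithmetic >> on int64_t). */
int64_t ext_by_shifts(uint64_t x, unsigned offset, unsigned width)
{
    return (int64_t)(x << (64 - offset - width)) >> (64 - width);
}
```

No mask constant is needed in either case, which is why the pair stays small.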

      As an example: take a 10 bit field at offset 21 and insert into a destination at offset 1 (this is part of decoding RISC-V J/JAL instructions).

      PowerPC:

          rlwimi  r4, r3, 12, 21, 30    # rotate left 12 (= right 20), insert under mask MB=21..ME=30 (IBM numbering), i.e. bits [10:1]
      
      Arm64:

          ubfx   x2, x0, #21, #10      # extract bits[30:21] → low 10 bits of x2 (unsigned)
          bfi    x1, x2, #1, #10       # insert those 10 bits into x1 starting at bit 1
      
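      Alternatively, using the underlying `ubfm` / `bfm` instructions directly without aliases (exactly the same instructions, just trickier to get right):

          ubfm   x2, x0, #21, #30      # immr = lsb, imms = lsb + width - 1
          bfm    x1, x2, #64-1, #9     # immr = (64 - lsb) mod 64, imms = width - 1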
      
      
      M88k:

          extu   r3, r1, 21, 10        # extract 10-bit field starting at bit 21 → low bits of r3
          mak    r2, r3, 1, 10         # make/insert the field at bit 1 in destination
      
      RISC-V:

          srli   x12, x10, 21          # shift field down to low bits
          andi   x12, x12, 0x3FF       # mask to 10 bits
          slli   x12, x12, 1           # position at bit 1 (for imm[10:1])
          li     x13, ~0x7FE           # mask to clear bits [10:1] only
          and    x11, x11, x13
          or     x11, x11, x12         # insert the field
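
For reference, the same operation written out in C, following the base RISC-V sequence step by step. This is only a sketch; the function name `insert_imm_10_1` is made up, and it handles just this one field, not the whole J-type immediate:

```c
#include <stdint.h>

/* Copy bits [30:21] of `insn` into bits [10:1] of `dst`, leaving the other
   bits of `dst` unchanged. */
uint32_t insert_imm_10_1(uint32_t dst, uint32_t insn)
{
    uint32_t field = (insn >> 21) & 0x3FFu;  /* srli + andi: field in low 10 bits */
    field <<= 1;                             /* slli: move it to bits [10:1]      */
    uint32_t keep = ~0x7FEu;                 /* li: everything except bits [10:1] */
    return (dst & keep) | field;             /* and + or: clear, then insert      */
}
```

Every line maps to one or two of the six instructions, which is the overhead the bitfield proposals are trying to remove.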
      
      RISC-V with some M88k inspiration:

          extui  r3, r1, 21, 10        # extract 10-bit field starting at bit 21 → low bits of r3
          maki   r4, r3, 1, 10         # modified mak: masks + shifts field to bits [10:1] (others 0)
          clri   r2, 1, 10             # clear the target field in destination
          or     r2, r2, r4            # insert the prepared field
      
      Alternatively:

          li     t0, (1<<6) | 10       # specification for insertion bit field
          srli   a3, a1, 21            # shift 10-bit field starting at bit 21 → low bits of a3
          mak    a4, a3, t0            # modified mak: masks + shifts field to bits [10:1] (others 0)
          clr    a2, t0                # clear the target field in destination
          or     a2, a2, a4            # insert the prepared field
      
      Alternatively:

          srli   a3, a1, 21
          maki   a2, a3, (1<<6) | 10   # decoder expands to `maki a2, a2, a3, (1<<6) | 10`
      
      Again, this last formulation of `maki` violates RISC-V instruction format convention in making `a2` both src and dst, BUT if the decoder handles that then the expanded form does NOT cause any issues with the pipeline implementation.
  • absynth 5 hours ago
    HCF - Halt and Catch Fire.