A clickable visual guide to the Rust type system (rustcurious.com)

259 points by ashvardanian 5 days ago, 9 comments

craftkiller5 days ago
This is such a small thing, but I love the inclusion of the value ranges for the integers! I can never remember which side can go one deeper ("is it [-128 to 127] or [-127 to 128]"). Bookmarking this for reference later!
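For anyone who would rather check the asymmetry than memorize it, here is a minimal Rust sketch. It relies only on the standard `MIN`/`MAX` constants and `checked_neg`; the helper name `i8_range` is made up for illustration.

```rust
/// Returns the inclusive value range of `i8` as (min, max).
fn i8_range() -> (i8, i8) {
    (i8::MIN, i8::MAX)
}

fn main() {
    let (min, max) = i8_range();
    println!("i8:  [{}, {}]", min, max);
    println!("i16: [{}, {}]", i16::MIN, i16::MAX);
    println!("u8:  [{}, {}]", u8::MIN, u8::MAX);

    // The negative side goes "one deeper": -128 has no positive counterpart,
    // so negating it is not representable and `checked_neg` returns `None`.
    assert_eq!(i8::MIN.checked_neg(), None);
    assert_eq!(i8::MAX.checked_neg(), Some(-127));
}
```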
- newpavlov4 days ago
  Tangential note: I sometimes wish that signed integers were symmetrical. i8 would represent the range of [-127 to 127] with 0x80 representing NaN. Any operation which cannot be computed (division by zero, overflows, operation with another NaN, etc.) would result in NaN. For further symmetry we could do the same for unsigned integers as well.
  Yes, it's possible to encode such types manually, but it will not be efficient since CPUs do not natively support such operations.
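  A rough sketch of what such a manual encoding could look like in Rust. The type `NanI8` and its methods are hypothetical, and the sketch assumes the NaN slot is the bit pattern of `i8::MIN` (0x80), the one value two's complement leaves without a positive counterpart. Every operation pays for extra branches, which is the inefficiency mentioned above.

  ```rust
  /// Hypothetical NaN-able 8-bit integer: valid values are -127..=127 and
  /// the bit pattern of `i8::MIN` (0x80) is reserved as NaN.
  /// Unlike floating-point NaN, this NaN compares equal to itself.
  #[derive(Clone, Copy, Debug, PartialEq, Eq)]
  struct NanI8(i8);

  impl NanI8 {
      const NAN: NanI8 = NanI8(i8::MIN);

      /// Wraps a raw value; `i8::MIN` is interpreted as NaN.
      fn new(value: i8) -> NanI8 {
          NanI8(value)
      }

      fn is_nan(self) -> bool {
          self.0 == i8::MIN
      }

      /// Maps a failed checked operation to NaN.
      fn lift(result: Option<i8>) -> NanI8 {
          result.map_or(NanI8::NAN, NanI8)
      }

      /// Addition: NaN inputs and overflow both produce NaN.
      /// A sum of exactly -128 also lands on the NaN pattern.
      fn add(self, rhs: NanI8) -> NanI8 {
          if self.is_nan() || rhs.is_nan() {
              return NanI8::NAN;
          }
          NanI8::lift(self.0.checked_add(rhs.0))
      }

      /// Division: NaN inputs and division by zero both produce NaN.
      fn div(self, rhs: NanI8) -> NanI8 {
          if self.is_nan() || rhs.is_nan() {
              return NanI8::NAN;
          }
          NanI8::lift(self.0.checked_div(rhs.0))
      }
  }

  fn main() {
      assert_eq!(NanI8::new(100).add(NanI8::new(27)), NanI8::new(127));
      assert!(NanI8::new(100).add(NanI8::new(28)).is_nan()); // overflow
      assert!(NanI8::new(1).div(NanI8::new(0)).is_nan()); // division by zero
      assert!(NanI8::NAN.add(NanI8::new(1)).is_nan()); // NaN propagates
  }
  ```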
  - lock14 days ago
    Wouldn't this make CPU flags useless? I think it would complicate branch instructions too, as most modern CPUs tend to use integer operations for branching.
    Also, this in-band signaling would probably invite something similar to the `null` mess in type systems. I can't wait to tell the CPU to JMP NaN.
    newpavlov4 days ago
    >Wouldn't this make CPU flags useless?
    They would, but I agree with RISC-V here, CPUs should not rely on them in the first place.
    I do not understand your argument about branches, how would it hinder the jump instructions?
    We still would need separate "wrapping" instructions (e.g. for implementing bigints and cryptographic algorithms), but they probably could be limited to unsigned operations only.
    >I can't wait to tell CPU to JMP NaN.
    How is it different from jumping to null? If you do such jump, it means you have a huge correctness problem with your code.
    lock14 days ago
    > I do not understand your argument about branches, how would it hinder the jump instructions?
    An extra set of logic for handling NaN cases? I don't think it's impossible, just less intuitive. A jump on an integer without NaN is always valid, while a jump on a NaN-able integer is sometimes invalid (ignoring whether the memory address can be accessed).
    newpavlov4 days ago
    For absolute jumps you don't need extra logic, since CPUs could declare the last page always unmapped, so such jumps would always result in a page fault (similarly to the null page on most systems).
    For relative non-immediate jumps the added logic is extremely simple (hardware exception on NaN) and should not (AFAIK) hinder performance of jumps in any way.
  - zokier4 days ago
    That sounds like a surprisingly reasonable idea for signeds. Less so for unsigneds though. Has there been any architecture doing anything like that?
    newpavlov4 days ago
    I cannot name an ISA with such instructions off the top of my head.
    As for unsigned integers, as I mentioned in the other comment, we probably need two separate instruction sets for "wrapping" and NaN-able operations on unsigned integers.
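    At the library level, Rust already exposes both flavors for unsigned integers, which may help picture the split. A small sketch follows, with `Option::None` standing in for the hypothetical hardware NaN; the function names `add_with_carry` and `add_or_nan` are invented for this example.

    ```rust
    /// "Wrapping" flavor, as a bigint limb addition would want it:
    /// the result wraps around and the carry is reported separately.
    fn add_with_carry(a: u8, b: u8) -> (u8, bool) {
        a.overflowing_add(b)
    }

    /// "NaN-able" flavor: overflow yields `None`, which plays the role of NaN.
    fn add_or_nan(a: u8, b: u8) -> Option<u8> {
        a.checked_add(b)
    }

    fn main() {
        assert_eq!(add_with_carry(200, 100), (44, true)); // 300 - 256 = 44
        assert_eq!(add_or_nan(200, 100), None);
        assert_eq!(add_or_nan(200, 55), Some(255));
    }
    ```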
- throwawaymaths5 days ago
  It's always negative. 0x8000... cannot have a two's complement, and the top bit is set.
  - delusional5 days ago
    I find that the easiest way to remember it is to remember that 0 is positive but has no negative counterpart.
    high_priest4 days ago
    The claim that 0 is positive is not true, but some day you are hopefully going to get it.
    The true answer is that negative numbers have the top bit set, which can't be used for positive numbers. Hence positives are one value short.
    delusional4 days ago
    You're literally saying the same thing as me.
    All negative numbers have the most significant bit set and 0 is the number with no bits set, ergo 0 must be positive since the most significant bit is not set.
    Now arithmetically, this is untrue. We'll usually treat 0 as neither positive nor negative (or in certain cases both negative and positive), but bitwise, in terms of the two's complement implementation, zero is positive. We know that since it exists in the unsigned version of the types as well.
    Hopefully you'll see that some day.
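    Both framings can be checked by walking all 256 bit patterns. A minimal sketch, where the helper `top_bit_set` is a name invented for this example:

    ```rust
    /// True when the most significant bit of the 8-bit pattern is set.
    fn top_bit_set(x: i8) -> bool {
        ((x as u8) & 0x80) != 0
    }

    fn main() {
        let mut top_set = 0;
        let mut top_clear = 0;
        for bits in 0..=u8::MAX {
            let x = bits as i8; // reinterpret the same bits as signed
            // The top bit and `x < 0` always agree.
            assert_eq!(top_bit_set(x), x < 0);
            if top_bit_set(x) {
                top_set += 1;
            } else {
                top_clear += 1;
            }
        }
        // The patterns split evenly, but the top-bit-clear half contains zero,
        // which leaves 127 strictly positive values against 128 negative ones.
        assert_eq!((top_set, top_clear), (128, 128));
        assert!(!top_bit_set(0));
    }
    ```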
- jibal4 days ago
  I can't imagine suffering from that. Understanding twos complement representation is an essential programming skill. And a byte value of 128? What is that in hex?
  - dzaima4 days ago
    You could pretty easily have an integer representation using [-127; 128]; 128 being 0x80 of course (all other values being the same as in two's complement). Still would hold that -n == 1 + ~n, zero is all-zeroes, and the property that add/sub needn't care about signed vs unsigned. Only significant difference being that top bit doesn't determine negativeness, though of course it's still "x < 0" in code. (at the hardware level, sign extension & comparisons might also get very slightly more complicated, but that's far outside what typical programmers would need to know)
    For most practical purposes outside of low-level stuff all that really matters about two's complement is Don't Get Near 2^(width-1) Or Bad™ Things Happen. Including +128 would even have the benefit of 1<<7 staying positive.
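    For reference, this is how the identity and the shift behave with today's `i8`, which is ordinary two's complement with range [-128, 127], not the hypothetical [-127, 128] variant. The function name `neg_via_complement` is made up for this sketch.

    ```rust
    /// Negation spelled as "flip the bits, add one", with wraparound.
    fn neg_via_complement(n: i8) -> i8 {
        (!n).wrapping_add(1)
    }

    fn main() {
        // -n == 1 + ~n holds for every i8 under wrapping arithmetic.
        for n in i8::MIN..=i8::MAX {
            assert_eq!(neg_via_complement(n), n.wrapping_neg());
        }
        // -128 is its own wrapping negation, since +128 is not representable.
        assert_eq!(neg_via_complement(i8::MIN), i8::MIN);

        // Shifting 1 into the top bit gives a negative value in today's i8.
        let one: i8 = 1;
        assert_eq!(one << 7, -128);
    }
    ```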
    moefh4 days ago
    > Only difference being that you need to do a bit more work to determine negativeness (work which in hardware you'd already likely have the bulk of for determining is-zero).
    The work needed to calculate the overflow flag (done in every add/sub operation in most ISAs) is also way more complicated when the high bit does not represent sign.
    dzaima4 days ago
    Oh, true. Even further down low-level/frequently-unused details though; and RISC-V does without it (/ flags in general) roughly fine.
  - AnIrishDuck4 days ago
    > Understanding twos complement representation is an essential programming skill
    The field of programming has become so broad that I would argue the opposite. The vast majority of developers will never need to think about, let alone understand, two's complement as a numerical representation.
  - wubrr4 days ago
    > Understanding twos complement representation is an essential programming skill.
    It is completely irrelevant for the vast majority of programming.
  - oconnor6634 days ago
    What is your goal for this comment?
  - koakuma-chan4 days ago
    I have no idea what two's complement representation is
    koakuma-chan4 days ago
    It just means the most significant bit represents the sign?
    craftkiller4 days ago
    It's a little bit more complicated than that. If only the most significant bit represented the sign then you'd have both positive and negative zero (which is possible with floats), and you'd only be able to go from [-127 to 127]. Instead, it's some incantation where the MSB is the sign but then you flip all the bits and add 1. It is only relevant for signed integers, not unsigned integers.
    pests4 days ago
    Ben Eater has a really good YT video on this.
    lock14 days ago
    That's called "sign-magnitude": the most significant bit represents the sign. ("Ones' complement" is a related scheme that negates a number by flipping every bit.) Like the sibling post mentioned, it has the weird quirk of two representations for 0: (-0) and (+0).
    "Two's complement" instead turns the MSB's unsigned value into a negative instead of a positive. For example, in 4-bit two's complement, 1000 represents -8 (in unsigned 4-bit, this would be +8), 0100 represents 4, 0010 represents 2, 0001 represents 1. Some more numbers: 7 (0111), -7 (1001), 1 (0001), -1 (1111).
    Intuitively, the "sign-magnitude" MSB represents a multiplication by (-1), while the "two's complement" MSB adds (-N), with N = 2^(bit length - 1); for 4-bit two's complement that's (-2^3) or (-8). Both representations leave the non-MSB bits working exactly like an unsigned integer.
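    The two readings of the MSB can be written out as tiny decoders. This is only an illustrative sketch for 4-bit patterns stored in the low bits of a `u8`; both function names are invented, and the "multiply by -1" reading is the scheme more commonly called sign-magnitude.

    ```rust
    /// Two's complement reading: the MSB contributes -8, the remaining
    /// three bits contribute 4, 2 and 1 exactly as in an unsigned integer.
    fn decode_twos_complement_4bit(bits: u8) -> i8 {
        let low = (bits & 0b0111) as i8;
        let msb = ((bits >> 3) & 1) as i8;
        low - 8 * msb
    }

    /// "MSB multiplies by -1" reading (sign-magnitude): the low three bits
    /// are the magnitude and the MSB only flips the sign.
    fn decode_sign_magnitude_4bit(bits: u8) -> i8 {
        let low = (bits & 0b0111) as i8;
        if (bits & 0b1000) != 0 { -low } else { low }
    }

    fn main() {
        assert_eq!(decode_twos_complement_4bit(0b1000), -8);
        assert_eq!(decode_twos_complement_4bit(0b1001), -7);
        assert_eq!(decode_twos_complement_4bit(0b1111), -1);
        assert_eq!(decode_twos_complement_4bit(0b0111), 7);

        // 1000 is "negative zero" here, which an i8 can only show as 0.
        assert_eq!(decode_sign_magnitude_4bit(0b1000), 0);
        assert_eq!(decode_sign_magnitude_4bit(0b1111), -7);
    }
    ```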
    harpiaharpyja4 days ago
    The other replies do a good job of explaining what 2s complement is.
    I find the best way to understand why 2s complement is so desirable is to write down the entire number line for e.g. 4-bit integers.
    Using a plain sign bit (sign-magnitude), the negative numbers are backwards. 2s complement fixes this, so that arithmetic works and you can do addition and subtraction without any extra steps.
    (Remember that negative numbers are less than positive numbers, so the correct way to count them is:
    -8 -7 -6 -5 -4 -3 -2 -1 0 +1 +2 +3 +4 +5 +6 +7
    Where -1 is the largest possible negative number)
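    The "no extra steps" part can be checked exhaustively for 8-bit values: adding the raw bit patterns as unsigned numbers gives the same bits as signed addition. A small sketch, with the function name `add_via_unsigned` invented for illustration:

    ```rust
    /// Adds two i8 values using only unsigned 8-bit addition on their bits.
    fn add_via_unsigned(a: i8, b: i8) -> i8 {
        (a as u8).wrapping_add(b as u8) as i8
    }

    fn main() {
        // The unsigned adder agrees with signed wrapping addition everywhere.
        for a in i8::MIN..=i8::MAX {
            for b in i8::MIN..=i8::MAX {
                assert_eq!(add_via_unsigned(a, b), a.wrapping_add(b));
            }
        }
        assert_eq!(add_via_unsigned(-3, 5), 2);
    }
    ```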
  - craftkiller4 days ago
    Eh, how often are you going down to the bit representation of signed integers? Naturally I learned two's complement ages ago, but all of my bitwise manipulation seems to be on unsigned integers (and frankly I've only used bitwise operations at work once for implementing bloom filters. Normally I only get to do lower level stuff like that in side-projects). So internalizing two's complement has never seemed relevant.
    > And a byte value of 128? What is that in hex?
    0x80
    jibal4 days ago
    > 0x80
    Which of course has the sign bit set.
    The comments here are educational ... I hadn't realized that the field of programming had become this degraded.
    jibal3 days ago
    > Such needless condescension, jibal.
    THAT comment is condescending--talk about ideas, not people. I condescended to no one ... my issue is the state of computer science education.
    > On the contrary, there is no sign bit. You asked for 128
    I didn't ask for anything. The subject here was the value range of the i8 type.
    craftkiller2 days ago
    > I didn't ask for anything.
    To quote you: "What is that in hex?"
    craftkiller3 days ago
    On the contrary, there is no sign bit. You asked for 128 which is either:
    1. Unrepresentable in S8
    2. Representable in U8 as shown in my above comment.
    3. Representable in S16 as 0x0080 but then the sign bit is not set.
    My comment accurately represents the hex encoding for #2.
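    The three cases map onto Rust's `i8`, `u8` and `i16` roughly as follows. This is a sketch; `fits_in_i8` and `hex_u8` are hypothetical helpers written for this example.

    ```rust
    /// True when `v` lies inside the i8 range [-128, 127].
    fn fits_in_i8(v: i16) -> bool {
        (i16::from(i8::MIN)..=i16::from(i8::MAX)).contains(&v)
    }

    /// Formats a byte as two hex digits with a `0x` prefix.
    fn hex_u8(v: u8) -> String {
        format!("{:#04x}", v)
    }

    fn main() {
        // 1. 128 is not representable in a signed 8-bit integer.
        assert!(!fits_in_i8(128));
        // 2. It is representable in an unsigned 8-bit integer, as 0x80.
        assert_eq!(hex_u8(128), "0x80");
        // 3. In a signed 16-bit integer it is 0x0080, and the value is positive.
        assert_eq!(format!("{:#06x}", 128_i16), "0x0080");
        assert!(128_i16 > 0);
        // Reinterpreting the byte 0x80 as i8 gives -128, not 128.
        assert_eq!(0x80_u8 as i8, -128);
    }
    ```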
    superblas4 days ago
    Such needless condescension, jibal.
goku125 days ago
Adding another resource I use frequently: https://cheats.rs/
One part that I love especially about it is that it represents lifetimes [1] and memory layout [2] of data structures in graphical format. They're as invaluable as API references. I would love to see it included in other documentation as well.
[1] https://cheats.rs/#memory-lifetimes
[2] https://cheats.rs/#memory-layout
adastra224 days ago
Why is PhantomData in the unsafe support group?
- john-h-k4 days ago
  It obviously can be used for other things but it principally was designed for unsafe support (allowing dropck to understand unsafe types that own a value through a pointer). See https://doc.rust-lang.org/nomicon/phantom-data.html
  - saghm4 days ago
    Interesting, I've had to use it a number of times over the years despite never really doing much unsafe. At least to me, it seems pretty well-scoped as a workaround from the requirements that the compiler has around needing to use generic type parameters in type definitions, which certainly isn't something you need to be writing unsafe code to run into. I wouldn't be shocked if it used unsafe under the hood, but then again, so does Vec.
    afdbcreid4 days ago
    The original reason for designing it (instead of the previously inferred bivariance) was so that authors of unsafe code that really does not want bivariance, and would be unsound under it, would remember to consider variance.
    It doesn't use unsafe under the hood, rather it's compiler magic.
    john-h-k4 days ago
    > At least to me, it seems pretty well-scoped as a workaround from the requirements that the compiler has around needing to use generic type parameters in type definitions
    The reason those requirements exist is (primarily) to do with unsafe code. Specifically it’s about deciding the variance of the type (which doesn’t matter for a truly unused type parameter).
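    A sketch of the safe-code use mentioned above: a type parameter that only tags a value and is never stored. The types `Id`, `User` and `Order` are hypothetical. Without the `PhantomData<T>` field the compiler rejects the struct with an unused-type-parameter error (E0392), because it has nothing from which to infer the variance of `T`.

    ```rust
    use std::marker::PhantomData;

    /// Hypothetical typed identifier: `T` tags the ID but is never stored.
    struct Id<T> {
        raw: u32,
        // Zero-sized marker that tells the compiler how `T` is "used".
        _marker: PhantomData<T>,
    }

    impl<T> Id<T> {
        fn new(raw: u32) -> Self {
            Id { raw, _marker: PhantomData }
        }

        fn raw(&self) -> u32 {
            self.raw
        }
    }

    // Tag types; they are never constructed.
    struct User;
    struct Order;

    fn main() {
        let user_id: Id<User> = Id::new(7);
        let order_id: Id<Order> = Id::new(7);
        // Same raw value, but `Id<User>` and `Id<Order>` are distinct types,
        // so passing one where the other is expected is a compile error.
        assert_eq!(user_id.raw(), order_id.raw());
        // The marker is zero-sized, so the wrapper is as small as a bare u32.
        assert_eq!(std::mem::size_of::<Id<User>>(), std::mem::size_of::<u32>());
    }
    ```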
stmw4 days ago
It's very good, thanks for getting it some attention. Also - to show how much I agree - https://news.ycombinator.com/item?id=45140572
smj-edison5 days ago
I really like how it scrolls left-to-right on mobile, instead of collapsing down.
mattlutze4 days ago
I love a page that doesn't react to my browser width.
shmerl4 days ago
Really nice and concise presentation!
6r175 days ago
There aren't that many of them, actually! It almost feels like a periodic table of elements.