118 points by nand2mario 3 days ago | 6 comments
  • dsign 14 hours ago
    I've wondered for a long time if we would have been able to make do without protected mode (or hardware protection in general) if user code was verified/compiled at load time, e.g. the way the JVM or .NET do it... Could the shift in transistor budget have been used to offset any performance losses?
    • st_goliath 14 hours ago
      Microsoft Research had an experimental OS project at one point that did just that, with everything running in ring 0 in the same address space:

      https://en.wikipedia.org/wiki/Singularity_(operating_system)

      Managed code, the properties of their C#-derived programming language, static analysis, and verification were used rather than hardware exception handling.

      • avadodin 14 hours ago
        Fil-C vs CHERI vs SeL4 vs YOLO

        I think hardware protection is usually easier to sell but it isn't when it is slower or more expensive than the alternative.

        • Joker_vD 11 hours ago
          "Operating System Principles" (1973) by Per Brinch Hansen. A full microkernel OS (remake of RC-4000 from 1967) written in a concurrent dialect of Pascal, that also manages to make do without hardware protection support.
      • alnwlsn 11 hours ago
        I think TempleOS also worked like this, though it's certainly better known for its "other" features.

        edit: I missed it was linked on the above page

        • rep_lodsb 9 hours ago
          In TempleOS, everything runs in ring 0, but that's not the same as doing protection in software (which would require disallowing any native code not produced by some trusted translator). It simply means there's no protection at all.
          • ASalazarMX 5 hours ago
            Very fitting if that was intended to be protection by faith.
    • rwmj 13 hours ago
      I think the interesting thing about having protection in software is you can do things differently, and possibly better. Computers of yesteryear had protection at the individual object level (eg https://en.wikipedia.org/wiki/Burroughs_Large_Systems). This was too expensive to do in 1970s hardware and so performance sucked. Maybe it could be done in software better with more modern optimizing compilers and perhaps a few bits of hardware acceleration here and there? There's definitely an interesting research project to be done.
      • mananaysiempre 10 hours ago
        Sadly, even software-filled TLBs look to be a thing of the past. Apparently a hardware page-table walker is just that much faster? I’m not sure.
        • rep_lodsb 9 hours ago
          Why is that surprising? The trap into kernel mode alone would already take more cycles than dedicated hardware needs for the full page table walk.
          • rwmj 7 hours ago
            Since we're talking about defining our own processor, that means we need to define one with cheaper traps.

            Expanding on what I wrote above about "bits of hardware acceleration", maybe adding a few primitives to the instruction set that make page table walking easier would help.

            And with a trusted compiler architecture you don't need to keep the ISA stable between iterations, since it's assumed that all code gets compiled at the last minute for the current ISA.

            Lots of fun things to experiment with.

            • saltcured 3 hours ago
              Taking this to an extreme, the whole idea of a TLB sounds like hardware protection too?

              As a thought experiment, imagine an extremely simple ISA and memory interface where you would do address translation or even cache management in software if you needed it... the different cache tiers could just be different NUMA zones that you manage yourself.

              You might end up with something that looks more like a GPU or super-ultra-hyper-threading to get throughput masking the latency of software-defined memory addressing and caching?

    • rwallace 9 hours ago
      I looked into that and concluded the spoiler is Spectre.

      Basically, you have to have out of order/speculative execution if you ultimately want the best performance on general/integer workloads. And once you have that, timing information is going to leak from one process into another, and that timing information can be used to infer the contents of memory. As far as I can see, there is no way to block this in software. No substitute for the CPU knowing 'that page should not be accessible to this process, activate timing leak mitigation'.

      • zozbot234 9 hours ago
        OTOH, out of order/speculative execution only amounts to information disclosure. And general purpose OSes (without mandatory access control or multilevel security, which are of mere academic interest) were never designed to protect against that.

        A far greater problem is that until very recently, practical memory safety required the use of inefficient GC. Even a largely memory-safe language like Rust actually requires runtime memory protection unless stack depth requirements can be fully determined at compile time (which they generally can't, especially if separately-provided program modules are involved).

  • 4j452j45nj 15 hours ago
    ah, PDE/PTE A/D writes... what a source of variety over the decades!

    some chips set them step by step, as shown in the article

    others only set them at the very end, together

    and then there are chips which follow the read-modify-write op with another read, to check if the RMW succeeded... which promptly causes them to hang hard when the page tables live in read-only memory i.e. ROM... fun fun fun!

    as for segmentation fun... think about CS always being writeable in real mode... even though the access rights only have an R but no W bit for it...

    • rep_lodsb 9 hours ago
      That's because CS in real/V86 mode is actually a writable data segment. Most protection checks work exactly the same in any mode, but the "is this a code segment?" check is only done when CS is loaded in protected mode, and not on any subsequent code fetch.

      Using a non-standard mechanism of loading CS (LOADALL or RSM), it's possible to have a writable CS in protected mode too, at least on these older processors.

      There's actually a slight difference in the access rights byte that gets loaded into the hidden part of a segment register (aka "descriptor cache") between real and protected mode. I first noticed this on the 80286, and it looks to be the same on the 386:

      - In protected mode, the byte always matches that from the GDT/LDT entry: bit 4 (code/data segment vs. system) must be set, the segment load instruction won't allow otherwise, bit 0 (accessed) is set automatically (and written back to memory).

      - In real and V86 mode, both of these bits are clear. So in V86 mode the value is 0xE2 instead of the "correct" 0xF3 for a ring 3 data segment, and similarly in real mode it's 0x82 (ring 0).

      The hardware seems to simply ignore these bits, but they still exist in the register, unlike other "useless" bits. For example, LDT only has bit 7 (present), and GDT/IDT/TSS have no access rights byte at all - they're always assumed to be present, and the access rights byte reads as 0xFF. At least on the 286 that was the case, I've read that on the Pentium you can even mark GDT as not-present, and then get a triple fault on any access to it.

      Keeping these bits, and having them different between modes might have been an intentional choice, making it possible to determine (by ICE monitor software) in what mode a segment got loaded. Maybe even the two other possible combinations (where bit4 != bit0) have some use to mark a "special" segment type that is never set by hardware?

  • inigyou 13 hours ago
    Interesting to see how hardware designers of yesteryear did things, and why CPUs are so complicated and have so many bugs.
  • jejgkgkldl 16 hours ago
    The article states that Win 3.0 used 32-bit flat addressing mode, but when Win 95 launched, MS said Win 3.0 didn't (in 386 mode).
    • shakna 16 hours ago
      Pretty sure Enhanced Mode, which only came later in Windows 3.11 for Workgroups, is the one that supported the flat addressing mode.
      • joakleaf 14 hours ago
        Enhanced mode was already in 3.0 (and I think allowed for flat addressing)

        However, Win32s was introduced in 3.11, which was a subset of the Windows 32-bit API from NT.

        3.11 also introduced 32-bit disk access and 32-bit drivers.

        Microsoft did 32-bit in steps -- it was confusing already back then.

        • lizknope 9 hours ago
          I remember I started my internship in June 1995. We were doing stuff with this brand new thing called the World Wide Web.

          They gave us a Win 3.1 computer and Spyglass Mosaic, which required the Win32s subsystem.

          http://www.win3x.org/win3board/viewtopic.php?t=4971&view=min

          The full time guys all had a Sun on their desk next to their PC. We also had to run an IBM 3270 terminal emulator and X server to connect to the Suns. It was all so unstable. I remember a bunch of "Win32s error" popups.

          The other intern and I found a room full of decommissioned 486 machines, installed Linux and didn't tell anyone for a month. Everything worked great and then we started an assembly line of installing Linux on those old machines for all the older coworkers to take home.

        • dspillett 12 hours ago
          > 3.11 also introduced 32-bit disk access and 32-bit drivers.

          IIRC a lot of it wasn't turned on by default due to hardware/driver compatibility concerns, and there were articles all over the place about how to turn it on for extra performance. Essentially they used optimising tech-heads the world over as a giant beta-test group for parts of Win95's IO subsystem.

      • vasvir 15 hours ago
        yep that's my recollection too
    • this-is-why 16 hours ago
      It used segmented 32-bit mode. Flat mode doesn't support virtual addressing, which was accomplished with the descriptor tables (and the ES register), if I recall correctly. lol, it's been 33 years since I wrote Windows drivers. Had to use MASM to compile the 16-bit segments to thunk to the kernel.
  • icanhasjonas 17 hours ago
    Made me think of the old DESQview
  • fortran77 13 hours ago
    > These features made possible Windows 3.0, OS/2, and early Linux.

    And also--before Linux--SCO Xenix and then SCO Unix. It was finally possible to run a real Unix on a desktop or home PC. A real game changer. I paid big $$$ (for me at the time) to get SCO Xenix for my 386 so I could have my own Unix system.

    • whobre 10 hours ago
      Xenix 2.1 could run on the IBM PC XT with an Intel 8088 in late 1983, IIRC, and even before that on the Altos 586, which had an MMU as an external chip.
      • spijdar 7 hours ago
        For that matter, the "second" version of UNIX ran on a PDP-11/20 with no memory protection or MMU, and there were a few versions after intended to run on similar hardware (LSX, MINI-UNIX).

        The PDP-11's MMU option was closer to the 8088's segmentation model I think, but I've never coded either, so dunno really. It does seem like it was possible to port "PDP-11 UNIX" to a lot more platforms than would get "VMUNIX".

    • classichasclass 5 hours ago
      Don't forget Venix. It was the first true Unix that could run on a stock IBM PC, and beat Xenix on that platform by months.