664 points by wicket 5 days ago | 25 comments
  • gwern5 days ago
    > The MPV patch of v0.1 is without a doubt build 36 (e16bab8). The "Cripy optimization" turns status bar percentage rendering into a noop if they have not changed. This prevents rendering to a scrap buffer and blitting to the screen for a total of 2 fps boost. At first I could not believe it. I assume my toolchain had a bug. But cherry-picking this patch on PCDOOMv2 confirmed the tremendous speed gain.

    Good example of how the bottlenecks are often not where you think they are, and why you have to profile & measure (which I assume Viti95 did in order to find that speedup so early on). The status bar percentage?! Maybe there's something about the Doom arch which makes that relatively obvious to experts, but I certainly would've never guessed that was a bottleneck a priori.
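
    If anyone is curious what that kind of fix looks like, here's a minimal sketch of the dirty-flag idea (hypothetical names, not the actual FastDOOM patch): cache the last value drawn and skip the whole render-and-blit path when it hasn't changed.

        /* Hypothetical sketch of the dirty-flag guard; not FastDOOM's actual code. */
        static int lastHealthPercent = -1;           /* force a draw on the first frame */

        void ST_DrawHealthPercent(int healthPercent)
        {
            if (healthPercent == lastHealthPercent)
                return;                              /* noop: value unchanged, skip scrap buffer + blit */
            lastHealthPercent = healthPercent;
            DrawPercentToScrapBuffer(healthPercent); /* assumed helper, illustration only */
            BlitScrapBufferToScreen();               /* assumed helper, illustration only */
        }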

    • robocat5 days ago
      Example: "Our app was mysteriously using 60% CPU and 25% GPU. It turned out this was due to a tiny CSS animation [of an equaliser icon]"

      https://www.granola.ai/blog/dont-animate-height

      • rplnt5 days ago
        I remember Slack eating up my CPU when it had to display several animated emojis at once. Use some 20+ emojis and the Intel MacBook Pro couldn't handle it. Luckily they knew and there was an option to disable the animation. Now I have no idea if they fixed it since or it is one of those things that was "fixed" by the M1.

        Would like to see a write-up on how it's even possible to achieve that when PCs from 20-30 years ago had no issue with such a task.

        • throwup2385 days ago
          > Would like to see a write-up on how it's even possible to achieve that when PCs from 20-30 years ago had no issue with such a task.

          Electron.

          • rplnt5 days ago
            But that's just a browser, right? I'm pretty sure my browser in 2005 could display dozens of gifs. There must have been a series of decisions by devs to do it in some super convoluted way. I can see each emoji being an iframe with a full react app doing the animation. But even that should be fine? Maybe?
            • throwup2385 days ago
              It's "just a browser" in the same sense that the Linux kernel and GNU are "just an operating system".

              It's one of the most complex pieces of software - perhaps even human designed systems - ever to exist and we're using it to render a few polygons and drop shadows because the C++ committee made a bunch of mistakes decades ago so now our webdevs are mortally afraid of CMake and Qt/QML. Or GTK. Or whatever. Pretty much the only people that seem to put out native GUI tools in any significant quantity are Apple developers.

              The tradeoffs that Blink and V8 engineers have made to support the entirety of the internet and the morass of web development preclude efficient use of resources for simpler purposes, like rendering an animation. After all, there are a billion React hooks and ad tracking scripts to optimize, otherwise bounce rates will increase.

              • robocat5 days ago
                > precludes efficient use of resources for simpler purposes

                Strong disagree. If it were an animated gif then the browser would be astonishingly efficient because of crazy good optimisations.

                The underlying reason is that developers are limited to the techniques/toolbox they know. The performance costs are unpredictable because of:

                (1) the declarative style (using imperative solutions would have other costs),

                (2) debugging browser performance regressions is difficult (SQL EXPLAIN is more learnable).

                Browsers enable developers. I could design that animation in CSS even though I'm a developer, plus I understand the article fully. I couldn't design an animated gif because I am totally unfamiliar with any tools for achieving that.

                I think the Blink and V8 teams do an exceptionally good job when choosing compromises. HTML/CSS/SVG/JS and Chromium are winning the UI wars because they deliver practical and efficient enough solutions. Other UI solutions I have experienced all have other downsides. Browsers are magical.

                I mostly really agree with your comment.

            • schwartzworld4 days ago
              Animated gifs are probably not an issue anymore. In the case of the equalizer icon, the OP used CSS animations that made the browser work in a very non-optimal way. Same thing probably applies to the animated emojis.
            • int_19h4 days ago
              Part of the problem is that because it's "just a browser", devs have to come up with browser-friendly ways to implement some functionality that would be very simple if it were a native app. Sometimes resulting in bugs like https://github.com/microsoft/vscode/issues/22900 ("VS Code uses 13% CPU when focused and idle, draining battery ... due to the blinking cursor rendering").

              Note that this isn't even a case of whoever implemented that cursor "doing it wrong"; to quote another comment on that bug from a Chrome dev:

              > Powerful* text editors built on the web stack cannot rely on the OS text caret and have to provide their own. In this case, VSCode is probably using the most reasonable approach to blinking a cursor: a step timing-function with a CSS keyframe animation. This tells the browser to only change the opacity every 500ms. Meanwhile, Chrome hasn't yet optimised this completely yet, hence http://crbug.com/361587. So currently, Chrome is doing the full rendering lifecycle (style, paint, layers) every 16ms when it should be only doing that work at a 500ms interval. I'm confident that the engineers working on Chrome's style components can sort this out, but it'll take a little bit of work.

          • Cthulhu_4 days ago
            While that's part of it, it's also that there's simply little focus on performance; developers trust the system they build on top of, and animations are a solved problem in terms of performance, right?
            • lomase4 days ago
              Animations on HTML controlled by CSS are a solved problem.

              It is a bad idea.

        • ben_w4 days ago
          > Would like to see a write-up on how it's even possible to achieve that when PCs from 20-30 years ago had no issue with such a task.

          At a guess, by something in the UI framework turning an O(n) task into an O(n^2) task. I've seen that happen in person: an iPhone app was taking 20 minutes(!) to start up under some conditions, and the developer responsible insisted the code couldn't possibly be improved; the next day I'd found the unnecessary O(n^2) op and reduced that to a few hundred milliseconds.
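
          A classic shape of that kind of accidental O(n^2), purely as an illustration (the actual app code isn't shown here): a linear scan hidden in the loop condition, re-run on every iteration.

              #include <ctype.h>
              #include <string.h>

              /* Accidentally O(n^2): strlen() rescans the whole string on every iteration. */
              void upcase_slow(char *s) {
                  for (size_t i = 0; i < strlen(s); i++)
                      s[i] = (char)toupper((unsigned char)s[i]);
              }

              /* O(n): hoist the length out of the loop condition. */
              void upcase_fast(char *s) {
                  size_t len = strlen(s);
                  for (size_t i = 0; i < len; i++)
                      s[i] = (char)toupper((unsigned char)s[i]);
              }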

          The over-abstraction of current programming is, I think, a mistake — I can see why the approach in React and SwiftUI is tempting, why people want to write code that way, but I think it puts too much into magic black-boxes. If you change your thinking just a bit, the older ways of writing UI are not that difficult, and much more performant.

          I have a bunch of other related opinions about e.g. why VIPER is slightly worse than an entire pantheon of god classes combined with a thousand lines inside an if-block: https://benwheatley.github.io/blog/2024/04/07-21.31.19.html

        • bluedino4 days ago
          > I remember Slack eating up my CPU when it had to display several animated emojis at once.

          I remember using phpBB back in the late 2000's, viewing a page that had at least 100 animated emoticons on it.

          It would grind IE6 to a halt. But then I tried Chrome or Firefox (I forget which one) and it didn't even blink showing the same page. I even remember reading some developer posts about things like that at the time.

      • grrowl4 days ago
        Or more recently, NPM's infamous "Progress bar noticeably slows down npm install #11283" issue

        [1]: https://github.com/npm/npm/issues/11283

        • guappa4 days ago
          A thing I discovered by myself when I was a teenager: do not update progress bars on every loop.
          • Cthulhu_4 days ago
            There's a lot that front-end developers (be they web or graphical terminal) can learn from the game development world. I can recommend playing with e.g. PICO-8 sometime; it's artificially constrained, so you have to work with the limitations instead of doing the usual relatively naive work. It's Lua coding, so it shouldn't be a problem for JS developers, assuming they don't mind working in a 20-column code editor (or thereabouts).
            • conductr4 days ago
              I think it's more important to get familiar with loop/event-based programming. Most of my experience is in the request/response mindset of web requests, so when I try out game dev, Arduino dev, C# desktop apps, etc., I find my mental model has to shift drastically.

              Loops occur so fast that it's pretty standard to have to put some throttling logic in the code, which I've never had to do in web dev, except perhaps the document.ready handler in JavaScript to make sure the DOM has been loaded.

            • account424 days ago
              Game developers are not magically safe from the progress bar mistake.
          • accrual4 days ago
            Same here. I often implemented it like this, where n was some reasonable value:

                if iteration % n == 0:
                    update_progress()
            • guappa3 days ago
              I normally do it with a bitwise AND against a number that ends with a sequence of 1s, like 7, so `i & 7 == 7`.
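
              One caveat if you do that in C or a C-like language (tiny sketch below, with update_progress() as a hypothetical function): == binds tighter than &, so the parentheses are required.

                  /* Throttle: only update the progress bar every 8th iteration.
                     In C, `i & 7 == 7` parses as `i & (7 == 7)`, i.e. `i & 1`,
                     so the parentheses matter. (Python parses the bare form as intended.) */
                  if ((i & 7) == 7)
                      update_progress(i, total);   /* hypothetical progress function */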
      • ericmcer5 days ago
        Why was the solution to optimize the animation instead of... using a static asset?
        • moffkalast5 days ago
          We paid for the whole CSS spec and we're gonna use the whole CSS spec.
          • joquarky5 days ago
            This, but for everything webdev-related in the past ten years.
        • teaearlgraycold5 days ago
          That would be more optimized for sure. But it wouldn't be scalable and would take more work to create than what looks like a pretty simple change.
        • jonas215 days ago
          Because it provides an indication that the app is receiving your voice input?
          • teaearlgraycold5 days ago
            I think they're suggesting an animated PNG or GIF.
            • Joe_Cool5 days ago
              That's something I'd use a canvas or even an SVG for. An SVG FFT analyzer sounds like a fun project.

              Oh, looks like I'm way late: https://cprimozic.net/blog/building-a-signal-analyzer-with-m...

            • kccqzy5 days ago
              Or even better an animated SVG. Put it inside an <img> tag with a known width and height.
              • LegionMammal9784 days ago
                I'd be very surprised if SVG animations don't use substantially the same rendering paths that CSS animations do. In fact, if anything, I'd expect CSS animations to receive more time and attention from browser devs than the old SMIL elements and attributes, to whatever extent that they aren't equivalent internally.
                • kccqzy4 days ago
                  Yeah but the crux of the problem is when height is animated the browser needs to redo the layout. Here the animation is contained entirely within an <img> tag with known width and height.
                • teaearlgraycold4 days ago
                  SVGs are usually animated with CSS.
            • jonas215 days ago
              How would an animated PNG or GIF respond when the user is speaking?
              • teaearlgraycold5 days ago
                Switch between different PNGs depending on voice activity.
                • lightedman4 days ago
                  that's how veadotube works for vtubers.
              • talldatethrow5 days ago
                I assume there's a way to pause a gif.
      • emmanueloga_4 days ago
        This is why it pays off to turn on "Paint Flashing" [1] in the web console, even if just from time to time.

        --

        1: https://web.dev/articles/simplify-paint-complexity-and-reduc...

        • moffkalast4 days ago
          Me, who already renders a canvas on the whole page so it highlights the whole thing constantly: "Hmm yes the painting here is made out of painting."
          • emmanueloga_4 days ago
            That's cool. I don't know how popular canvas-heavy apps are these days; what kind of apps are you working with?

            I used to work on a set top box app, and for some of the features, replacing the whole page with a single canvas was the only way to get steady FPS.

      • ortsa4 days ago
        This reminds me of having to go into spotify's package files to track down their version of this animation (an animated svg of a bar chart) and kill it, because it would destroy performance on my PC so badly that it was affecting other programs, causing hitches and freezes.

        The animation's still there — and my PC is better now, so it doesn't stutter — but I'm willing to bet it's still burning waaay too many watts, for something so trivial.

      • Dwedit4 days ago
        CSS animations make Flash look efficient by comparison.
      • pinoy4205 days ago
        [dead]
    • barbariangrunge5 days ago
      As a gamedev, those slowdowns are common. UI rendering can be a real killer due to transparency, layering, having to redraw things, and especially triggering allocations. Comparing old vs new before allowing it to redraw is really helpful. I found layers and transparency were a killer in CSS as well in one project, but that was more about reducing layers there.
      • lomase4 days ago
        Drawing to the screen is an IO operation; it's going to be slow.
    • inDigiNeous4 days ago
      Reminds me of the performance optimization somebody discovered in Super Mario World for the SNES, where displaying the player score was very inefficient, taking about 1/6 of the allocated frame time.

      "SMW is incredibly inefficient when it displays the player score in the status bar. In the worst case (playing as Luigi, both players with max score), it can take about a full 1/6 of the entire frame to do so, lowering the threshold for slowdown. Normally, the actual amount of processing time is roughly proportional to the sum of the digits in Mario's score when playing as Mario, and the to the sum of the digits in both players' scores when playing as Luigi. This patch optimizes the way the score is stored and displayed to make it roughly constant, slightly faster than even in the best case without."

      https://www.smwcentral.net/?p=section&a=details&id=35746

    • RankingMember4 days ago
      My favorite example of this is the GTA Online insane loading time issue that ended up being due to poor handling of a 10MB json file (and was finally tracked down by someone outside their org). Took a 6 minute load time down to just under 2 minutes:

      https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times...

    • pjs_5 days ago
      Reminds me of the incident where npm was 2X slower than it should have been because of the fancy terminal progress bar:

      https://news.ycombinator.com/item?id=10974929

    • ognarb5 days ago
      I had a similar case when I worked on a Matrix client (NeoChat): the other devs and I were wondering why loading an account was so slow. Removing the loading spinner made it so much faster, because the animation rendering the loading spinner used 100% CPU.
      • jiggawatts5 days ago
        A common one for server apps is logging, especially to the console.

        It’s far more expensive than people assume and in many cases is single-threaded. This can make logging the scalability bottleneck!

        • rcxdude5 days ago
          Logging to a 'serial' console can stall the whole kernel, wrecking latency, and this can easily show up in VMs since a lot of them emulate a lowest-common-denominator UART interface as the kernel console.
          • ahoka4 days ago
            I remember that raising a server's console baud rate in EFI improved boot times, hehe.
    • qingcharles5 days ago
      The original Doom must have been heavily profiled by id though, surely? Obviously there is a bunch of things that were missed, but I was in game dev at Doom time and profiling was half the job back then.
      • pjc505 days ago
        Well yes - it was an incredible feat of performance engineering. It's just that since then someone has managed to make an extra three thousand commits' worth of micro-optimizations.
        • qingcharles5 days ago
          True. And Carmack and team only had so many hours in the day.
          • aqueueaqueue4 days ago
            And it is easier with a non-moving target.
      • ramses05 days ago
        Until Razer comes in and lands a patch authored by an intern that tanks framerates to 1/3rd doing glowy USB shit every frame...

        https://www.reddit.com/r/Doom/comments/8a1m9s/psa_deactivate...

      • fabiensanglard5 days ago
        What tools did you have in 1993? From what I understood, id Software used NeXT because the DOS tooling was not up to the task.
        • qingcharles5 days ago
          Yikes. Now you're pushing my memory. I believe there was a profiler built into Visual Studio and we had some external tool, too. We had some sort of system set up with a serial cable to debug and profile. The EXE would download onto our test PC next to us.

          The problem for id is that you have no sane way of profiling the DOS port on NeXT.

          • fabiensanglard5 days ago
            Visual Studio? The first Visual Studio (codenamed Boston) was released in 1997. Doom was developed during 1993 :P !
            • masfuerte5 days ago
              He may well be thinking of Microsoft Visual C++, which dates back to 1993 and evolved into Visual Studio. It had a profiler.

              https://en.wikipedia.org/wiki/Microsoft_Visual_C%2B%2B

              • qingcharles4 days ago
                This. Thank you. I thought it was Visual Studio. It's been a while...! The games were a wretched mixture of C++, C and x86.
                • aqueueaqueue4 days ago
                  I used VC++ in '96. Just trying to learn a bit of C++!
              • Narishma4 days ago
                Doom was developed on NeXT machines, not Windows.
                • fredoralive4 days ago
                  Although the DOS version was compiled using Watcom C, so presumably they'd use Watcom's debugger for testing the DOS version.
      • rasz4 days ago
        There is no evidence of any profiling in the source code. It wasn't a thing you did in 1993. The best you could do was compile the whole game with new changes, run a benchmark loop, and compare results.
        • jlokier4 days ago
          The gprof profiler was used long before 1993. It dates back to 1982.

          I did game engine dev professionally in 1994 (and non-professionally before that). We profiled both time and memory use, using GCC and gprof for function-level profiling, recording memory allocation statistics by type and pool (and measuring allocation time), microbenchmarking the time spent sending commands to blitters and coprocessors, visually measuring the time in different phases of the frame rendering loop to 10-100 microsecond granularity by toggling border or palette registers, measuring time spent sorting display lists, time in collision detection, etc.

          You might regard most of those as not what you mean by profiling, but the GCC/gprof stuff certainly counts, as it provides a detailed profile after a run, showing hot functions and call stacks to investigate.

          It's true that most of the time, we just changed the code, ran the game again and just looked at the frame rate or even just how fluid it felt, though :-)

          • bluedino4 days ago
            But was that in Watcom? I don't think gcc was on DOS until DJGPP v1, which was...1992? I don't remember if gprof was ported. id did use DJGPP for Quake in 1996.

            I'm guessing they used gcc on NeXT but profiling that platform probably didn't make much sense, different compiler, different CPU, different OS...

            The game was written on a TurboColor which had a Motorola 68030 (?) that ran at 33MHz and supposedly only ran at 15FPS (probably as slow as a 386DX/33 at the time)

            • wk_end4 days ago
              Watcom also had a profiler back then.
              • abecedarius4 days ago
                Yes, I remember using it in the early 90s.
          • qingcharles4 days ago
            I was a game engine dev at the same time. We definitely profiled heavily. Microsoft had a profiler of some sort, and I believe we had a third party one too.
        • vardump4 days ago
          I was profiling my code in the early nineties. Sure, I created primitive profiling tools on my own, but regardless.

          While 486 AGI stalls were trivial to reason about, the Pentium especially changed the game with its U and V pipes. You could no longer trivially eyeball which code was faster. Profiling was a must.

          Heck, I even "profiled" my C64 routines in the eighties by changing the border color. You could clearly visually see how long your code takes to execute.
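
          For anyone who hasn't seen the border trick: it's roughly this (a cc65-style C sketch, not anyone's actual code), so a routine's cost shows up as a visible coloured band on the screen.

              /* $D020 is the C64 border colour register. */
              #define BORDER (*(volatile unsigned char *)0xD020)

              void expensive_routine(void);   /* hypothetical routine being measured */

              void profiled_call(void)
              {
                  unsigned char old = BORDER;
                  BORDER = 2;                 /* red while the routine runs */
                  expensive_routine();
                  BORDER = old;               /* restore the previous border colour */
              }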

    • on_the_train4 days ago
      I'm currently at the second job in my life where updating the progress bar takes up a tremendous percentage of the overall performance. Because our "engineers" have never used a profiler. At a large international tech giant :(
      • Cthulhu_4 days ago
        I do understand it though, at bigger companies you're less likely to want / need to worry about smaller things. Of course, I'm willing to bet money that they implemented a progress bar themselves instead of using an off-the-shelf one.
        • on_the_train4 days ago
          MFC Progress bar. Like it's 1999
          • int_19h3 days ago
            Win32 already had a stock progress bar control in 1999, though.
    • aeyes5 days ago
      He ported the optimization from the Crispy Doom fork. Since this is one of the first changes in the repo I bet that this was a known issue at the time.
    • PlunderBunny5 days ago
      Back at the turn of the century we found that a performance sensitive part of our WIN32 app was adversely affected by reading a setting from an ini file - in Windows 2000, it was significantly slower than on earlier versions of Windows. The setting was just to determine whether to enable logging for that particular part of the app.
    • smat4 days ago
      While this gives an impressive boost in performance, it also means that frametimes are around 10% longer when the status bar has to be updated.

      Overall this can mean that in some situations the game doesn't feel as smooth as before due to these variations.

      Essentially, in real-time rendering the slowest path is the most critical one to optimize.

      • flykespice4 days ago
        Yep, in a real situation the player would be constantly moving around collecting HP/ammo/weapons and losing health/ammo to monsters... all of which would cause the status bar to be updated frequently.

        I don't think the benchmark accounts for that.

    • slavboj5 days ago
      At one point the bottleneck to the Siri iOS client was rendering the animated glowy ball.
  • yjftsjthsd-h5 days ago
    > To get the big picture of performance evolution over time, I downloaded all 52 releases of fastDOOM, PCDOOMv2, and the original DOOM.EXE, wrote a go program to generate a RUN.BAT running -timedemo demo1 on all of them, and mounted it all with mTCP's NETDRIVE.

    I'm probably not the real target audience here, but that looked interesting; I didn't think there were good storage-over-network options that far back. A little searching turns up https://www.brutman.com/mTCP/mTCP_NetDrive.html - that's really cool:)

    > NetDrive is a DOS device driver that allows you to access a remote disk image hosted by another machine as though it was a local device with an assigned drive letter. The remote disk image can be a floppy disk image or a hard drive image.

    • jandrese5 days ago
      > I didn't think there were good storage-over-network options that far back.

      Back in school in the early 90s we had one computer lab where around 25 Mac Plus machines were daisy chained via AppleTalk to a Mac II. All of the Plus machines mounted their filesystem from the Mac II. It was painfully slow, students lost 5-10 minutes at the start of class trying to get the word processor started. Heck, the Xerox Altos also used network mounts for their drives.

      If you have networking the first thing someone wants to do is copy files, and the most ergonomic way is to make it look just like a local filesystem.

      DOS was a bit behind the curve because there was no networking built-in, so you had to do a lot of the legwork yourself.

      • ben77995 days ago
        There was already relatively deep penetration of this stuff in the corporate world and universities way back in the early 1990s.

        Where I went to school we had AFS. You could sit down at any Unix workstation and log in, and it looked like your personal machine. Your entire desktop & file environment was there and the environment automatically pointed all your paths at the correct binaries for that machine. (While we were there I remember using Sun, IBM, and SGI workstations in this environment.)

        When Windows came on campus it felt like the stone ages as none of this stuff worked and SMB was horrible in comparison.

        These days it feels like distributed file systems are used less and less in favor of uploading everything to various web-based cloud systems.

        In some ways it feels like everything has become less and less friendly with the loss of desktop apps in favor of everything in the browser.

        I guess I do use OneDrive, but it doesn't seem particularly good, even compared to 1990s options.

        • aaronbaugher5 days ago
          I was using HP Apollo systems with the Aegis OS in the early 90s, and they basically shared one filesystem across their token-ring network. Plug in two or more systems, login to one of them, and each other system's files would be under //othersystem.

          I don't recall whether there were any security-minded limits you could put on what was shared, but for a team that was meant to share everything, it was pretty handy.

        • bluGill5 days ago
          I miss those days often. I have more than one computer; why is it such a big deal to share my filesystem? But many things don't work if you try (Firefox for example doesn't like sharing settings that way). The web is great for some things, but local applications are often much more powerful and faster.
        • aeyes5 days ago
          My school had Windows NT and the experience was similar. Any workstation looked the same and had my data.

          Later I saw some Citrix setups which would load the applications from the server. That also worked pretty OK.

          With Windows you definitely had all the options to make this work in the late 90s.

          • int_19h2 days ago
            With Windows, a lot of this is down to software using AppData\(Roaming|Local|LocalLow) correctly. Named differently in NT but conceptually it was there all along. The problem back then was that, since 9x didn't have this separation, relatively few apps bothered to do it right.
        • brazzy4 days ago
          > There was already relatively deep penetration of this stuff in the corporate world and universities way back in the early 1990s.

          Anyone remember Novell NetWare?

      • bluGill5 days ago
        Appletalk was horribly slow - 230.4 kbit/s. Ethernet was already 10 Mbit at the time (but a lot more expensive). General best practice would have been having the world processor installed on each machine and only saving files across the network, which would have made performance acceptable - but at the cost of needing a hard drive in all those Plus machines (I don't recall the price of a hard drive at the time, but I'm guessing closer to $1000 for 20 MB, compared to multi-TB drives for around $100 today).
        • svilen_dobrev5 days ago
          > the worLd processor

          made my day :)

          you did have networking? wow

          not here.. that's why Floppy-net was something.. as well as bus-304-net (like, write a floppy, hop on bus 304, go to the other campus)

          • bluGill4 days ago
            In school they were proud to teach us WordPerfect - the same thing industry uses! By the time I finished, nobody used it anymore.
        • KerrAvon2 days ago
          AppleTalk over LocalTalk cables was slow -- AppleTalk could run over Ethernet at Ethernet speeds.
    • somat4 days ago
      There is a neat trick where iPXE can netboot DOS from an iSCSI target. With no drivers or config, DOS gets read-write access to a network share (well, not a share, since sharing it gets it corrupted fast, but a network block device). It feels magical, but I think iPXE is patching the BIOS to make disk access go over iSCSI.
    • leshokunin5 days ago
      I'm curious: were there NASes or WebDAV mounts in the DOS era? Obviously there was FTP and telnet and such. Just curious if remote mounts were a thing, or if the low bandwidth made it impossible.
      • sedatk5 days ago
        Yes, there was Novell Netware that let you mount remote drives, and there were even file locking APIs in DOS to organize simultaneous access to files. In fact, DOOM's multiplayer code relied on part of Novell Netware stack (IPXODI and LSL). The remote mounts were mainly used on LANs though, not over Internet.
      • bombcar5 days ago
        Yes, it's basically what Netware was, and Novell was a HUGE company.

        SMB (samba) is also from the DOS era. Most people only know of it from Windows, though.

        There were various other ways to make network "drives" as the DOS drive interface was very simplistic and easy to grab onto.

        It was rare to find this stuff until Win95 made network connections "free" (before then, you had to buy the networking hardware and often the software, separately!).

        • roywashere5 days ago
          In the 90s my student union ran a computer network mainly for gaming, with DOS PCs and NetWare running on a Linux server with MARS. This was before they had internet access but it was great for LAN gaming: Command & Conquer, Doom, or Quake. All games were started from network mounts. Fun times.
      • diet_mtn_dew5 days ago
        A network redirector interface (for 'redirecting' local resource access over a network) was added at least by DOS 3.1 in 1985, possibly earlier in 3.0 (1984)

        [1] https://www.os2museum.com/wp/redirectors-and-dos-3-0/

      • bitwize5 days ago
        WebDAV didn't come out until the back half of the 90s, and it was slow to be adopted at first.

        Back in the day, you could author a web page directly in GruntPage, and publish it straight to your web server provided said server had the FPSE (FrontPage Server Extensions), a proprietary Microsoft add-on, installed. WebDAV was like the open-standards response to that. Eventually in later versions of FrontPage the FPSE was deprecated and support for WebDAV was provided.

      • pjc505 days ago
        WebDAV itself dates from 1999, well into the Windows 95 era. The Novell system pre dates it by a lot.
  • ndegruchy5 days ago
    The linked GitHub thread with Ken Silverman is gold. Watching the FastDOOM author and Ken work through the finer points of arcane 486 register and clock cycle efficiencies is amazing.

    Glad to see someone making sure that Doom still gets performance improvements :D

    • ehaliewicz26 hours ago
      Last year I emailed Ken Silverman about an obscure aspect of the Build Engine while working on a similar 2.5D rendering engine. He answered the question like he worked on it yesterday.
    • kridsdale15 days ago
      I haven’t thought of KenS in ages but back in the 90s I was super active in the Duke3D modding scene. Scripting it was literally my first “coding”.

      So in a way, I owe my whole career and fortune to KenS. Cool.

      • vunderba5 days ago
        I feel like Duke 3D was probably the first mainstream accessible moddable FPS. Doom of course had plenty of level editors, but Duke Nukem brought the ability to alter and script AI as editable plaintext CON files, and of course any skills you learned on the BUILD engine were transferrable to any number of other games (Shadow Warrior, Blood, etc.)

        Also shout out to anyone who remembers "wackplayer" - Duke's equivalent of the BEEP keyword.

        • paulryanrogers4 days ago
          Duke also had a map editor with a 3D editing mode. It allowed raising, lowering floors and ceilings, and picking textures. Ahead of its time. The complexity of brush-based true 3D really put a damper on good, built in editors.
        • hypercube334 days ago
          Command and Conquer with Rules.ini was similar and had many fond memories of mine
          • Zardoz844 days ago
            C&C Red Alert. C&C 1 wasn't too easy to mod. Also, Dune 2000 (a derivative of the Red Alert engine) was pretty easy to mod.
      • nurettin5 days ago
        His blog was the first page I "surfed". It talked about the Duke3D map editor and his big project using voxels.
      • badsectoracula4 days ago
        AFAIK the CON scripting language (used in the *.CON files in DN3D) wasn't made by Ken Silverman but by the Duke Nukem 3D team at 3D Realms. I think it was Todd Replogle who wrote the CON stuff.
    • phire5 days ago
      There are some real gems in there.

      I especially liked the idea of CR2 and CR3 as scratchpad registers when memory access is really slow (386SX and cacheless 386DXs). And the trick of using ESP as a loop counter without disabling interrupts (by making sure it always points to a valid stack location) is just genius.

      • ndegruchy4 days ago
        Yes! I know nothing about low level programming, but the idea of using a register that you don't need for a fast 'memory' location is particularly clever.
  • unleaded5 days ago
    One feature of FastDOOM I haven't seen mentioned here are all the weird video modes, some interesting examples:

    - IBM MDA text mode: https://www.youtube.com/watch?v=Op2tr2lGK6Y

    - EGA & Plantronics ColorPlus: https://www.youtube.com/watch?v=gxx6lJvrITk

    - Classic blue & pink CGA: https://youtu.be/rD0UteHi2qM

    - CGA, 320x200x16 with 'ANSI from Hell' hack: https://www.youtube.com/watch?v=ut0V1nGcTf8

    - Hercules: https://www.youtube.com/watch?v=EEumutuyBBo

    Most of these run worse than with VGA, presumably because of all the color remapping etc

    • toast05 days ago
      > - EGA & Plantronics ColorPlus: https://www.youtube.com/watch?v=gxx6lJvrITk

      Any love for Tandy Graphics Adapter? I'd hate to have to run in CGA :( would need a 286 build for my Tandy 1000 TL/2, if it was still alive.

    • Cthulhu_4 days ago
      That's awesome, just a great demonstration why these aspects of the game should be separated. It reminds me of the "modern" Clean Architecture for back-end applications.
    • tecleandor5 days ago
      The IBM MDA text mode is terrible... Love it!
  • jakedata5 days ago
    "IBM PS/1 486-DX2 66Mhz, "Mini-Tower", model 2168. It was the computer I always wanted as a teenager but could never afford"

    Wow - by 1992 I was on my fourth homebuilt PC. The KCS computer shows in Marlborough MA were an amazing resource for tinkerers. Buy parts, build PC and use for a while, sell PC, buy more parts - repeat.

    By the end of 1992 I was running a 486-DX3 100 with a ULSI 487 math coprocessor.

    For a short period of time I arguably had the fastest PC - and maybe computer on campus. It outran several models of Pentium and didn't make math mistakes.

    I justified the last build because I was simulating a gas/diesel thermal-electric co-generation plant in a 21 page Excel spreadsheet for my honors thesis. The recalculation times were killing me.

    Degree was in environmental science. Career is all computers.

    • wk_end4 days ago
      "Wow"? Is it really necessary to give this guy a hard time for being unable to afford the kind of computers you had in 1992?

      Anyway, there's no such thing as a "DX3". And the first 100MHz 486 (the DX4) came out in March of 1994, so I don't see how you were running one at the end of 1992.

      My family's first computer - not counting a hand-me-down XT that was impossibly out-of-date when we got it in 1992 or so - was a 66MHz 486-DX2, purchased in early 1995.

      I can't quite explain why, but as a matter of pride it's still upsetting - decades later - to see someone weirdly bragging about an impossible computer that supposedly outran mine despite a three year handicap.

      • thereticent4 days ago
        Is that really what "wow" means here? I took it more as "wow, I've been around forever / I must be old now" or something similarly tame.
    • bpoyner4 days ago
      That definitely brought back memories. Around '92, being a poor college student I took out a loan from my credit union for about $2,000 to buy a 486 DX2-50. For you younger people, that's about $4,000+ in today's money for a pretty basic computer. I dual booted DOS and Linux on that bad boy.
    • antod5 days ago
      A 486DX and a 487? I thought the 487 was only useful for the SX chips?

      ...looked it up, apparently the standard 487 was a full 486DX that disabled and replaced the original 486SX. Was this some sort of other unusually awesome coprocessor I hadn't heard of?

      • 486sx335 days ago
        Doubled throughput of certain calculations in certain tasks if motherboard supported it

        Possibly something software like maple could take advantage of

      • cantrecallmypwd4 days ago
        Doesn't make any sense, perhaps it's AI-generated nonsense. There was a DX4 100 but no such thing as a "DX3". The 486DX included an FPU, so there'd be no reason to have a "487", which was a complete replacement for the 486SX chips. There were Pentium Overdrives but those were CPU replacements on the 486DX.
      • jakedata5 days ago
        The 486sx had a 16 bit external bus interface so it could work with 386 chipsets. The DX processors had a full 32 bit bus and correspondingly better throughput. The 486 never included an integrated FPU, you had to add a separate co-processor for that. I could go on about clock multipliers and base frequencies but I'll spare you.
        • bluedino4 days ago
          I think you're thinking of the 486SLC

          The 486SX was fully 32-bit (unlike the SLC and 386SX) and the 486DX had the integrated FPU, and the 487 was a drop in 486DX which disabled the 486SX

        • hollandheese4 days ago
          You are thinking of the 386. You perfectly describe the situation on the 386 not the 486.

          The 386SX had a 16 bit external bus interface so it could work with 286 chipsets. The DX processors had a full 32 bit bus and correspondingly better throughput. The 386 never included an integrated FPU, you had to add a separate co-processor for that.

          • jakedata4 days ago
            Ya, I slept on it and realized I skipped a generation in my mind. I guess the details of one of the PCs I built 35 years ago fade after a while.
        • Narishma4 days ago
          You're wrong. The 486SX had a 32-bit bus, just like the DX version. The difference between them is that the DX had an integrated FPU while the SX had it disabled and you had to add a separate 487 co-processor.
        • cantrecallmypwd4 days ago
          The 486DX had an FPU. It was the 486SX that lacked it. The "FPU" upgrade for a 486SX was an entire special version of the 486DX that disabled the original 486SX entirely.
    • ForOldHack5 days ago
      "It outran several models of Pentium and didn't make math mistakes." Total bragging rights. Total. You owned them. Good job.
  • mmphosis5 days ago
    On top of releasing often, Viti95 displayed outstanding git discipline where one commit does one thing and each release was tagged.

    https://fabiensanglard.net/fastdoom/#:~:text=one%20commit%20...

  • kingds5 days ago
    > I was resigned to playing under Ibuprofen until I heard of fastDOOM

    i don't get the ibuprofen reference ?

    • kencausey5 days ago
      Guess: headache from low frame rate?
      • fabiensanglard5 days ago
        Indeed.
        • apetresc5 days ago
            I legitimately thought it was some DOS compatibility layer or something. Like, you'd have to run Doom that way because of the low native framerate.
          • Cthulhu_4 days ago
            With every interesting word in the dictionary having been used to name products by now, the confusion is understandable.

            edit: although ibuprofen is a brand name.

            • fredoralive4 days ago
              Ibuprofen is the generic name for the drug. There are branded variants like Nurofen if you want to spend more money for no real reason.
  • sedatk5 days ago
    If the author reads this: John Carmack's last name was mistyped as "Carnmack" throughout the document.
    • fabiensanglard5 days ago
      Thank you for taking the time to report it. It has now been fixed.
      • mkl5 days ago
        Another typo s/game/gave/: "Another reason John game me".
      • CamperBob25 days ago
        Speaking of Carmack, can you (or someone) elaborate on this quote?

        >DOOM cycles between three display pages. If only two were used, it would have to sync to the VBL to avoid possible display flicker.

        How does triple buffering eliminate VBL waits, exactly? There was no VBL interrupt on a standard VGA, was there?

        • fabiensanglard5 days ago
          Triple buffering means you can render at any speed; there will always be a valid target to render into, and you never need to wait or vsync.

          This is not the case with double buffering. There can be a case, if the CPU renders fast enough, where it has just finished rendering to the current target, but the previous target is still being sent to the CRT. In that case the CPU needs to block on the VBL.

          • CamperBob25 days ago
            Sure, that part is well understood. But triple buffering doesn't avoid the need for vsync. If the VBL interrupt isn't reliably available on all the VGA cards out there, you are still going to have to flip pages from the foreground at some point, and you would rather not do that outside the blanking interval.

            I suppose you could poll the CRTC every so often during the game loop or rendering process, though. That must have been how it worked.

            • rep_lodsb4 days ago
              The CRTC latches the display start address at the beginning (IIRC) of vblank, so you can just write a new value to that register at any time without affecting the current frame.
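
              A rough sketch of what that looks like in DOS-era C (outportb as in Borland's dos.h; DJGPP has it in pc.h), assuming the colour-mode CRTC ports at 0x3D4/0x3D5:

                  #include <dos.h>   /* outportb(); <pc.h> on DJGPP */

                  #define CRTC_INDEX 0x3D4
                  #define CRTC_DATA  0x3D5

                  /* Queue a new display start address; per the comment above, the CRTC
                     only latches it for the next frame, so there is no waiting on vblank here. */
                  void set_display_start(unsigned int offset)
                  {
                      outportb(CRTC_INDEX, 0x0C);                /* Start Address High */
                      outportb(CRTC_DATA, (offset >> 8) & 0xFF);
                      outportb(CRTC_INDEX, 0x0D);                /* Start Address Low  */
                      outportb(CRTC_DATA, offset & 0xFF);
                  }

                  /* Triple buffering then just cycles three page offsets, e.g.
                     page = (page + 1) % 3; set_display_start(page_offset[page]); */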
        • pjc505 days ago
          • CamperBob25 days ago
            Not something reliable enough to use in a shipping product, apparently.
            • o11c5 days ago
              I imagine that a GPU-only "select this buffer when the next frame happens" would be better supported, even if the CPU doesn't get told when. Or even if you can't get an interrupt, you could poll periodically, etc.
              • rep_lodsb4 days ago
                That's already how it works on the VGA, you write the start address into a CRTC register and it will be used once the next frame starts, without affecting the current one.
  • z3t44 days ago
    It's not trivial to go back through versions to check for improvements or regressions, because some optimizations can introduce bugs that are only discovered later, or you introduce a vital feature that degrades performance... So you can make your life easier by having automatic performance tests that run before each release, and if you discover a performance issue you write a regression test as usual... What I'm trying to say is: do performance testing!
  • rasz4 days ago
    >Optimize R_DrawColumn for Mode Y

    Seeing that this made a difference makes it clear Fabien ran FastDoom in Mode Y.

    >One optimization that did not work on my machine was to use video mode 13h instead of mode Y.

    13h should work on anything; it's the VBD mode that requires a specific VESA 2.0 feature enabled (LFB*). VBR should also work no problem on this IBM.

    Both 13h and VBR modes would probably deliver another ~10 fps on 486/66 with VESA CL5428.

    * LFB = linear frame buffer, not available on most ISA cards. Somewhat problematic as it required less than 16MB ram or "15-16MB memory hole" enabled in bios. On ISA Cirrus Logic cards, support depended on how the chip was wired to the bus; some vendors supported it while others used a lazy copy and paste of the reference design and didn't. With VESA Cirrus Logic cards, lazy vendors continued to use the same basic reference design wiring, disabling LFB. No idea about the https://theretroweb.com/motherboards/s/ibm-ps-1-type-2133a,-... motherboard.

    • Trixter3 days ago
      >13h should work on anything

      You misinterpreted what he wrote. He wasn't saying that mode 13h didn't work; he meant that the optimizations in the mode 13h path of the executable weren't as good as the Mode Y path. It's the optimization that didn't work, not mode 13h itself.

    • fabiensanglard4 days ago
      > VBR should also work no problem on this IBM

      I think this statement is incorrect. These modes require support for VESA 2.0, which this IBM does not have.

      > Somewhat problematic as it required less than 16MB ram or "15-16MB memory hole" enabled in bios.

      Could be the issue. Do you have any documentation about this?

      • rasz4 days ago
        Load Univbe.exe. "15-16 hole" is only required for ISA cards to work in VBD mode.
  • manoweb5 days ago
    Unlike the author, back in the day I would have preferred a 486DX50 to the DX2-66. 50MHz bus interface (including to the graphics card) instead of 33MHz
    • antod5 days ago
      My first job was AutoCAD drafting on a DX50 with 16MB. Quite high specced in the early 90s. Not sure I would've noticed the difference compared with a DX2 though.
    • fabiensanglard4 days ago
      Can you elaborate why? I understand the bus speed difference, but the benchmarks I remember always had the 66MHz winning over the 50MHz.
  • bee_rider5 days ago
    From a quote in the article

    > One of my goals for FastDoom is to switch the compiler from OpenWatcom v2 to DJGPP (GCC), which has been shown to produce faster code with the same source. Alternatively, it would be great if someone could improve OpenWatcom v2 to close the performance gap.

    - Conversation with Viti95

    Out of curiosity, how hard is it to port from OpenWatcom to GCC?

    Clearly the solution here is to write a Watcom llvm front end…

    • fabiensanglard4 days ago
      > how hard is it to port from OpenWatcom to GCC?

      I don't think it is that hard but likely very time consuming.

      In theory it should only be about writing a new build script (not based on `wmake` but on a real `make`), and then working out the tiny flag/preprocessor/C compiler discrepancies.

      • turol4 days ago
        > In theory it should only be about writing a new build script (not based on `wmake` but on a real `make`), and then working out the tiny flag/preprocessor/C compiler discrepancies.

        For mostly C code like the original Doom source, yes. But it looks like the FastDoom people have added quite a bit of assembly. That needs to be ported to AT&T syntax, or you need to find out if Intel syntax works in your version of DJGPP's gas. While this should work nowadays, I have not tried it. Then there are other differences, like Watcom mapping low memory by default into the low part of the address space but DJGPP needing explicit mapping or access.
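
        To give a sense of the syntax gap, a small hypothetical illustration (GCC extended asm), the same load written in both dialects:

            /* Watcom/TASM-style Intel syntax:  mov eax, [ebx+4]
               DJGPP's gas in AT&T syntax:      movl 4(%ebx), %eax
               (source before destination, % register prefixes, parenthesised memory operands) */
            static inline int load_second(const int *p)
            {
                int v;
                __asm__("movl 4(%1), %0" : "=r"(v) : "r"(p) : "memory");
                return v;
            }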

  • anilgulecha4 days ago
    One non-cynical take on why modern software is slow and doesn't contain optimizations such as these: the standardization/optimization hypothesis.

    If something is or has become a standard, then optimization takes over. You want to be fastest and meet all of the standard's tests. Doom is similarly now a standard game to port to any new CPU, toaster, whatever. Likewise the email protocols, or a browser standard (WebRTC, QUIC, etc.).

    The reason your latest web app / Electron app is not fast is that it is exploratory. It's updated every day to meet new user needs, and fast-enough-to-not-get-in-the-way is all that's needed performance-wise. Hence we see very fast IRC apps, but Slack and Teams will always be slow.

  • rob744 days ago
    Ah, that picture brings back memories - I used to have a successor of that machine in the mid nineties (PS/1000), it looked almost the same, except the handle was rounded and the power button was blue (a big blue button). And the CPU was IBM's very own "Blue Lightning" 486SX clone (75 MHz, but no FPU). It ran Doom great, but had to pass on Quake, which required an FPU for its polygon-based 3D graphics.
  • ge965 days ago
    > I always wanted as a teenager but could never afford

    Funny how that is; for me it was a Sony Alpha camera (~~flagship at the time~~) and 10 years later I finally bought it for $50.

    • fabiensanglard5 days ago
      • ge965 days ago
        Yeah I was not exposed to camera gear at that time but seeing the NEX series with the tiny body and massive lens, I wanted it.

        I know there are better cameras in the Alpha line but yeah, I had an R3 at one point which was wasted on me as an amateur.

        • Arainach5 days ago
          You don't need to use every capability of a device to make it beneficial for you.

          Let's stick with photography. Can someone who knows what they're doing get great results with cheap equipment? Yes, in many situations. Is it WAY easier with the right gear? Absolutely. Finding the right balance is tricky - no, you don't need a flagship body and lens to get started, but having a flagship or pro-grade body/lens from 1-3 generations ago can be huge.

          I shoot Nikon, not Sony, but going from a consumer D50 body to a Pro D300 body was huge just in terms of ergonomics - more buttons to allow me to quickly adjust things without having to pull the camera from my eye and fumble through menus.

          In the current generation, I finally moved to Mirrorless with a Z6ii which blew my mind and enabled so many more things - no, I wasn't "getting the full use" out of it and yes, I got some great shots with my old DSLR gear, but it made so many things so much easier that it made shooting fun and got me to carry the camera and take photos every day, which has been the biggest factor in improving my skills. Within the last few months I splurged and upgraded to a current-gen flagship (Z8) which amazed me once again - the Z6ii was more camera than I could fully exploit, but the Z8's ergonomics are just incredible - so many buttons, most of them remappable, allowing me to truly develop an instinctive way of shooting and allowing the equipment to get out of my way.

          It's important to try to avoid loving gear more than loving the activity, but that doesn't mean that higher-end gear is "wasted on" amateurs.

          • ge964 days ago
            Yeah I just could not afford this stuff (debt) and I bought it used, $2K for a body, $2K for a G lens ... Then you get the urge to start buying all the primes...

            I'm back to the basics now, trying to produce videos with a NEX-5N; it's not 4K but it's very cheap.

            I get what you're saying though

  • hyperman14 days ago
    I see the acronyms MVP and MPV in the post. Does someone know what they mean?
    • timendum4 days ago
      MVP is Most Valuable Player, i.e. the best commit in terms of fps improvement.

      I think MPV is a typo for MVP.

      • Out_of_Characte4 days ago
        In the context of the article, we could assume it means "Most Valuable Pull request".

        Just don't let corporations use MVPR as a metric or it will cease to be a fun challenge.

    • ant6n4 days ago
      Tried to find it with Google and ChatGPT, no luck. Maximum performance value, perhaps?
    • riddley4 days ago
      Pretty sure it's Minimum Viable Product. This is a common term for a product with enough features to prove the value of the endeavor.
  • fitsumbelay5 days ago
    very nice website design
    • aquova5 days ago
      I wanted to comment on that as well. It dawned on me near the end of the article that all lines end exactly at a word boundary. You can tell there's some subtle kerning going on, as different lines don't line up, even though it's a monospaced font, but it's very subtle and well done.
      • kristianp5 days ago
        That's called text justification. Spaces are added between words to line up the words to end at the right margin.
      • syncsynchalt4 days ago
        It's as easy as CSS:

            text-align: justify;
        
        try it in your browser console:

            document.body.style="text-align: justify";
  • cantrecallmypwd4 days ago
    In high school, the fastest computer in the computer lab was an IBM-donated PS/1 486SX 25 all-in-one that also was used to play DOOM.
  • dabeeeenster5 days ago
    > I was resigned to playing under Ibuprofen until I heard of fastDOOM

    WTH is Ibuprofen?!

    • samplatt5 days ago
      To add to the list of drugs, the biggest brand for ibuprofen here in Australia is Nurofen, with Advil being the biggest "cheap brand" version of it.

      ibuprofen is an anti-inflammatory and anti-coagulant, sold under many different names.

    • cbzbc5 days ago
      It's an NSAID made under a variety of names. You might be familiar with Advil?
    • BizarroLand4 days ago
      Aside from all of these people giving you the technically correct answer, I also for a moment wondered if "ibuprofen" was the gimmick name of some wrapper that sped up doom or made it more playable on the system before I realized the author meant that playing with the slow frame rates would be painful.
    • ahartmetz5 days ago
      US: Advil?

      A fairly close relative to Aspirin that's easier on the stomach and has less of an anticoagulant effect.

    • fabiensanglard5 days ago
      The US equivalent of Aspirin.
      • IMTDb5 days ago
        Ibuprofen and aspirin share some similarities but are different molecules. Both ibuprofen and aspirin are available in the US, both are also available in Europe.

        It's not the same as - for example - Tylenol which is called Doliprane or Dafalgan in Europe. In that case the active molecule is the exact same, just the name is changing; but you will have a hard time finding a box with Tylenol written on it in France.

        • account424 days ago
          > Tylenol which is called Doliprane or Dafalgan in Europe

          I have never seen it called Doliprane or Dafalgan here, only ever Paracetamol, which is the generic name.

  • prox5 days ago
    Is there a recommended place where I can play Doom in the browser?

    If such a thing exists!

  • zombot4 days ago
    How does git fit into 4 MiB of RAM?
    • ChrisRR4 days ago
      I think the more appropriate question is how do modern apps not fit into 4MB of RAM?

      How is it that simple tools like text editors work the same as they did 20 years ago but take orders of magnitude more RAM?

      • miohtama4 days ago
        The higher rendering resolution explains some of it. Advanced completion, syntax highlighting and other productivity features take their share. Some also goes to programming languages that are faster to develop in, used to build these editors, whose run-time penalties come with that productivity boost.

        You can still use Notepad if you want to, but you'll be much less productive with it.

  • klaussilveira5 days ago
    Glad to see another post on Fabien's blog!
  • acoolguy485 days ago
    This is cool
  • hinkley5 days ago
    So what does one do with a faster Doom, besides bragging, larger maps and more simultaneous players?
    • fabiensanglard5 days ago
      You can play on vintage hardware with a decent refresh rate. It makes it more enjoyable.
      • pixelpoet5 days ago
        Thanks for your awesome articles, esp the path tracing postcard[0] :) I've had the pleasure of hanging out with Andrew Kensler at a conference dinner (EGSR 2019?), amazing guy! He scribbled a bunch of great quasi Monte Carlo notes into my notebook and even signed it on request :")

        [0] https://fabiensanglard.net/revisiting_the_pathtracer/index.h...

        • fabiensanglard4 days ago
          > He scribbled a bunch of great quasi Monte Carlo notes into my notebook and even signed it on request :"

          Show me a photo of that beauty.

          +1 on Andrew Kensler's awesomeness. After I published these articles, he took the time to send me a package with Pixar goodies. Deeply moving gesture.

      • ForOldHack5 days ago
        I was thinking modern hardware with multiple windows, and RT. I upgraded from a 970m to a 1080m just to see... beauty... wish I had my old PC around.
    • pixelpoet5 days ago
      Does it always have to be for practical benefit? What about pure learning and intellectual enjoyment? Where are the true limits? In the end absolutely nothing we do matters :)

      These dudes are living their best lives, and having done Quake-style asm texture mapping loops in the 90s (Mike Abrash, fastmap, Chris Hecker, PC Game Programming Encyclopedia, ...), I can definitely appreciate it <3

    • FartyMcFarter5 days ago
      People enjoy learning about old software. Fabien Sanglard, who wrote this article, built a whole website and wrote several books based on that.
      • ForOldHack5 days ago
        Two books on Doom, and a book on Wolfenstein. Three books? That rocks!
    • pak9rabid5 days ago
      Impress chicks
    • Narishma5 days ago
      Run on slow hardware or save power if you're on a battery.
    • account424 days ago
      > more simultaneous players?

      You don't get to do that according to the article.

    • kridsdale15 days ago
      Run it on more obscure hardware?
      • dvhh5 days ago
        Considering that the code is mostly x86 assembly, the gains from such optimization are quite unlikely.
    • ChrisRR3 days ago
      Geek cred
  • alanh5 days ago
    For readability:

      html { font-family: system-ui; }
    
    Consider https://alanhogan.com/bookmarklets#add_css to add this to the page. Code blocks are still shown in monospaced font. BTW, monospaced font for prose is an anti-pattern that you hackers need to relinquish, but whatever!
    • kccqzy5 days ago
      It's clear that using a monospaced font is an artistic decision here.

      Furthermore, using system-ui for anything that's prose and not a UI is an anti-pattern that UI designers love to make. It also makes the font dependent on system language, which makes things worse if the system language doesn't match the language of the page. https://infinnie.github.io/images/blog/bootstrap.png Even hardcoding it to something classic like Verdana (remember web safe fonts?) is much better.

    • robertlagrant4 days ago
      > monospaced font for prose is an anti-pattern

      Why?

      • ChrisRR3 days ago
        Because they say so