93 points by ingve 6 days ago | 15 comments
  • virtualritz 6 days ago
    Curiously, what people commonly refer to as 'Wavefront OBJ' is merely a tiny subset of that format, i.e. the part dealing with polygons.

    The format supports e.g. higher-order curves and surfaces, and apps like Maya or Rhino3D can read and write OBJ files containing such data. [1]

    Writing a parser for the polygon subset also comes with some caveats.

    If your target is a GPU you probably need to care about robust triangulation of n-gons and about turning per-face-per-vertex data into per-vertex data on disconnected triangles.

    Vice versa, if you are feeding data to an offline renderer, you absolutely want to preserve such information.
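    The second point, making per-face-per-vertex data per-vertex, can be sketched roughly like this (hypothetical types and names, not the article's or tobj's actual code): each OBJ face corner carries its own position/normal index pair, while a GPU vertex buffer wants one interleaved record per corner, so positions get duplicated wherever the attached attributes differ.

```c
#include <stddef.h>

/* Interleaved GPU vertex: position + normal (hypothetical layout). */
typedef struct { float px, py, pz, nx, ny, nz; } Vertex;

/* One OBJ face corner: independent indices into the position and
   normal pools (0-based here; OBJ files themselves are 1-based). */
typedef struct { int v, vn; } Corner;

/* Expand per-face-per-vertex data into a flat per-vertex stream.
   A real loader would also deduplicate identical (v, vn) pairs. */
static void flatten(float pos[][3], float nrm[][3],
                    const Corner *corners, size_t ncorners, Vertex *out)
{
    for (size_t i = 0; i < ncorners; i++) {
        float *p = pos[corners[i].v];
        float *n = nrm[corners[i].vn];
        out[i] = (Vertex){p[0], p[1], p[2], n[0], n[1], n[2]};
    }
}
```

    Two triangles that share a position but disagree on the normal end up with that position stored twice, once per normal, which is exactly the "disconnected triangles" cost mentioned above.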

    I believe the tobj Rust crate is one of the few OBJ importers that handles all edge cases. [2] If you think it doesn't, let me know and I will fix that.

    This is surprising to people who are familiar with the requirements of only one of the two, offline rendering or GPU rendering.

    I.e. if you write an OBJ reader this can become a challenge; see e.g. an issue I opened here [3].

    1. https://paulbourke.net/dataformats/obj/

    2. https://docs.rs/tobj/latest/tobj/struct.LoadOptions.html

    3. https://github.com/assimp/assimp/issues/3677

    • grandempire 6 days ago
      > robust triangulation of n-gons and making per-face-per-vertex data per-vertex on disconnected triangles.

      This is a simple post-process step after parsing.

      • sreekotay 5 days ago
        Depends on your definition of robust (e.g. how are you handling T-junctions), but in general I would agree with this, in that some things that are NOT trivial should rightly be pushed up the art pipeline.

        Good pipelines shouldn't rely on clever parser behaviour (as a general rule)

        • grandempire 5 days ago
          Good point about T-junctions. I was thinking only of combinatorial triangulation.
      • virtualritz 2 days ago
        When your polygons are convex and non-planar, 'simple' quickly becomes a euphemism.

        You may even want to consider adjacent topology for something as 'simple' as deciding which of the two possible splits to take when dividing a non-planar quad into two triangles.

        Triangulation is simple only when your polygons are planar and convex and the aspect ratios of the resulting triangles do not matter for any secondary purpose. Some uses, e.g., require avoiding thin, splinter-like triangles.
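        One common heuristic for the quad case (a sketch of one possible choice, not virtualritz's method): cut along the shorter diagonal, which tends to produce the less distorted pair of triangles on a non-planar quad.

```c
/* 3D point (hypothetical helper types, not the article's code). */
typedef struct { float x, y, z; } V3;

/* Squared distance between two points. */
static float dist2(V3 a, V3 b)
{
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx*dx + dy*dy + dz*dz;
}

/* Shorter-diagonal heuristic for splitting quad (a,b,c,d):
   returns 1 to split into (a,b,c)+(a,c,d) along diagonal a-c,
   0 to split into (b,c,d)+(b,d,a) along diagonal b-d. */
static int split_along_ac(V3 a, V3 b, V3 c, V3 d)
{
    return dist2(a, c) <= dist2(b, d);
}
```

        Other codebases instead minimize the dihedral angle across the new edge, or look at neighboring faces as described above; which criterion is right depends on the downstream use.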

    • the__alchemist 6 days ago
      How does this compare to the `obj` crate? I'm assuming that doesn't handle cases beyond the common one well? I ask because I have a 3D rendering/GUI application lib in rust (`graphics` crate), and for OBJ files, I thinly wrap that to turn it into a mesh.

      In my own applications, it hasn't come up, as I've been mostly using primitives and dynamically-generated meshes, but am wondering if I should switch.

  • suspended_state 6 days ago
    That's very nice work, and many interesting concepts are introduced in the post (for example, arenas, length-bounded strings, the Cut struct).

    One caveat though:

    > If the OBJ source cannot fit in memory, then the model won’t fit in memory.

    I don't think that this is true: a (single-precision) float's textual representation is typically equal to or larger than its 4-byte binary representation, which is the floating-point type used in the renderer given later in the post. The numbers given in the cube example are unlikely to occur in real-world examples, where one would probably expect more than 2 digits of decimal precision. That being said, for double-precision floats it might be true in many scenarios, but I would not make it a cardinal rule anyway.
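    The size comparison is easy to check concretely (a quick sketch, assuming IEEE-754 binary32 floats): printing a float with the 9 significant digits needed for an exact round-trip almost always takes more than the 4 bytes of its binary form.

```c
#include <stdio.h>
#include <string.h>

/* Length of the decimal text for f, using the 9 significant digits
   that guarantee a binary32 value round-trips through text. */
static size_t text_len(float f)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.9g", f);
    return strlen(buf);
}
```

    A typical coordinate like 0.123456789f prints as roughly 11 characters (not counting the separating space), versus 4 bytes in binary; only short values like "1" come out smaller.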

    This corner cut fits within the objective of the post, which, imho, isn't to make the most efficient program, but provide a great foundation in C to build upon.

    • grandempire 6 days ago
      The technique shown can be easily adapted to mmap.
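      For instance (a hypothetical POSIX sketch; the article itself reads the file into an arena), the same Str-style byte view can point straight into a read-only file mapping:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Length-bounded byte view, in the spirit of the article's Str. */
typedef struct { unsigned char *data; ptrdiff_t len; } Str;

/* Map a whole file read-only; returns a zero Str on failure. */
static Str map_file(const char *path)
{
    Str s = {0};
    int fd = open(path, O_RDONLY);
    if (fd < 0) return s;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        void *p = mmap(0, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            s.data = p;
            s.len = (ptrdiff_t)st.st_size;
        }
    }
    close(fd); /* the mapping survives the close */
    return s;
}
```

      The OS pages the file in on demand, so the parser's pointer-walking works unchanged, and "load" error handling reduces to checking whether the mapping succeeded.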
    • dahart 6 days ago
      The sentence you quoted must be true because the input file and the output binary model both need to fit in memory at the same time.
      • suspended_state 6 days ago
        I guess that the following statement would be true: if the process cannot load the whole file in memory and allocate memory for the model at the same time, then it won't be able to run successfully. Strictly speaking, the sentence I quoted doesn't derive from that. This is just me quibbling, though, because the intended meaning was most likely what you said.
  • pixelesque 6 days ago
    As someone who has written multiple OBJ readers over the years, this is interesting, but it notably seems to be ignoring texture coords (UV coords), and doesn't support object groups.

    Also, OBJ material support is an absolute nightmare if you ever try to support it: there's technically a sort of original standard (made around 30 years ago, so understandably somewhat messy given how far materials and physically-based shading have come in the meantime), but different DCCs do vastly different things, especially for things like texture paths and specular/roughness...

    • milesrout 6 days ago
      I think it doesn't support 'vt' because the techniques are adequately demonstrated just with faces and normals, so it would be more code without serving any pedagogical purpose. The author would, I think, not suggest you copy this code and try to use it as a library or something, but that you should develop the skillset to be able to write code like this when you need it.
    • irq-1 6 days ago
      Isn't there a common convention with vertex colors, where the color is just listed after the vertex?

      (from a quick search) https://paulbourke.net/dataformats/obj/colour.html
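      That unofficial extension appends r g b after the coordinates on the vertex line. Accepting it in a parser is cheap; a hypothetical sketch (not the article's code):

```c
#include <stdio.h>

/* Parse a "v" line that may carry the unofficial per-vertex color
   extension: "v x y z [r g b]". Returns the number of color
   components found (0 or 3); hypothetical helper for illustration. */
static int parse_vertex(const char *line, float xyz[3], float rgb[3])
{
    int n = sscanf(line, "v %f %f %f %f %f %f",
                   &xyz[0], &xyz[1], &xyz[2],
                   &rgb[0], &rgb[1], &rgb[2]);
    return n == 6 ? 3 : 0;
}
```

      Lines without colors still parse their three coordinates; the extra fields are simply absent.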

  • tylermw 6 days ago
    Note that there's a great C99/C++ single header library, tinyobjloader, that provides robust (in my experience) and feature-full OBJ loading, including triangulation of n-gons and full material parsing.

    https://github.com/tinyobjloader/tinyobjloader

    It's fairly mature and handles many of the parsing footguns you'll inevitably run into trying to write your own OBJ parser.

  • pjmlp 6 days ago
    The usual rite of passage into 3D programming in the old days: adding all the things that OpenGL doesn't do out of the box, unlike other 3D frameworks, where naturally the 3D asset loading code was OBJ based.

    Nowadays you can have the same fun by rewriting the previous sentence using Vulkan instead of OpenGL, and glTF instead of OBJ.

    • rossant 6 days ago
      "The same fun" but also likely orders of magnitude more effort (and headaches).
      • pjmlp 6 days ago
        Indeed, at least now there is an SDK as starting point.
  • smcameron 6 days ago
    Heh. I feel that I might be somewhat responsible for the existence of this article. Perhaps merely a coincidence, but this happened a few days ago: https://old.reddit.com/r/C_Programming/comments/1itrhd9/blat... in which the author schools me on the topic of American Fuzzy Lop which he applied to my home made OBJ parser.
  • gdubs 5 days ago
    I'm kind of obsessed with old formats and how they are like a fossil record of a period of computing that's gone, and that I have nostalgia for. Wavefront was such an alluring application when it was around. The peak of Silicon Graphics era mystique. It was one of the big 3 of the time, alongside Alias and Softimage. Then, over time, there were various mergers and acquisitions and all the apps started to kinda meld into a commonality; we lost a bit of the unique magic of all these different approaches and opinions.

    You can see something similar in the MIDI specification. There are actual manufacturer codes, and if you read through them you get a kind of 'who's who' of the synth era.

  • turnsout 6 days ago
    This sent me down a rabbit hole reading about the author's style of having an "Arena allocator," [0] which was fascinating. I often did something similar when writing ANSI C back in the day—allocate a big enough chunk of memory to operate, and do your own bookkeeping. But his Arena implementation looks more flexible and robust.

      [0]: https://nullprogram.com/blog/2023/09/27/
  • klaussilveira 6 days ago
    If anyone is looking for a very fast OBJ loader, this is great: https://github.com/guybrush77/rapidobj

    Also: https://aras-p.info/blog/2022/05/14/comparing-obj-parse-libr...

  • bvrmn 6 days ago
    > Str substring(Str s, ptrdiff_t i)

    The function has a quite questionable implementation: it fails miserably for strings with length < i.

    • Joker_vD 6 days ago
      Only because every other Str-accepting function uses "s.len" instead of "s.len > 0" as the "is s non-empty" test.

      Still, this function is called only once, and in that call, its i argument is always <= length, so it's perfectly fine (it's only UB if you actually pass it a bad argument).

      • bvrmn 6 days ago
        > Still, this function is called only once, and in that call, its i argument is always <= length, so it's perfectly fine (it's only UB if you actually pass it a bad argument).

        This very mindset is a source of bugs and vulnerabilities. The author gets high marks from me on safety and on "make it hard to use wrong", so it's quite surprising to see such code.

        • grandempire 6 days ago
          Satisfying preconditions is a requirement for making functioning programs.

          The insanity would be assuming that every function is valid for the Cartesian product of all possible values of its arguments.

          What he probably needs is an assert
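          Something along these lines (a sketch of the idea, not the article's actual code): keep the cheap pointer arithmetic but make the precondition explicit and debug-checked.

```c
#include <assert.h>
#include <stddef.h>

/* Length-bounded string, in the spirit of the article's Str. */
typedef struct { char *data; ptrdiff_t len; } Str;

/* Skip the first i bytes of s. Precondition: 0 <= i <= s.len,
   checked in debug builds. A defensive alternative would clamp
   instead: if (i > s.len) i = s.len; */
static Str substring(Str s, ptrdiff_t i)
{
    assert(i >= 0 && i <= s.len);
    s.data += i;
    s.len  -= i;
    return s;
}
```

          The assert names and documents the contract without paying for a branch in release builds; the clamping variant mentioned in the comment is what you would choose if out-of-range i should be a valid input.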

          • Joker_vD 6 days ago
            > Satisfying preconditions is a requirement to make functioning programs.

            > The insanity would be assuming that every function is valid for the Cartesian product of all possible values of its arguments.

            Would it? That reminds me of a recent post on HN about proving the long (binary) division algorithm with Hoare logic. It uses the "d > 0" precondition and proves that, indeed, the algorithm arrives at the required postcondition. However, the algorithm still terminates and produces something even when d == 0. What does it compute in this case? Is it useful? Should such questions even be considered?

            • grandempire 6 days ago
              > Should such questions even be considered?

              Yes, a better understanding of the problem gives you a better understanding of the preconditions. Always ask if you have that right and weaken accordingly.

          • bvrmn 6 days ago
            For this particular case it's trivial to fix the substring function and extend the possible inputs. Your proposition seems to be: "do nothing because it's futile". That's simply wrong.
            • grandempire 6 days ago
              Will that make the function more useful?

              In general you can write better code when you can make assumptions.

              Code that handles every possibility is filled with error-prone branching that duplicates effort at every function.

              • bvrmn 6 days ago
                It would reduce the number of assumptions, especially ones living only in your head. Generally that's a good thing, isn't it? A large portion of C code bugs is literally due to broken assumptions. WTF man?
        • milesrout 5 days ago
          This mindset is literally the way safe programs work. What do you think functions are?

          This isn't some 100k line long program where this function is used all over the place and code churns constantly so checking invariants in the function definition makes sense.

          It is called in one (1) place in a small program.

          • bvrmn 5 days ago
            > This isn't some 100k line long program

            It could use `str*` functions without any issues then. NUL-terminated strings are perfectly safe as long as you follow their assumptions.

            Anecdote: I fixed 3 reported segfaults, and another 2 after fuzz-testing, in a small 500-line lib. The original author had the same cowboy mindset about keeping all the stuff in his head. Those are always the famous last words before ending up in the CVE database.

            • grandempire 4 days ago
              > Original author had the same cowboy mindset about keeping all stuff in his head.

              Nobody is saying that. Name, document, and assert.

              • bvrmn 4 days ago
                > It is called in one (1) place in a small program.
                • milesrout 3 days ago
                  How is that in his head? That is the code.
                  • bvrmn 3 days ago
                    > That is the code.

                    Good code consists of easy-to-use abstractions. In general, Chris's blog is dedicated to poking at bad abstractions and giving good examples. `substring` is objectively bad.

                    You (and other commenters) are literally arguing that keeping stuff in your head, or writing some documentation or notes, or forcing yourself or others to check low-level implementation details, is better than making one trivial fix and forgetting about it. I find it really amusing.

                    • grandempire 2 days ago
                      > Good code consists of easy to use abstractions

                      Good code consists of solving a problem simply and efficiently. You can have a great program which is concrete. Abstraction is a tool for managing complexity, not a goal.

                      I would really encourage writing a large program in C or assembly, as practice in dealing with programming problems without immediately wrapping things in an interface. A lot of things that feel like productivity end up being wastes of time.

                    • milesrout 3 days ago
                      It is simple, short and obviously correct. You don't have to record every single line of reasoning in a proof or argument, when the intermediate steps are obvious. Same applies here. It isn't necessary to document this because it is obvious from a glance that it is correct. You can derive its correctness with a moment's thought. So it is fine not to assert it - that would be clutter.
                      • Joker_vD 2 days ago
                        > obviously correct

                        While it is "obvious" that it is correct, it is, in fact, neither obvious nor correct if A. you ever subtract from the len field, and B. you use "len != 0" as a check for "is string non-empty". This program does both things quite a lot, and it requires a conscious check to reassure yourself that no, in this particular program, the patterns of actions A (subtractions from the len field) never break property B (non-zero len means the string is not empty). But any minor modification may accidentally break this harmony if you are not aware that it is required to hold.

        • UncleEntity 6 days ago
          Reminds me of the time I was chastised for adding a NULL check to keep <program> from segfaulting, by the dev responsible for said segfault, because crashing without so much as a warning was "intended behavior". IIRC this was over reading a file from disk and just assuming it existed.
    • milesrout 5 days ago
      Why would you do that? Is there any situation in which it is called in this program where that could be true?
  • writebetterc 6 days ago
    This way of writing programs is also quite a lot faster than depending on fgetline and the like. The integer and float parsing is probably slow, though.

    My question is: Does the author actually use Windows XP?

    • claytonaalves 6 days ago
      > Does the author actually use Windows XP?

      I've switched to XP (from Windows 7, on a VM) and the performance is astounding even on limited hardware settings. No bloatware, just good old Win32 x86.

      • MSFT_Edging 6 days ago
        I recently pulled an old laptop out of the closet with a mostly stock image of XP to play with an old device and it felt so snappy.

        It's sad how bloated things have gotten.

      • 1f60c 6 days ago
        Is it connected to the network?
        • oguz-ismail 6 days ago
          Mine is. Why?
          • shortrounddev2 6 days ago
            Because XP is no longer receiving security updates
            • oguz-ismail 6 days ago
              And?
              • t0rakka 5 days ago
                What do you mean, "and"? The implications are implicit: any vulnerability will remain unpatched, so a bad actor (tm) only has to know ONE vulnerability found after XP support ceased. If he has a means of talking to the machine through TCP/IP or UDP, he has practically guaranteed access.

                You wouldn't believe how much traffic is hammering IP ranges with known vulnerabilities. Forward port 22 to your Linux box or similar, then check the logs for the number of "connection attempts"; it's going to be a glorious log. The A-HOLES of this planet are doing this just to get control of devices connected to the internet, if for no other reason than to use them in a DDoS-for-hire service. If there is a quick buck to be made... they'll be all over it. Human parasites.

                • LegionMammal978 4 days ago
                  You normally wouldn't forward open ports on your VM straight through your host and also through your LAN (or at least, I wouldn't), so that's not really a huge attack vector.

                  The main threat would be connecting to a malicious server that attacks some hypothetical hole in the TCP/TLS stack when you connect, but such servers aren't really omnipresent, and you can apply the usual measures of 'making regular backups' and 'not keeping extraordinarily sensitive data on a VM' to mitigate any impacts.

                  (Looking at actual historical holes, I find things like CVE-2005-0048, which requires a malformed IP packet a modern router wouldn't pass through, and CVE-2007-0069 and CVE-2019-0708, which require a malicious incoming connection to a particular port. There's also stuff like https://www.forcepoint.com/sites/default/files/resources/fil..., but that's not really specific to XP services, and requires many stars to align unless you're running a vulnerable HTTP service.)

    • kilpikaarna 6 days ago
      > My question is: Does the author actually use Windows XP?

      Significant overlap between the types of people who use WinXP and write 3D file format importers in C, I think! Though I prefer 7 myself.

  • animal531 6 days ago
    This is one of those things where for literally every 3D tool you test it against, you're going to find new edge cases that break the code.
  • kleiba 6 days ago
    Who can do it as a one liner with a regex?
    • creaktive 6 days ago
      Been there, done that... Not worth it
  • xyzsparetimexyz 6 days ago
    [flagged]
    • jbreckmckye 6 days ago
      I did some OBJ processing the other day, it's an easy format for working with PlayStation 1 3D models.
    • creaktive 6 days ago
      Says who?
  • maccard 6 days ago
    I think this article serves as a perfect example of why we should consider moving on from C. The first third of this article is "how to do memory allocation and work with strings".

    The bit about OBJ parsing is neat, though.

    • writebetterc 6 days ago
      Why isn't the conclusion "There is a far better way of using C, which the stdlib doesn't promote... But could"? The fact of the matter is that any sufficiently large C codebase will do this stuff anyway, it's not a language issue.

      Good Rust code will also care about memory allocations to the same degree as the C code, the difference is that Rust will help you out in making sure your thinking is correct. My experience is that good systems programming has thinking about memory allocations not as an annoying side issue, but as a main concern.

    • flohofwoe 6 days ago
      If you even remotely care about performance you'll need to take care of such details in any language, and some high level 'managed' languages make that actually harder than C because you need to work around or even against builtin language features.
      • maccard 6 days ago
        I've spent my entire career working in C++ writing low level code for video games (and a decent chunk of it writing backend services for said games, and the glue between the two).

        If you want to talk about performance, you better come armed with numbers. If you don't, you're not writing "high performance" code.

    • fsloth 6 days ago
      A large part of high performance programming using any language is about memory management.

      For stuff that you run only for yourself and _always_ executes in a blink of an eye I do agree.

      • maccard 6 days ago
        I've spent my career ping ponging between writing fast low level code for games, and online systems. If you want to talk about high performance code, benchmarks are a requirement. There's no numbers here. It only talks about "Robust", which OP defines as:

        > By robust I mean no undefined behavior for any input, valid or invalid; no out of bounds accesses, no signed overflows. Input is otherwise not validated. Invalid input may load as valid by chance, which will render as either garbage or nothing.

        Robust is a baseline for high performance programming.

    • tocariimaa 6 days ago
      Whenever an article by this author gets posted on HN, there's a Rust fanatic saying his code doesn't work and how wrong he is for committing the sin of using C in the current year.
      • maccard 6 days ago
        I didn't mention rust. I think he's used enough features of C++ to warrant using it instead.