206 points by todsacerdoti 16 hours ago | 15 comments
  • kentonv 14 hours ago
    A few years back I patched the memory allocator used by the Cloudflare Workers runtime to overwrite all memory with a static byte pattern on free, so that uninitialized allocations contain nothing interesting.

    We expected this to hurt performance, but we were unable to measure any impact in practice.

    Everyone still working in memory-unsafe languages should really just do this IMO. It would have mitigated this Mongo bug.
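
    A minimal sketch of the idea (not the actual patch: the real change lives inside the allocator's free path, where the block size is already known, so this portable wrapper takes the size explicitly):

        #include <stddef.h>
        #include <stdlib.h>

        /* Fill a block with a recognizable pattern before freeing it.
           The fill goes through a volatile pointer so the optimizer
           cannot discard it as a dead store ahead of free() - see the
           dead-store subthread below. */
        static void poison_free(void *p, size_t n) {
            if (p == NULL)
                return;
            volatile unsigned char *vp = p;
            for (size_t i = 0; i < n; i++)
                vp[i] = 0xDF;  /* static "freed memory" byte pattern */
            free(p);
        }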

    • amomchilov 9 hours ago
      Recent macOS versions zero out memory on free, which improves the efficacy of memory compression. Apparently it’s a net performance gain in the average case
      • LoganDark an hour ago
        I wonder if Apple Silicon has hardware acceleration for memory zeroing... Knowing Apple, I wouldn't be surprised.
    • cperciva 10 hours ago
      > A few years back I patched the memory allocator used by the Cloudflare Workers runtime to overwrite all memory with a static byte pattern on free, so that uninitialized allocations contain nothing interesting.

      Note that many malloc implementations will do this for you given an appropriate environment, e.g. setting MALLOC_CONF to opt.junk=free will do this on FreeBSD.
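
      With a jemalloc-based malloc you can also opt in from inside the program via jemalloc's malloc_conf global, which is read once at startup (a sketch; the exact option spelling varies across jemalloc versions):

          /* Program-side equivalent of setting MALLOC_CONF in the
             environment (assumes a jemalloc-based malloc): */
          const char *malloc_conf = "junk:free";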

    • MuffinFlavored 10 hours ago
      > OpenBSD uses 0xdb to fill newly allocated memory and 0xdf to fill memory upon being freed. This helps developers catch "use-before-initialization" (seeing 0xdb) and "use-after-free" (seeing 0xdf) bugs quickly.

      Looks like this is the default in OpenBSD.

    • tombert 13 hours ago
      You know, I never even considered doing that, but it makes sense; whatever overhead is incurred by writing that static byte pattern is almost certainly minuscule compared to the overhead of something like a garbage collector.
      • ddtaylor 12 hours ago
        IMO the important tradeoff here is that a few microseconds spent sanitizing memory saves millions of dollars of headache when memory-unsafe languages fail (which happens regularly).
        • tombert 10 hours ago
          I agree. I almost feel like this should be a flag in `free`. Like, if you pass 1 or something as a second argument (or maybe a `free_safe` function or something), it will automatically `memset` whatever it's freeing with 0's, and then do the normal freeing.
    • dmitrygr 13 hours ago
      FYI, at least in C/C++, the compiler is free to throw away assignments to any memory pointed to by a pointer if said pointer is about to be passed to free(), so depending on how you did this, the lack of perf impact could be because your compiler removed the assignments. This even affects calls to memset().

      see here: https://godbolt.org/z/rMa8MbYox

      • kentonv 10 hours ago
        I patched the free() implementation itself, not the code that calls free().

        I did, of course, test it, and anyway we now run into the "freed memory" pattern regularly when debugging (yes including optimized builds), so it's definitely working.

      • shakna 13 hours ago
        However, if you call memset through a volatile function pointer, the compiler will keep it:

            #include <stdlib.h>
            #include <string.h>

            void not_free(void* ptr);

            void test_with_free(char* ptr) {
                ptr[5] = 6;
                /* Calling memset through a volatile function pointer means
                   the compiler can no longer prove the callee is memset,
                   so it cannot drop the write as a dead store before
                   free(). */
                void *(* volatile memset_v)(void *s, int c, size_t n) = memset;
                memset_v(ptr + 2, 3, 4);
                free(ptr);
            }

            void test_with_other_func(char* ptr) {
                ptr[5] = 6;
                void *(* volatile memset_v)(void *s, int c, size_t n) = memset;
                memset_v(ptr + 2, 3, 4);
                not_free(ptr); /* opaque callee: the stores survive anyway */
            }
        • cperciva 10 hours ago
          That code is not guaranteed to work. Declaring memset_v as volatile means that the variable has to be read, but does not imply that the function must be called; the compiler is free to compile the function call as "tmp = memset_v; if (tmp != memset) tmp(...)" relying on its knowledge that in the likely case of equality the call can be optimized away.
          • shakna 9 hours ago
            Whilst the C standard doesn't guarantee it, both LLVM and GCC _do_. They have documented, as implementation-defined behaviour, that it will work, so they are not free to optimise it away.

            [0] https://llvm.org/docs/LangRef.html#llvm-memset-intrinsics

            [1] https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=b...

            • raverbashing 4 hours ago
              Yeah the C committee is wrong here
              • uecker 4 hours ago
                I don't see why?

                The C committee gave you memset_explicit. But note that there is still no guarantee that information cannot leak. This is generally a very hard problem, as information can leak in many different ways, since it may have been copied by the compiler. Fully memory-safe languages (so "Safe Rust", but not necessarily real-world Rust) would offer a bit more protection by default, but then there are still side-channel issues.

                • raverbashing 2 hours ago
                  Because, for the 1384th time, they're pretending they can ignore what the programmer explicitly told them to do

                  Creating memset_explicit won't fix existing code. "Oh but what if maybe" is just cope.

                  If I do memset then free then that's what I want to do

                  And the way things go I won't be surprised if they break memset_explicit for some other BS reason and then make you use memset_explicit_you_really_mean_it_this_time

                  • uecker 2 hours ago
                    Your problem is not the C committee but your lack of understanding of how optimizing compilers work. WG14 could, of course, specify that a compiler has to do exactly what you tell it to do. And in fact, every compiler supports this already, in most cases even by default! Just do not turn on optimization. But this is not what most people want.

                    Once you accept that optimizing compilers do, well, optimizations, the question is what should be allowed and what should not. Inlining "memset" and eliminating dead stores are both simply optimizations which people generally want.

                    If you want a store not to be eliminated by a compiler, you can make it volatile. The C standard says this cannot be deleted by optimizations. The criticism of this was that later undefined behavior could "undo" it by "travelling in time". We made it clear in ISO C23 that this is not allowed (and I believe it never was) - against protests from some compiler folks. Compilers still do not fully conform to this, which shows the limited power WG14 has to change reality.

                    • raverbashing 10 minutes ago
                      Nope it is the C committee

                      > Once you accept that optimizing compilers do, well, optimizations

                      Why in tarnation is it optimizing out a write through a pointer right before a function call that takes said pointer? Imagine it is any other function besides free; see how ridiculous that sounds?

                      • uecker a few seconds ago
                        Because it is a dead store. Removing dead stores does not sound ridiculous to me, nor to anybody who has been using an optimizing compiler over the last few decades.
        • maxlybbert 12 hours ago
          Newer versions of C++ (and C, apparently) have functions so that the cast isn't necessary ( https://en.cppreference.com/w/c/string/byte/memset.html ).
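
          For instance, C23's memset_explicit is specified so the clear cannot be optimized away even when the object is never read again (a minimal sketch, assuming a C23 toolchain; dispose_secret is a hypothetical helper):

              #include <stdlib.h>
              #include <string.h> /* C23: memset_explicit() */

              void dispose_secret(char *secret, size_t n) {
                  memset_explicit(secret, 0, n); /* must not be elided */
                  free(secret);
              }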
  • plorkyeran 14 hours ago
    The author seems to be unaware that Mongo internally develops in a private repo and commits are published later to the public one with https://github.com/google/copybara. All of the confusion around dates is due to this.
    • enether 2 hours ago
      I was definitely unaware. I suspected something like this might be up when, discussing the apparent PR's zero-review merge, I wrote "I’m not aware of Mongo’s public review practices". This is great to know though. I'm updating the piece now to mention this and explain the date discrepancy.
  • computerfan494 14 hours ago
    The author of this post is incorrect about the timeline. Our Atlas clusters were upgraded days before the CVE was announced.
  • maxrmk 16 hours ago
    How often are mongo instances exposed to the internet? I'm more of an SQL person, and for those I know it's pretty uncommon, but it does happen.
    • petcat 15 hours ago
      From my experience, Mongo DB's entire raison d'être is "laziness".

      * Don't worry about a schema.

      * Don't worry about persistence or durability.

      * Don't worry about reads or writes.

      * Don't worry about connectivity.

      This is basically the entire philosophy, so it's not surprising at all that users would also not worry about basic security.

      • senderista 8 hours ago
        To the extent that any of this was ever true, it hasn’t been true for at least a decade. After the WiredTiger acquisition they really got their engineering shit together. You can argue it was several years too late but it did happen.
        • cyberpunk 6 hours ago
          I got heavily burned pre-wiredtiger and swore to never use it again. Started a new job which uses it and it’s been… Painless, stable and fast with excellent support and good libraries. They did turn it around for sure.
      • aragilar 14 hours ago
        Not only that, but authentication is much harder than it needs to be to set up (and is off by default).
      • winrid 14 hours ago
        Although interestingly, for all the mongo deployments I managed, the first time I saw a cluster publicly exposed without SSL was postgres :)
      • morshu9001 9 hours ago
        I'm sure there are publicly exposed MySQLs too
      • Thaxll 11 hours ago
        Most of your points are wrong. Maybe only the first is valid-ish.
      • ddtaylor 12 hours ago
        Ultimate webscale!
    • hahahacorn 16 hours ago
      A highly cited reason for using mongo is that people would rather not figure out a schema. (N=3/3 for “serious” orgs I know using mongo).

      That sort of inclination to push off doing the right thing now probably overlaps with "let's just make the db publicly exposed" instead of doing the work of setting up an internal network that would save you a headache down the line.

      • matwood 4 hours ago
        > A highly cited reason for using mongo is that people would rather not figure out a schema.

        Which is such a cop-out, because there is always a schema. The only questions are whether it is designed and documented, and where it is implemented. Mongo requires some very explicit schema decisions; otherwise performance will quickly degrade.

        • xnorswap 2 hours ago
          Fowler describes it as Implicit vs Explicit schema, which feels right.

          Kleppmann chooses "schema-on-read" vs "schema-on-write" for the same concept, which I find harder to grasp mentally, but it describes when schema validation needs to occur.

      • TZubiri 15 hours ago
        I would have hoped that there would be no important data in mongoDB.

        But now we can at least rest assured that the important data in mongoDB is just very hard to read with the lack of schemas.

        Probably all of that nasty "schema" work and tech debt will finally be done by hackers trying to make use of that information.

        • bostik 2 hours ago
          There is a surprising amount of important data in various Mongo instances around the world. Particularly within high finance, with multi-TB setups sprouting up here and there.

          I suspect that this is in part due to historical inertia and exposure to SecDB designs.[0] Financial instruments can be hideously complex and they certainly are ever-evolving, so I can imagine a fixed schema for an essentially constantly shifting time-series universe would be challenging. When financial institutions began to adopt the SecDB model, MongoDB was available as a high-volume, "schemaless" KV store, with a reasonably good scaling story.

          Combine that with the relatively incestuous nature of finance (they tend to poach and hire from within their own ranks) and the average tenure of an engineer in one organisation being less than 4 years, and you have an osmotic process of spreading "this at least works in this type of environment" knowledge. Add the naturally risk-averse nature of finance[ß] and you can see how one successful early adoption will quickly proliferate across the industry.

          0: This was discussed at HN back in the day too: https://calpaterson.com/bank-python.html

          ß: For an industry that loves to take financial risks - with other people's money of course, they're not stupid - the players in high finance are remarkably risk-averse when it comes to technology choices. Experimentation with something new and unknown carries a potentially unbounded downside with limited, slowly emerging upside.

        • saghm 9 hours ago
          I'd argue that there's a schema; it's just defined dynamically by the queries themselves. Given how much of the industry seems fine with dynamic typing in languages, it's always been weird to me how diehard people seem to be about this with databases. There have been plenty of legitimate reasons to be skeptical of mongodb over the years (especially in the early days), but this one really isn't any more of a big deal than using Python or JavaScript.
          • jeltz 34 minutes ago
            As someone who has done a lot of Ruby coding, I would say a statically typed database is almost a must when using a dynamically typed language. The database enforces the data model, and the Ruby code was mostly just glue on top of that data model.
          • morshu9001 9 hours ago
            Yes, there's a schema, but it's hard to maintain. You end up with 200 separate code locations rechecking that the data is in the expected shape. I've had to fix too many such messes at work after a project ground to a halt. Ironically, some people will go schemaless but use a statically typed language for regular backend code, which doesn't buy you much. I'd totally do dynamic there. But a DB schema is so little effort for the strong foundation it sets for your code.

            Sometimes it comes from a misconception that your schema should never have to change as features are added, and so you need to cover all cases with 1-2 omni tables. Often named "node" and "edge."

            • matwood 4 hours ago
              The adage I always tell people is that in any successful system, the data will far outlive the code. People throw away front ends and middle layers all the time. This becomes so much harder to do if the schema is defined across a sprawling middle layer like you describe.
            • cyberpunk 5 hours ago
              We just sit a data persistence service in front of mongo, so we can enforce some controls for everything there if we need them, but quite often we don't.

              It's probably better to check what you're working on than to blindly assume this thing you've gotten from somewhere is the right shape anyway.

          • TZubiri 4 hours ago
            What's weird to me is when dynamic typers don't acknowledge the tradeoff of quality vs upfront work.

            I never said mongodb was wrong in that post, I just said it accumulated tech debt.

            Let's stop feeling attacked over the negatives of tradeoffs

        • bigbuppo 3 hours ago
          Whatever horrors there are with mongo, it's still better than the shitshow that is Zope's ZODB.
    • bschmidt107979 10 hours ago
      Are you guys serious with these takes?

      You very often have both NoSQL and SQL at scale.

      NoSQL is used for high availability of data at scale - iMessage famously uses it for message threads, EA famously uses it for gaming matchmaking.

      What you do is have both SQL and NoSQL. The NoSQL is basically caches of resources for high availability. Imagine you are making a social media app... Yes of course you have a SQL database that stores all the data, but you maintain API caches of posts in NoSQL.

      Why? This gets to some of your other black vs white insults: NoSQL is typically WAY FASTER than SQL. That's why you use it. It's way faster to read a JSON file from a hard drive than it is to query a SQL database, always has been. So why not use NoSQL for EVERYTHING? Well, because you have duplicated data everywhere since it's not relational, it's just giant caches essentially. You also will get slow queries when the documents get huge.

      Anyway you need both. It's not an either/or thing. I cannot believe this many years later people do not know the purpose of SQL and NoSQL and do not understand that it is not a competition at all. You want both!

      • ch2026 7 hours ago
        Because nobody uses mongo for the reasons you listed. They use redis, dynamo, scylla or any number of enriched KV stores.

        Mongo has spent its entire existence pretending to be a SQL database by poorly reinventing everything you get for free in postgres or mysql or cockroach.

      • Capricorn2481 10 hours ago
        What they wrote was pretty benign. They just asked how common it is for Mongo to be exposed. You seem to have taken that as a completely different statement
        • bschmidt107979 10 hours ago
          I mean, they said it's rarely used when in fact it's widely used by some of the world's biggest companies at the highest scale the internet knows. The other guy had a harsher comment, sure; maybe I should duplicate my reply to them, but who knows what kinds of rules that breaks on this site, lmao. Happy Christmas & New Year, buddy!
    • wood_spirit 16 hours ago
      The article links to a shodan scan reporting 213K exposed instances https://www.shodan.io/search?query=Product%3A%22MongoDB%22
    • acheong08 10 hours ago
      My university has one exposed to the internet, and it's still not patched. Everyone is on holiday and I have no idea who to contact.
      • heavyset_go 8 hours ago
        No one. If you aren't in the administration's good graces and something shitty happens, even if it's unrelated to you, you've put a target on your back as suspect #1.
      • bschmidt107979 9 hours ago
        "Look at me. I'm the DBA now"

        -JS devs after "Signing In With Facebook" to MongoDB Atlas

        AKA me

        Sorry guys, I broke it

    • ddtaylor 12 hours ago
      It could be because when you leave an SQL server exposed it often turns into much worse things. For example, without additional configuration, PostgreSQL defaults to a configuration that can own the entire host machine: features like COPY ... TO/FROM PROGRAM let a superuser run arbitrary shell commands, and they aren't disabled by default.

      The end result is that "everyone" kind of knows that if you put a PostgreSQL instance up publicly without a password, or with a weak/default one, it will be popped in minutes, and you'll find out about it because the attackers are lazy and just run crypto-mining malware, etc.

    • ok123456 14 hours ago
      For a long time, the default install had it binding to all interfaces and with authentication disabled.
    • notepad0x90 13 hours ago
      Often. Lots of data leaks have happened because of this. People spin it up in a cloud VM and forget it has a public IP all the time.
  • exabrial 12 hours ago
    Why is anyone using mongo for literally anything
    • nine_k 10 hours ago
      Easy replication. I suppose it's faster than Postgres's JSONB, too.

      I would rather not use it, but I see that there are legitimate cases where MongoDB or DynamoDB is a technically appropriate choice.

    • mickael-kerjean 11 hours ago
      because it is "web scale"

      ref: https://www.youtube.com/watch?v=b2F-DItXtZs

      • DonHopkins 7 hours ago
        Whenever anyone writes about mongodb or redis I hear it in that voice.
    • gethly 4 hours ago
      Right? When they came out, it was all about NoSQL, which then turned out to only mean key-value databases, of which there are plenty.
    • Aldipower 11 hours ago
      This is a nasty ad repositorium datorum argumentation which I cannot tolerate.
  • netsharc 11 hours ago
    > On Dec 24th, MongoDB reported they have no evidence of anybody exploiting the CVE

    Absence of evidence is not evidence of absence...

    • forrestthewoods 11 hours ago
      What would you prefer them to say?
      • perching_aix 10 hours ago
        Evidence of no exploitation? It's usually hard to prove a negative, except when you have all the logs at your fingertips to sift through. Unless they don't, of course, in which case the point stands: they don't actually know at this point in time, if they can even know about it at all.

        Specifically, it looks like the exfiltration primitive relies on errors being emitted, and those errors are what leak the data. They're also rather characteristic. One wouldn't reasonably expect MongoDB to hold onto all raw traffic data flowing in and out, but one would absolutely expect them to have the error logs, at least for some time back.

        • saghm 9 hours ago
          I feel like that's an issue not with what they said, but with what they did. It would have been better for them to have checked this quickly, but it would have been worse for them to have said they did when they hadn't. What you're saying isn't wrong, but it's not really an answer to the question you're replying to.
        • forrestthewoods 9 hours ago
          “No evidence of exploitation” is a pretty bog standard report I think? Made on Christmas Eve no less.

          Do other CVE reports come with stronger statements? I’m not sure they do. But maybe you can provide some counterexamples that meet your bar.

          • dwattttt 2 hours ago
            > "No evidence of exploitation” is a pretty bog standard report

            It is standard, yes. The problem with it as a statement is that it's true even if you've collected exactly zero evidence. I can say I don't have evidence of anyone being exploited, and it's definitely true.

          • perching_aix 9 hours ago
            It's not really my bar, I just explored this on behalf of the person you were replying to because I found it mildly interesting.

            It is also a pretty standard response indeed. But now that it was highlighted, maybe it does deserve some scrutiny? Or is saying silly, possibly misleading things okay if that's what everyone has always been doing?

  • whynotmaybe 15 hours ago
    I'm still thinking about the optimism of the OWASP Top 10, which hoped the major flaws would eventually be solved, and the fact that buffer overflows have been on the list since the very beginning... in 2003.
    • thrwaway55 13 hours ago
      I mean, hand everyone footguns and you'll find that this is unavoidable forever. Thoughts and prayers to the Mongo devs until we migrate to a language that prevents this error.
  • bschmidt107979 10 hours ago
    Every time someone posts about NoSQL a thousand "programmers" reveal they have never had to support a lot of traffic lol
    • jeltz 31 minutes ago
      Nah, this time it was just you.
  • vivzkestrel 10 hours ago
    Is it true that Ubisoft got hacked and 900GB of data from their database was leaked due to MongoBleed? I am seeing a lot of posts on social media under the #ubisoft tag today. Can someone on HN confirm?
    • christophilus 9 hours ago
      I read that hack was made possible by Ubisoft’s support staff taking bribes.
    • bschmidt107979 9 hours ago
      TLDR: Blame logs not NoSQL.

      Almost always, when you hear about emails or payment info leaking (or when Twitter stored passwords in plaintext lol), it's from logs. And a lot of the time logs are in NoSQL, because they are only ever needed in that same JSON format and in a very highly available way (all you Heroku users tailing logs all day, yw), and then almost nobody encrypts the phone numbers and emails etc. that end up in logs.

      There's basically no security around logs actually. They're just like snapshots of the backend data being sent around and nobody ever cares about it.

      Anyway it has nothing to do with the choice to use NoSQL, it has more to do with how neglected security is around it.

      Btw, in case you are wondering: both the Twitter plaintext password case and the Rainbow Six Siege data leak you mention were logs that leaked. NoSQL-backed logs, sure, but it's more about the data security around logging IMO.

  • dwheeler 9 hours ago
    This has many similarities to the Heartbleed vulnerability: it involves trusting lengths from an attacker, leading to unauthorized revelation of data.
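
    A minimal C sketch of the bug class (illustrative only, not MongoDB's or OpenSSL's actual code): the server sizes a reply buffer from an attacker-supplied length, fills only part of it, and sends the whole thing back, so the tail carries stale heap bytes.

        #include <stdlib.h>
        #include <string.h>

        /* claimed_len comes from the attacker; payload_len is the number
           of bytes they actually sent. */
        char *build_reply(const char *payload, size_t payload_len,
                          size_t claimed_len) {
            char *buf = malloc(claimed_len); /* old heap contents */
            if (buf == NULL)
                return NULL;
            size_t n = payload_len < claimed_len ? payload_len : claimed_len;
            memcpy(buf, payload, n); /* only n bytes are initialized */
            /* The caller transmits all claimed_len bytes, so bytes
               [n, claimed_len) leak whatever the allocator handed back. */
            return buf;
        }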
  • petesergeant 10 hours ago
    > In C/C++, this doesn’t happen. When you allocate memory via `malloc()`, you get whatever was previously there.

    What would break if the compiler zero'd it first? Do programs rely on malloc() giving them the data that was there before?

    • mdavid626 6 hours ago
      It takes time to zero out memory.
    • pelorat 3 hours ago
      That's what calloc() is for
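
      For example, both of these request the same amount of memory, but only the second guarantees its contents:

          #include <stdlib.h>

          void demo(void) {
              int *a = malloc(100 * sizeof *a); /* contents indeterminate */
              int *b = calloc(100, sizeof *b);  /* guaranteed all-bits-zero */
              free(a);
              free(b);
          }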
  • fwip 10 hours ago
    "MongoBleed Explained by an LLM"
    • tuetuopay 2 hours ago
      If it is, it's less fluffy and empty than most of the LLM prose we're usually fed. It's well explained and has enough detail to not be overwhelming.

      Honestly, aside from the "<emoji> impact" section that really has an LLM smell (but remember that some people legit do this, since it's in the LLM training corpus), this feels more like LLM-assisted writing (translated? reworded? grammar-checked?) than a pure "explain this" prompt.

      • enether an hour ago
        I didn't use AI in writing the post.

        I did some research with it, and used it to help create the ASCII art a bit. That's about it.

        I was afraid that adding the emoji would trigger someone to think it's AI.

        In any case, nowadays I basically always get at least one comment calling me an AI on a post that's relatively popular. I assume it's more a sign of the times than the writing...

  • reassess_blind 11 hours ago
    Have all Atlas clusters been auto-updated with a fix?
    • enether an hour ago
      Yes, apparently before Dec 19 too.