Crates are somewhat better designed than NPM/PyPI (the dist artifacts are source based), but still much worse than Go: with crates there's an intermediate packaging step disconnected from the source of truth.
I think this post has some good information in it, but this is essentially overstated: I look at crate discrepancies pretty often as part of reviewing dependency updates, and >90% of the time it's a single line difference (like a timestamp, hash, or some other shudder between the state of the tree at tag-time and the state at release-time). These are non-ideal from a consistency perspective, but they aren't cause for this degree of alarm -- we do know what the code does, because the discrepancies are often trivial.
Saying that there could be something there, but "no one knows" doesn't mean that there is something there. But it's still true.
Noting that you willfully cut the qualifying "virtually" from that quote, thereby transforming it to over-stated:
> Let me rephrase this, 17% of the most popular Rust packages contain code that virtually nobody knows what it does
We're still thinking in the old mindset, whereas new tools are going to change how all of this is done.
In some years dependencies will undergo various types of automated vetting - bugs (various categories), memory, performance, correctness, etc. We need to think about how to scale this problem instead. We're not ready for it.
You know if you check. Hardly anyone checks. It's just normalization of deviance and will eventually end up with someone exploiting it.
https://vincents.dev/blog/rust-dependencies-scare-me/
It sparked some interesting discussion among many of the Rust maintainers:
https://news.ycombinator.com/item?id=43935067
A fat std lib will definitely not solve the problem. I am a proponent of the Rust Foundation taking packages under its wing and having them audited and funded while keeping the original maintainers intact.
Huh, how is this possible? Is the code not pulled from the repository? Why not?
I dug into the linked article, and I would really say this means something closer to 17% of the most popular Rust package versions are either unbuildable or have some weird quirks that make building them not work the way you expect, and not in a remotely reproducible fashion.
https://lawngno.me/blog/2024/06/10/divine-provenance.html
Pulling things into the standard lib is fine if you think everyone should stop using packages entirely, but that doesn't seem like it really does anything to solve the actual problem. There are a number of things it seems like we might be forced to adopt across the board very soon, and for Rust it seems tractable, but I shudder to think about doing it for messier languages like Ruby, Python, Perl, etc.
* Reproducible builds seem like the first thing.
* This means you can't pull in git submodules or anything from the Internet during your build.
* Specifically for the issues in this post, we're going to need proactive security scanners. One thing I could imagine is if a company funnels all their packages through a proxy, you could have a service that goes and attempts to rebuild the package from source, and flags differences. This requires the builds to be remotely reproducible.
* Maybe the latest LLMs like Claude Mythos are smart enough that you don't need reproducible builds, and you can ask some LLM agent workflow to review the discrepancies between the repo and the actual package version.
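The "rebuild and flag differences" idea from the list above can be sketched roughly like this. It is a minimal sketch, not an actual scanner: it assumes you already have the published crate unpacked in one directory and your own rebuild from source unpacked in another. The function names and the set of ignored files are my own assumptions; the ignored files are ones that cargo package generates or rewrites, and a real tool would want to inspect those too rather than skip them.

```python
import hashlib
from pathlib import Path

# Assumption: files that `cargo package` generates or rewrites, so they are
# expected to differ from a source checkout. A stricter tool would check them.
CARGO_GENERATED = {".cargo_vcs_info.json", "Cargo.toml", "Cargo.toml.orig", "Cargo.lock"}


def file_hashes(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            hashes[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes


def compare_trees(published: Path, rebuilt: Path) -> dict[str, list[str]]:
    """Report files in the published package that the rebuild does not explain."""
    pub = file_hashes(published)
    reb = file_hashes(rebuilt)
    report = {"only_in_published": [], "content_differs": []}
    for rel, digest in pub.items():
        if rel in CARGO_GENERATED:
            continue  # skipped by assumption, see comment above
        if rel not in reb:
            report["only_in_published"].append(rel)
        elif reb[rel] != digest:
            report["content_differs"].append(rel)
    return report
```

An empty report only says the two trees agree on the files checked. A non-empty one is a signal to look closer, not proof of anything malicious.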
No, what it means is that the source in crates.io doesn't match 1:1 with any commit sha in their project's repo. This is usually because some gitignored file ended up as part of the distributed package, or poor release practice.
This doesn't mean that the project can't build, or that it is being exploited (but it is a signal to look closer).
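As a starting point for "looking closer": as far as I know, when a crate is packaged from a git checkout, cargo records the commit in a .cargo_vcs_info.json file inside the package, under git.sha1. A minimal sketch for reading it from an unpacked crate, with claimed_commit being a name I made up:

```python
import json
from pathlib import Path
from typing import Optional


def claimed_commit(crate_dir: Path) -> Optional[str]:
    """Return the git commit recorded in the package, or None if absent."""
    info_path = crate_dir / ".cargo_vcs_info.json"
    if not info_path.is_file():
        # Assumption: the crate was packaged outside a git checkout,
        # or the file is simply missing.
        return None
    info = json.loads(info_path.read_text(encoding="utf-8"))
    return info.get("git", {}).get("sha1")
```

Note that this value is self-reported by whoever published the crate. It tells you which commit to diff against, not that the contents actually match it.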
It's basically what we're already doing in our OSes (mobile at least), but now it should happen on the level of submodules.
I suppose it can be done on various levels, with various performance trade-offs.
I dunno, I can only listen to Margaritaville so many times in a row.
In Java, the "stdlib" that comes with the JRE, like all the java.* classes, counts 0 towards the size of your particular program but everyone has to have the whole JRE installed to run anything. Whereas if you pull in a (maven) dependency, you get the entirety of the dependency tree in your project (or "uberjar" if you package it that way).
Then we could decide on which of java.util.collections, apache commons-collections, google guava etc. become "standard" ...
Go's stdlib is separate from the language. The language spec doesn't specify a standard library at all. It also doesn't have just one stdlib. tinygo's stdlib isn't the same as gc's, for example.
I will note that gc's standard library also isn't written in Go. It is written in a superset with a 'private' language on top that is tied to the gc compiler to support low-level functions that Go doesn't have constructs for. So separating the standard library from the compiler wouldn't really work. No other Go compiler would be able to make sense of it. go1 promise aside, the higher-level packages that are pure Go could be hoisted completely out of the stdlib, granted.
Vendor your dependencies. Download the source and serve it via your own repository (ex. [1]). For dependencies that you feel should be part of the "Standard Library" (i.e. crates developed by the Rust team but not included into std) don't bother to audit them. For the other sources, read the code and decide if it's safe.
I'm honestly starting to regret not starting a company like 7 years ago where all I do is read OSS code and host libraries I've audited (for a fee to the end-user of course). This was more relevant for USG type work where using code sourced from an American is materially different than code sourced from a non-American.
But if you somehow do manage that, then you'll soon have hundreds of outdated vendored dependencies, full of unpatched security issues.
If you host your own internal crates.io mirror, I see two ways to stay on top of security issues that have been fixed upstream. Both involve the use of cargo audit, which uses the RustSec advisory DB: https://rustsec.org/
Alternative A) would be to redirect the DNS for crates.io in your company internal DNS server to point at your own mirror, and to have your company servers and laptops/workstations all use your company internal DNS server only. And have the servers and laptops/workstations trust a company controlled CA certificate that issues TLS certificates for “crates.io”. Then cargo and cargo audit would work transparently, assuming they use the host CA trust store when validating the TLS certificates when they connect to crates.io. You use the RustSec DB directly from upstream, without mirroring it or hosting an internal copy. The drawback is that if you accidentally leave some servers or laptops/workstations using external DNS, connections are made to the real crates.io instead, and developers end up pulling in versions of deps that have not been audited by the company itself and added to the internal mirror.
Alternative B) is to set up the crates host under a DNS name you control, e.g. crates dot your company internal network DNS name. Then set up cargo audit to use an internally hosted copy of the advisory DB that is automatically kept up to date, but in which references to the cargo registry have been replaced with your own crates mirror registry. I think that should work. It is already very easy to set up your own crates mirror registry; cargo has excellent support built right in for using registries other than, or in addition to, crates.io. And then you have a company policy that crates.io is never to be used, and you enforce it with automatic scanning of all company repos that checks that no entries in Cargo.toml and Cargo.lock files use crates.io.
It would probably even be a good idea to have separate internal crate registries for crates that are from crates.io and crates that are internal to the company itself, to avoid any name collisions and the like.
Regardless of whether you go with A) or B), you’d then be able to run cargo audit and see security advisories for all your dependencies, while the dependencies themselves are downloaded from your internal mirror of crates.io crates, and you audit every package's source code before adding it to your internal mirror registry.
Vendoring buys an additional layer of security.
When everyone has Claude Mythos, we can self-audit our supply chain in an automated fashion.