“ with 100 direct dependencies and 647 dependencies in total”
Next up: watch me build numpy from scratch with only 150 dependencies, one of which is numpy.
Briefly, you need to manage metadata for the database. You can write your own Raft-based solution or leverage existing software like etcd or ZooKeeper, which is not "a relational database". Now you need to deploy it on EBS and reimplement data replication plus multi-AZ fault tolerance, and it's likely still worse performance-wise than RDS, because a first-party service like RDS can typically use internal storage APIs and advanced hardware. That advantage is not something software alone can make up for.
https://flex-ninja.medium.com/from-shared-nothing-to-shared-...
Tools like `cargo audit` can tell you statically, based on the lockfile, which dependencies have security vulnerabilities reported against them (but you have to run it!). And GitHub's https://github.com/dependabot/ will do the same thing automatically, just based on the existence of the lockfile in your repo (and will also open PRs to bump deps for you).
And as mentioned elsewhere: Cargo's dependency resolver supports providing multiple versions of a dep in different dependency subgraphs, which all but eliminates the "dependency hell" that folks expect from ecosystems like Python or the JVM. Two copies of a dep at different versions? Totally fine.
Tools based on loading libraries from a *PATH (Go, Python, JVM) usually do so by grabbing the first one that they encounter that contains the appropriate symbols. That is incompatible with having multiple versions of a package.
On the other hand, Rust and node.js support this -- each in their own way. In Rust, artifact names are transparently suffixed with a hash to prevent collisions. And in node.js, almost all symbol lookups are accomplished with relative filesystem paths.
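To make that concrete, a hedged sketch (the crate and versions are illustrative picks, not from the thread): Cargo even lets a single crate name two semver-incompatible versions of the same dependency via dependency renaming, and the hash-suffixed artifacts keep the symbols apart.

```rust
// Cargo.toml (illustrative versions):
//
//   [dependencies]
//   rand = "0.8"
//   rand_old = { package = "rand", version = "0.7" }
//
// Both versions compile into the binary under hash-suffixed artifact
// names, so their symbols never collide:
use rand::RngCore as _;     // trait from rand 0.8
use rand_old::RngCore as _; // trait from rand 0.7, via the rename

fn main() {
    let a = rand::thread_rng().next_u32();
    let b = rand_old::thread_rng().next_u32();
    println!("{a} {b}");
}
```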
known security vulnerabilities. If someone compromises your cargo registry (see npm for examples), all your safety is gone.
Yes. No dependencies is so '80s. Just run ldd on your commonly used programs.
I comment, in a Chromium[1] tab, running on my Ubuntu[2] box.
[1] https://github.com/chromium/chromium/blob/main/.gitmodules
[2] https://releases.ubuntu.com/24.04/ubuntu-24.04.1-desktop-amd...
But if you compare to C/C++: at least with Rust you _can_ use dependencies but aren't required to. In C/C++, even if you want to, it's a _massive_ pain.
Isn’t the best argument for open-source code that it has so many eyes on it? Most companies cannot afford that kind of global quality assurance.
Perplexingly, the original commenter seems to understand that this doesn't matter, and then handwaves away the correct conclusion.
I'd like to be able to pick a few libraries without incurring a huge ongoing audit burden. If I have to exclude many popular libraries because they have oodles of dependencies, that both makes searching more laborious and limits my choices.
Yes, knowing that would be helpful!
Is there a way to whitelist owners/publishers in Cargo?
There is really just a handful of crates that very often get pulled in, and probably about 5 authors across them.
Supply chain hardening is pretty easy in Rust: cargo-deny, cargo-supply-chain, cargo-crev, cargo-vet, cargo-{s}bom, and probably a few more I can't remember.
This starts with explaining outright which dependencies they have and why.
It's not so much direct dependencies that bother me: it's an exponential explosion of transitive dependencies.
Also, seeing an “end product” with dozens of dependencies doesn't bother me much; a library does.
So how many dependencies are there truly when you peel away the first layer of the onion?
The obvious answer is "N crates is N dependencies", because each crate represents a discrete sequence of atomic software release packages.
In the absence of a standardized mechanism to group crates together, we have to fall back to informal methods, like "I know all these authors personally because I'm an insider", or "these crates seem to be related even though I'm unsure how to guarantee they'll stay that way".
You can take a hard line and insist that nobody should run a single line of code they haven't reviewed, but that severely constrains the ability of a typical org to use the wider ecosystem at all. Not every org has the expertise on staff to pore over diverse Rust code and confidently state that it has no issues, and even those that do have to consider whether paying that cost is good risk management.
It would be nice if there were a more reliable way to simplify the evaluation of publisher trustworthiness, especially for orgs that aren't going to audit code but don't want to blindly take in anything.
What's the status of potential distributed code review systems like cargo-crev?
No, and I think this is the crucial thing that people who have experience with NPM overlook when it comes to Rust. Rust emphatically does not have a culture of single-function microlibraries; instead, libraries are split out by purpose, in the same way you would modularize a C codebase.
Remember, Rust crates are not just units of distribution, they are also units of translation (a.k.a. compilation units), so the same pressures that cause people to split C projects into multiple files results in people splitting Rust projects into multiple crates.
Distributed code review is a brute-force style solution. Republishing collections of crates under a single name/version is a dimensionality-reduction and responsibility-concentration style solution. I suspect pure-PR style solutions will be ineffective. What other kind of solutions are there?
It is not as simple as you say. Sometimes it is better to know that all of your dependencies are statically linked at build time and specified when you release your code. And the saner your build system is, the harder it is to add shellcode to a dependency's tarball and build scripts without turning people's heads with random unsafe code.
If the xz backdoor had not been found due to dumb luck, it could have persisted for a long time. Backdoors have persisted for years before, maybe even decades. It's also a package with a lot of eyes on it compared to obscure packages. So I don't think you're right even a little bit, especially in huge projects or projects with LOTS of dependencies.
As someone who works in cybersecurity and works closely with our developers, a lot of them tend to inherently trust third-party code with no auditing of the supply chain. I am always fighting that: while yes, we don't need to reinvent the wheel and libraries/packages are important, our organization and developers need to be aware of what we are bringing into our network and our codebase.
This is how I think it should be of course. Like I said, I'm not against the use of third-party code or dependencies, I'm against using them without performing any audit of that code.
The amount of code you have to review stays the same.
[edit] typos
Lots of packages have a `-macros` or `-derive` transitive dependency, meaning a single dependency can end up counting as 3 additional dependencies.
Rust makes it simple to split packages into workspaces - for example, regex[1] consists of `regex-automata` and `regex-syntax` packages.
This composition and separation of concerns is a sign of good design, and not an npm-esque hellhole.
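As an aside on those `-macros`/`-derive` companions: proc macros must be compiled as their own crate type, which is why e.g. serde ships its derive macros as the separate `serde_derive` package. A minimal sketch of what that split buys you (assumes serde with the "derive" feature plus serde_json; the `Point` struct is made up for illustration):

```rust
// serde's derive macros live in the separate `serde_derive` crate because
// proc-macro crates must be their own compilation unit; serde's "derive"
// feature re-exports them so users see a single dependency.
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p = Point { x: 1, y: 2 };
    let json = serde_json::to_string(&p).expect("serialize");
    println!("{json}"); // {"x":1,"y":2}
    let back: Point = serde_json::from_str(&json).expect("deserialize");
    println!("{back:?}");
}
```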
This is assuming that the audit consists of validating dependency authorship, and not the more labor-intensive approach of reviewing dependency code.
That’s… the whole rationale for not liking lots of small packages.
Either way, with Rust it's a handful of authors, but just because they are proven to be good-faith actors doesn't mean trust in their code is implied when we're talking about supply chain hardening.
(I would genuinely be interested in an experiment which pushes this as far as possible: what if each function was a 'package'? There's already a decent understanding of how dependencies within a library work in the compiler; what if that extended to the package manager? You would know exactly what code you actually needed, and would only pull in exactly what was necessary.)
e.g. go dependencies are counted in modules (roughly git repos), rather than packages (directories, compilation units). java is counted in packages rather than classes.
A non-obvious issue is that database engines have peculiar requirements for how libraries are designed and implemented which almost no conventional library satisfies. To make matters worse, two different database implementations may have different requirements in this regard, so you can't even share libraries between databases. There are no black boxes in good database engines.
Looking at the dependencies list (https://gist.github.com/tisonkun/06550d2dcd9cf6551887ee6305e...) I see plenty of reasonable things like:
* Base64/checksum/compression encoding libraries
* Encryption/hash libraries
* Platform-specific bindings (likely conditional dependencies)
* Bit hacking/casting/zero-copy libraries like bytemuck, zerocopy, zero-vec, etc.
* "Small"/stack allocated data structure libraries (smallvec, tinystr, etc.)
* Unicode libraries
There are certainly things that would add bloat too, but I think it's silly to pretend like everything here is something a database engine would need custom implementations of.
Combine this with the challenge of implementations being async, non-allocating, compatible with explicitly paged memory, etc., and it generally becomes worth the effort.
You'll find more libraries used at the periphery for integration and compatibility, where it matters less, but not in the core.
I'd rather an author pulls in a tinyvec/serde than tries to make a bespoke implementation.
That’s fine, I have no objection to hot takes, but don’t conflate your limited experience with reality.
I get it that you work on a geospatial database and understand this field better than I do, but why so rude and dismissive? Work on that.
In the case where the answer is "zero", then that means that one does not actually need a package manager at all, in which case the features of the package manager are not relevant to the choice of language. This would imply that the parent commenter has no need to reject Rust.
TBF this has nothing to do with dependency complexity and everything to do with semantic complexity. You could easily do this without using any dependencies at all.
unless you're downloading dependencies during the build or something like that, of course.
With over 600 dependencies, the probability goes up and up.
This comment makes it seem like all this company does is take, which feels unfair to me
They say they do when suitable (never or rarely).
But that's fine, as the licenses allow it. It feels like another company blogging about how great open source is to get PR while closed-sourcing their product.
The older I get, the more I understand why GPL variations are superior to BSD if you want to grow the software. BSD licenses are good for throwaway code or standards you want others to adopt.
Profit isn't far removed from theft, so maybe this shouldn't feel so unfair.
I definitely think there are unethical ways to profit - capitalism needs to be regulated for the good of the consumer/ecosystem/society.
However, I don't believe that a blanket comparison of any type of profit to theft can be useful or correct.
> so maybe this shouldn't feel so unfair
Do you think this company is unethical for writing closed source software and trying to sell it?
[1] https://www.tisonkun.org/2024/11/17/open-source-supply-chain...
This section has been moved to the second-to-last position in the posted blog; it includes:
[QUOTE]
When you read The Cathedral & the Bazaar, its Chapter 4, The Magic Cauldron, says:
> … the only rational reasons you might want them to be closed is if you want to sell the package to other people, or deny its use to competitors. [“Reasons for Closing Source”]
> Open source makes it rather difficult to capture direct sale value from software. [“Why Sale Value is Problematic”]
While the article focuses on when open-source is a good choice, these sentences imply that it’s reasonable to keep your commercial software private and proprietary.
We follow it and run a business to sustain the engineering effort. We keep ScopeDB private and proprietary, while we actively get involved in and contribute back to our open-source dependencies, open-source common libraries when suitable, and maintain the open-source twin to share the engineering experience.
[QUOTE END]
I wrote other blog posts analyzing open-source factors within commercial software[2][3][4][5], and I have put them into practice at several companies, as well as earning merit in open-source projects.
When you think about it, many developers work for their employers, and using open-source software in their $DAYJOB is a good motivation to contribute more (especially for distributed systems; individuals seldom need one). I know there are open-source developers who develop software that has nothing to do with their $DAYJOB. I also maintain projects that have nothing to do with my $DAYJOB (check Apache Curator, the Java binding of Apache OpenDAL, and more).
[1] https://www.tisonkun.org/2025/01/15/open-source-twin/
(Need a translator) [2] https://www.tisonkun.org/2022/10/04/bait-and-switch-fauxpen-...
[3] https://www.tisonkun.org/2023/08/12/bsl/
[4] https://www.tisonkun.org/2022/12/17/enterprise-choose-a-soft...
[5] https://www.tisonkun.org/2023/02/15/business-source-license/
I mean, that's been the prevalent attitude for the entire history of open source. It's easy to laugh until someone replaces you.
But I think it’s easy for people to criticize dependencies from afar without understanding what they’re used for. I’m sure the dependencies in my projects would look strange to others - for example, I use three HTTP libraries: one for 95% of cases and the others for very specific use-cases where I need control at a low level. But without that context it might seem excessive.
It may not be named a "database" but actually takes the place of one.
Observability vendors will try to store logs in Elasticsearch and later find it overly expensive, with weak support for archiving cold data. Data warehouse solutions require a complex ETL pipeline and can be awkward when handling log data (semi-structured data).
That said, if you're building an observability solution for a single company, I'd totally agree to start with single-node PG with backups, and only consider other solutions when data and query workload grow.
- 85 are related to gix (a Rust reimplementation of git, 53 of those are gix itself, that project is unfortunately infamous for splitting things into crates that probably should've been modules)
- 91 are related to pgp and all the complexity it involves (aes with various cipher modes, des, dsa, ecdsa, ed25519, p256, p384, p521, rsa, sha3, sha2, sha1, md5, blowfish, camellia, cast5, ripemd, pkcs8, pkcs1, pem, sec1, ...)
- 71 are related to http/irc/tokio (this includes a memory-safe tls implementation and an http stack: percent-encoding, mime, chunked encoding, ...)
- 26 are related to the winapi (which I don't use myself, but are still part of the resolved dependency graph)
- 8 are related to web assembly (unused when compiling for Linux)
- 2 are related to android (also unused when compiling for Linux)
In some ways this is a reminder of how much complexity we're building on top of for the sake of compatibility.

Also keep in mind that "reviewing 100 lines of code in 1 library" and "reviewing 100 lines of code split into 2 libraries" is still pretty much the same amount of code (if any of us actually reviewed all our dependencies). You might even have a better time reviewing the sha2 crate vs the entirety of libcrypto.so, if that's all you needed.
My project has been around for (almost) two years. I scanned every commit for vulnerable dependencies using this command:
for commit in $(git log --all --pretty='%H'); do git show "$commit":Cargo.lock > Cargo.lock && cargo audit -n --json | jq -r '.vulnerabilities.list[] | (.advisory.id + " - " + .package.name)'; done | sort | uniq
I got a total of 25 advisories (basically what you would be exposed to if you ran all binaries from every single commit simultaneously today). Here's the list:

RUSTSEC-2020-0071 - time
RUSTSEC-2023-0018 - remove_dir_all
RUSTSEC-2023-0034 - h2
RUSTSEC-2023-0038 - sequoia-openpgp
RUSTSEC-2023-0039 - buffered-reader
RUSTSEC-2023-0052 - webpki
RUSTSEC-2023-0053 - rustls-webpki
RUSTSEC-2023-0071 - rsa
RUSTSEC-2024-0003 - h2
RUSTSEC-2024-0006 - shlex
RUSTSEC-2024-0019 - mio
RUSTSEC-2024-0332 - h2
RUSTSEC-2024-0336 - rustls
RUSTSEC-2024-0345 - sequoia-openpgp
RUSTSEC-2024-0348 - gix-index
RUSTSEC-2024-0349 - gix-worktree
RUSTSEC-2024-0350 - gix-fs
RUSTSEC-2024-0351 - gix-ref
RUSTSEC-2024-0352 - gix-index
RUSTSEC-2024-0353 - gix-worktree
RUSTSEC-2024-0355 - gix-path
RUSTSEC-2024-0367 - gix-path
RUSTSEC-2024-0371 - gix-path
RUSTSEC-2024-0373 - quinn-proto
RUSTSEC-2024-0421 - idna
I guess I'm doing fine. Keep in mind, the binary is fully self-contained; there is no "look, my program has zero dependencies, but I need to ship an entire implementation of the GNU operating system along with it".

Running cargo audit -n --json | jq -r '.vulnerabilities.list[] | (.advisory.id + " - " + .package.name)' gives:
RUSTSEC-2023-0071 - rsa
which is transitively introduced by sqlx-mysql, though we don't use the MySQL driver in production.
to be fair, python package dependencies are fine to me. there might be a lot of pip packages still, but not a few hundred like npm and cargo normally pull in.
golang also has a reasonable amount of dependencies. npm and cargo dependencies are just scary due to the huge number.
In Rust, project A can use dependencies B and C, which can both depend on different versions of D. Cargo/crates.io generally also solve some of the other metadata problems Python has.
This means the developer experience is _significantly_ improved, at a potential cost of larger binaries. In practice, projects seem to have sufficiently liberal bounds that duplication isn't an issue.
With the lines of code, not the number of dependencies. 10 dependencies of 100 lines of code each are arguably easier to review, and certainly not harder, than a single dependency of 1000 lines of code.
This returns us to status quo ante, back before supply chain attacks were something we worried about. Bugs and such from dependencies are an annoyance but a manageable problem. Supply chain attacks after publisher account compromise are catastrophic and are not manageable.
What does this mean?
It means you'll trust the random people pushing code to cargo if you can prove they indeed are the random people they claim to be?
Afterwards, it suffices to validate with each dependency update that the publisher is the same publisher that was evaluated before.
Go did something nice, and it would be good if more people copied it. But it was also fairly recent.
I understand why they do it. It's led to some amazing crates like serde. But I think I fall more into the camp of Python, Go, or Odin with a comprehensive standard lib. You can make a whole game with Odin with the standard library only. Or an entire web app in Go.
Regex is not a third-party dependency:
the thin standard library and flat package namespace encourage land grabs for short memorable names for packages that just do a single thing. compared to, say, java or go, where dependencies don't get pulled in because they sound cool but because they solve a real problem.
When pointing out a problem, you don't necessarily need to provide a better solution. However, if you refrain from providing a better solution, you are still implicitly asserting that there exists some better solution.
So then it's possible to counter that with: a better solution may not exist. If you think a better solution does exist, then the burden of proof is on you to point out an existing solution that does better, or to otherwise establish that some better solution must exist.
Rust could very well be at a global optimum for the problems it's trying to solve. Sometimes tradeoffs are just inevitable.
Some problems are hard to solve. But not all of them are.
An example, and this is an observation. Where can I grab a library that just parses HTTP 1.0, HTTP 1.1 and HTTP 2.0 messages? Not an HTTP framework, something along the lines of httparse. I pass it a buffer of bytes and out the other end pops a Result<HttpResponse, Error>.
Sure there are hard problems to solve there but they are API design problems.
I don't need tokio or some sort of web abstraction. If I want to use HTTP as a transport over carrier pigeon I want to be able to do that with said library.
Doesn't exactly exist though, and because everyone just pulls in Tokio (I do mean the entire Tower stack, or whatever it's called these days) nobody even notices the issue. And every single HTTP server rewrites that functionality with slightly different edge cases and bugs.
That's basic internet infrastructure right there, and we can't get a canonical library for it, yet you are arguing that 3 different implementations of the same hash function pulled into the same project is viable?
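(For the HTTP/1.x slice, `httparse` already has roughly the buffer-in, result-out shape in question; the missing piece is covering 1.0/1.1/2.0 uniformly. A minimal sketch against the real `httparse` API, with the 32-header cap as an arbitrary choice:)

```rust
// Bytes in, parsed response out -- HTTP/1.x only, which is exactly the
// limitation being complained about above.
fn parse_response(buf: &[u8]) -> Result<Option<u16>, httparse::Error> {
    let mut headers = [httparse::EMPTY_HEADER; 32]; // arbitrary header cap
    let mut resp = httparse::Response::new(&mut headers);
    match resp.parse(buf)? {
        // Complete: the full head was parsed; return the status code.
        httparse::Status::Complete(_bytes_consumed) => Ok(resp.code),
        // Partial: the caller should read more bytes and call again.
        httparse::Status::Partial => Ok(None),
    }
}
```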
that said, design choices like a flat package namespace are inexcusable. even npm started to move away from it.
The thrill of complexity is real.
The full list is linked in the article https://gist.github.com/tisonkun/06550d2dcd9cf6551887ee6305e...
There isn't a single thing there that seems iffy to me. Rust projects split themselves into crates as small as possible to 1) ease their own development, 2) improve compile times by making their compilation trivially parallelizable, and 3) allow for reuse. Because of this, you can easily end up with a dozen crates all written by the same group of people, meant to be used together. Whether a project is a single big crate or a dozen small crates, you're in the exact same situation. If you wouldn't audit the small crates because there are a lot of them, you wouldn't audit the big crate thoroughly either.
But what about transitive dependencies? Similar thing: if you have a crate to check for the terminal width, I prefer to take the existing small crate than copy paste its code. I can do the latter, but then you end up with effectively a vendored library in your code that no tool can know about to warn you when a security vulnerability has happened.
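To make that concrete (the crate pick here is mine for illustration, not one named in the thread): a terminal-width check through a published crate like `terminal_size` is a couple of lines, and staying on the published crate keeps it visible to cargo-audit, Dependabot, and friends, unlike a vendored copy.

```rust
// Query the terminal width via the small, published `terminal_size` crate
// instead of vendoring the equivalent few lines of platform-specific code.
use terminal_size::{terminal_size, Width};

fn main() {
    match terminal_size() {
        Some((Width(cols), _height)) => println!("terminal is {cols} columns wide"),
        None => println!("stdout is not a terminal"),
    }
}
```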
You mean like four versions of hashbrown (which is useful, but it's rare to have to use it directly instead of `std::collections::HashMap`, never mind pulling four versions of it into your project) or four versions of itertools (which is extremely situational, and even when it is useful it usually only saves you a couple of lines of code, so it's essentially never worth pulling it once, never mind four times)? Or maybe three different crates for random number generation (rand, nanorand, fastrand)?
There's definitely a problem with how the Rust community approaches dependencies (and I say this as someone who loves Rust and has used it as their main language for 10+ years now). People are just way too trigger happy with external dependencies, and burying our heads in the sand is not helping.
Inclusion of every external dependency should always be well motivated. How big is the dependency? How much of it do we use? How big of an effect will it have on compile times? How much effort would it be to write it yourself? Is it security sensitive? Is it a dependency which everyone uses and is maintained by well known community members, or some random guy from who knows where? And so on.
For example, cryptography stuff? No, don't write that yourself if you're not an expert; you'll get it wrong and expose yourself to vulnerabilities. Removing leading whitespace from strings? ("unindent" crate, which is also on your list) Hell no! That's like a minute or two to write this yourself. Did we learn nothing from the left-pad incident?
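To put a number on that "minute or two": here's roughly what the hand-rolled version looks like (a sketch assuming indentation is plain ASCII spaces; the real `unindent` crate provides essentially this as a library function):

```rust
// Strip the smallest common leading-space indent from every line.
// Sketch only: assumes indentation uses ASCII spaces.
fn unindent(s: &str) -> String {
    let indent = s
        .lines()
        .filter(|line| !line.trim().is_empty()) // ignore blank lines
        .map(|line| line.len() - line.trim_start_matches(' ').len())
        .min()
        .unwrap_or(0);
    s.lines()
        .map(|line| line.get(indent..).unwrap_or("")) // blank lines become empty
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    assert_eq!(unindent("    a\n      b\n    c"), "a\n  b\nc");
}
```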
The two options for cargo here are 1) fail to compile when there's more than one crate-version in the dep tree or 2) allow more than one and let the project continue compiling. The former would be more "principled" but in practice incredibly disruptive. I usually go "dep hunting" to unify the versions of duplicated deps. Most of the time that's just looking at `cargo tree` and modifying the `Cargo.toml` slightly. Other times it's not easy, and I have to either patch or (better) wait until the diverging dep updates its own `Cargo.toml`.
> People are just way too trigger happy with external dependencies, and burying our heads in the sand is not helping.
>
> Inclusion of every external dependency should always be well motivated. How big is the dependency? How much of it do we use? How big of an effect will it have on compile times? How much effort would it be to write it yourself? Is it security sensitive? Is it a dependency which everyone uses and is maintained by well known community members, or some random guy from who knows where? And so on.
We can have a nuanced discussion about dependencies. That's not what I was seeing. There are plenty of things that can be done to improve the situation, especially around supply chain security, but this idea that dependency count is the issue is misguided. It pushes projects towards copy-pasting and vendoring, which makes that code opaque to security tools, existing or proposed. Think of the shitshow if you had an app and decided "more dependencies is bad, so I'm copying xz into my repo".
> Removing leading whitespace from strings? ("unindent" crate, which is also on your list) Hell no! That's like a minute or two to write this yourself.
I don't have access to the closed-source repo to run `cargo tree` to see where `unindent` is pulled in from, but why do you feel this is an invalid crate to pull in? It is the support crate behind `indoc`, a proc macro that deindents string literals at compile time. Would I include it directly in a project of mine? Likely not, but if I were using `indoc` (written by dtolnay), which uses `unindent` (also written by dtolnay), my reaction wouldn't be "oh, no! An additional useless dependency!".
Each additional dependency imposes an ongoing audit burden on the downstream consumers of your project.
In an era when supply chain compromises are increasing and their consequences are catastrophic, the security story alters the traditional balance of "roll your own" versus "use the shared library".
Well, partially you're right. There are roughly two things which are important here:
1) The number of unique authors/entities controlling the dependencies. (So 10 crates by exactly same author would still count as one dependency.)
2) The amount of code pulled in by a crate. (Because this tanks your compile times; I've seen projects pulling in hundreds of thousands of lines of code in external dependencies and using less than 1% of that, and then people make a surprised pikachu face that Rust is slow to compile.)
> I don't have access to the closed-source repo to run `cargo tree` to see where `unindent` is pulled in from, but why do you feel this is an invalid crate to pull in? It is the support crate behind `indoc`, a proc macro that deindents string literals at compile time. Would I include it directly in a project of mine? Likely not, but if I were using `indoc` (written by dtolnay), which uses `unindent` (also written by dtolnay), my reaction wouldn't be "oh, no! An additional useless dependency!".
I would never include either in any of my projects, and would veto any attempt to do so. As I already said, the 'unindent' crate is trivial to write by myself, and the 'indoc' crate seems completely not worth it from a cost/benefit standpoint in the very rare case I'd need something like that (it's easy enough to make do without it, as it's just a minor situational quality of life crate).
In general my policy on external dependencies is stricter than most people; I usually only include high value/high impact dependencies, and I try to evaluate whether a given dependency is appropriate in context of the concrete project I want to use it in. If it's a throwaway script that I need to run once and won't really maintain long-term - I go crazy with gluing whatever external crates there are just to get it done ASAP! But if it's a project that I'll need to maintain over a long period of time I get a lot more strict, and if it's a library that I expect other people to use then the bar for external dependencies gets even higher (because any extra dependency I add will bloat up the compile times and the dependency trees of any downstream users).
I also find it helpful to ask myself the question - if it wasn't easy to add new dependencies (e.g. if I was still writing in C++, or cargo wasn't a thing) would I still include this dependency in my project? If the answer is "no" then maybe it's better not to.
There are some notable exceptions, but sadly most of the Rust community doesn't do things this way.
from empirical studies, we know the first kind occurs at roughly the same rate everywhere, so it's just a question of whether you have the capacity to fix them. also, reusable dependencies are typically more configurable, which leads to more code and more bugs, many of which might not have affected you if you didn't need all the flexibility.
dependency count is an indirect measure of the second kind, except rust pushes crates as the primary metric, so it will always look bad compared to if it pushed something more reasonable like the number of trust domains.
The dependencies are modular, not diffuse.
I think people saw the title, and got triggered into hate. When actually, this seems author-submitted, and they were probably just trying to be humble about their accomplishment. It's not even the title of the article.
Thanks for your reply. To be honest, I simply recognize that depending on open-source software is a trivial choice. Any non-trivial Rust project can pull in hundreds of dependencies, and even when you audit distributed systems written in C++/Java, it's a common case.
For example, Cloudflare's pingora has more than 400 dependencies. Other databases written in Rust, e.g., Databend and Materialize, have more than 1000 dependencies in the lockfile. TiKV has more than 700 dependencies.
People seem to jump into debates about the number of dependencies, or blame us for closing the source code, ignoring my purpose: I'd like to show how you can organically contribute to the open-source ecosystem during your $DAYJOB, and that this is a way to write open-source code sustainably.
lots of crates by a cohesive group of authors: you "only" need to trust that the group reviews each other's work properly and that they're not all compromised together (less likely).