This deeply misunderstands the philosophy of Protobuf. proto3 doesn't even support required fields. https://protobuf.dev/best-practices/dos-donts/
> Never add a required field, instead add `// required` to document the API contract. Required fields are considered harmful by so many they were removed from proto3 completely.
Protobuf clients need to be written defensively, just like JSON API clients.
Most of my APIs are internal APIs that accept breaking changes easily. My experience with protobuf is that it was created to solve problems in large systems with many teams and APIs, where backwards compatibility is important. There are certainly systems where you can't "just" push through a breaking API change, and in those cases protobuf makes sense.
138 million downloads from npm in the last week. Yes, you can validate your JSON.
If your servers and clients deploy at different times, and are thus compiled with different versions of your specs, many safety bets are off.
There are ways to be mostly safe (never reuse IDs, use unknown-field-friendly copying methods, etc.), but distributed systems are distributed systems, and protobuf isn't a silver bullet that can solve all problems on author's list.
On the upside, it seems like protobuf3 fixed a lot of stuff I used to hate about protobuf2. Issues like:
> if the field is not a message, it has two states:
> - ...
> - the field is set to the default (zero) value. It will not be serialized to the wire. In fact, you cannot determine whether the default (zero) value was set or parsed from the wire or not provided at all
are now gone if you stick to using protobuf3 + `message` keyword. That's really cool.
this seems like a problem only if you use JSON.parse or json.loads etc. and then just cross your fingers and hope that the types are correct, basically doing the silent equivalent of casting an "any" type to some structure that you assume is correct, rather than strictly parsing (parse, don't validate) into a typed structure before handing that off to other code.
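To make that concrete, here's a rough sketch of the two styles using zod; the `Order` shape and field names are made up for illustration.

```ts
import { z } from "zod";

const body = '{"id":"o_1","amount":42}'; // imagine this came off the wire

// Cross-your-fingers style: JSON.parse returns `any`, so this cast
// asserts a shape without ever checking it.
type Order = { id: string; amount: number };
const assumed = JSON.parse(body) as Order;

// "Parse, don't validate" style: the schema is checked at runtime, and the
// result is typed only because the data actually matched it.
const OrderSchema = z.object({ id: z.string(), amount: z.number() });
type ParsedOrder = z.infer<typeof OrderSchema>;
const parsed: ParsedOrder = OrderSchema.parse(JSON.parse(body)); // throws on mismatch
```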
That's called validating? Zod is a validation library.
But yeah, people really need to start strictly parsing/validating their data. One time I had an interview and I was told yOu DoN'T tRuSt YoUr BaCkeNd?!?!?!?
looking at zod (assuming https://zod.dev) it is a parsing library by that definition — which isn't, like, an official definition or anything, one person on the internet came up with it, but I think it is good at getting the principle across
under these definitions a "parser" takes some input and returns either some valid output (generally a more specific type, like String -> URL) or an error, whereas a "validator" just takes that input and returns a boolean or throws an error or whatever makes sense in the language.
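In TypeScript terms, just to illustrate those signatures (using the built-in `URL` class as the "more specific type"):

```ts
// Validator: answers yes/no, and the caller is still left holding a plain string.
function isValidUrl(input: string): boolean {
  try { new URL(input); return true; } catch { return false; }
}

// Parser: returns the more specific type (String -> URL) or fails,
// so downstream code never has to re-check the string.
function parseUrl(input: string): URL {
  return new URL(input); // throws TypeError if the string isn't a valid URL
}
```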
eta: probably part of the distinction here is that since zod is a JS library the actual implementation can be a "validator" and then the original parsed JSON input can just be returned with a different type. "parse don't validate" is (IMO) more popular in languages like Rust where you would already need to parse the JSON to a language-native structure from the original bytes, or to some "JSON" type like https://docs.rs/serde_json/latest/serde_json/enum.Value.html, which is generally awkward for application code (nudging you onto the happy parsing path).
Sure, it will blow up in your face when a field goes missing or a value changes type.
People who advocate paying the higher cost ahead of time to perfectly type the entire data structure AND propose a process to perform version updates to keep client/server in sync are going to lose most of the time.
The zero cost of starting with JSON is too compelling even if it has a higher total cost due to production bugs later on.
When judging which alternative will succeed, lower perceived human cost beats lower machine cost every time.
This is why JSON is never going away, until it gets replaced with something with even lower human communication cost.
Yup, this is it. No architect considers using protos unless there is an explicit need for it. And the explicit need is, most of the time, gRPC.
Unless the alternative allows for zero cost startup and debugging by just doing `console.log()`, they won't replace JSON any time soon.
Edit: Just for context, I'm not the author. I found the article interesting and wanted to share.
And since you need to translate it anyway, there's not much benefit in my mind to using something like msgpack, which is more compact and self-describing; you just need a decoder to convert it to JSON when you display it.
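For what it's worth, that decode step is tiny; a rough sketch assuming the `@msgpack/msgpack` npm package (the payload is made up):

```ts
import { encode, decode } from "@msgpack/msgpack";

// Compact binary on the wire...
const wire: Uint8Array = encode({ user: "ada", roles: ["admin"] });

// ...but still self-describing, so turning it back into JSON for display
// is one decode away, no schema required.
console.log(JSON.stringify(decode(wire), null, 2));
```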
I once converted a fairly large JS codebase to TS and I found about 200 mismatching names/properties all over the place. Tons of properties where we had nulls suddenly started getting values.
that's true. But people would also rather argue about security vulnerabilities than get it right from the get-go. Why spend an extra 15 minutes of effort during design when you can spend 3 months revisiting the ensuing problem later?
It appears possible in some cases but it's not universally the case. Which means that similar binary transport formats that do support zero-copy, like Cap'n Proto, offer most or all of the perks described in this post, with the addition of ensuring that serialization and deserialization are not a bottleneck when passing data between processes.
"Cap’n Proto is INFINITY TIMES faster than Protocol Buffers. (...) there is no encoding/decoding step. The Cap’n Proto encoding is appropriate both as a data interchange format and an in-memory representation, so once your structure is built, you can simply write the bytes straight out".
I take it as a rationalization of what OLE Compound File Binary (internal Microsoft Office memory structures serialized "raw" as a file format) would look like if they had paid more attention to being backward and forward compatible and extensible.
Protobuf has advantages, but it is missing support for a ton of use cases where JSON thrives, due to its strict schema requirement.
A much stronger argument could be made for CBOR as a replacement for JSON for most use cases. CBOR has the same schema flexibility as JSON but has a more concise encoding.
But yes, a native implementation would save me the trouble!
[ciborium]: a Rust CBOR library; https://docs.rs/ciborium/latest/ciborium/
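To make the size point concrete on the JS side, a rough sketch assuming the `cbor-x` npm package (the document is invented; actual savings depend on the data):

```ts
import { encode, decode } from "cbor-x";

// Same schemaless data model as JSON, just a tighter binary encoding.
const doc = { id: 12345, name: "sensor-7", readings: [1, 2, 3, 4, 5] };

const jsonBytes = Buffer.byteLength(JSON.stringify(doc));
const cborBytes = encode(doc).length;
console.log({ jsonBytes, cborBytes }); // CBOR comes out smaller here (small ints and short strings pack tightly)

console.log(decode(encode(doc)));      // and it round-trips without a schema
```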
https://en.wikipedia.org/wiki/ASN.1#Example_encoded_in_DER
Protobuf is OK, but if you actually look at how the serializers work, it's just too complex for what it achieves.
- You can support all those features, and your ASN.1 library will be horribly bloated and over-engineered.
- You can support your favorite subset, but then you cannot say it's ASN.1 anymore. It will be "ASN.brabel", which only has one implementation (yours). And who wants that?
(unless you are Google and have immense developer influence... But in this case, why not design things from scratch, since we are making all-new protocol anyway?)
(I wrote an implementation of DER encoding/decoding in C, which is public domain and FOSS.)
Also I don't see special ASN.1 support for non-Unicode string encodings, only subsets of Unicode like ASCII or printable ASCII. It's a big can of worms once you bring in things like Latin-1.
Also, DER allows you to indicate the type of data within the file (unless you are using implicit types). Protobuf has only a limited form of this (you cannot always identify the types), and it requires different framing for different types. DER, however, uses the same framing for all types (sketched below), and strings are not inherently limited to 2GB by the file format.
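A toy TypeScript sketch of that uniform tag-length-value framing (short-form lengths only; a real DER library also handles long-form lengths, constructed types, and so on):

```ts
// Every DER value, scalar or not, uses the same tag-length-value layout.
function tlv(tag: number, value: Uint8Array): Uint8Array {
  if (value.length >= 0x80) throw new Error("long-form lengths omitted in this sketch");
  return Uint8Array.from([tag, value.length, ...value]);
}

const int5 = tlv(0x02, Uint8Array.from([5]));               // INTEGER 5       -> 02 01 05
const hi   = tlv(0x0c, new TextEncoder().encode("hi"));     // UTF8String "hi" -> 0c 02 68 69
const seq  = tlv(0x30, Uint8Array.from([...int5, ...hi]));  // SEQUENCE { .. } -> 30 07 02 01 05 0c 02 68 69
```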
Furthermore, there are other non-scalar types as well.
In any of these cases, you do not have to use all of the types (nor do you need to implement all of the types); you only need to use the types that are applicable for your use.
I will continue to use ASN.1; Protobuf is not good enough in my opinion.
Yeah. I do remember a lot of workloads at Google where most of the CPU time was spent serializing/deserializing protos.
Not much is faster than protobuf except for zero-copy formats.
After working heavily with SNMP across a wide variety of OEMs, this flexibility becomes a downside. Or SNMP/MIBs were specified at the wrong abstraction level, where the ASN.1 flexibility gives mfgs too much power to do insane and unconventional things.
Technically, it sounds really good, but the actual act of managing it is hell. That, or I need a lot of practice to use them; at that point, shouldn't I just use JSON and get on with my life?
Whether the team saves time in the long run when using protos is a question in its own right.
Making changes to messages in a backwards-compatible way can be annoying, but JSON allowing you to shoot yourself in the foot will take more time and effort to fix when it's corrupting data in prod than protobuf giving you a compile error would.
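A toy illustration of that foot-gun (the field names are made up): the producer renames a JSON field and the untyped consumer keeps running, just with garbage.

```ts
// Producer renamed `amount` to `amount_cents`; the untyped consumer keeps "working".
const body = '{"id":"o_1","amount_cents":4200}';

const order = JSON.parse(body);   // no schema, no complaint
const total = order.amount * 1.2; // undefined * 1.2 === NaN, happily written to prod

// With generated protobuf types, referring to the old field name would fail to compile,
// and a rename done with the same field number wouldn't break the binary wire format.
```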
If they live in their own project, making a single project buildable with a git clone gets progressively more complex.
You now need submodules to pull in your protobuf definitions.
You now also need the protobuf toolchain to be available in the environment you just cloned to. If that environment has the wrong version, the build fails, and it starts to get frustrating pretty fast.
Compare that to JSON: yes, I don't get versioning and a bunch of other fancy features, but... I get to finish my work, build and test pretty quickly.
If that is just your team, use whatever tech gets you there quick.
However, if you need to provide some guarantees to a second or third party with your API, embrace standards like JSON; even better, use content negotiation.
So my question is, why didn't Google just provide that as a library? The setup wasn't hard but wasn't trivial either, and had several "wrong" ways to set up the proto side. They also bait most people with gRPC, which is its own separate annoying thing that requires HTTP/2, which even Google's own cloud products don't support well (e.g. App Engine).
P.S. Text proto is also the best static config language. More readable than JSON, less error-prone than YAML, more structured than both.
The Twirp spec, however, is so simple that you can throw together your own code generator in an afternoon for whatever language you want.
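For a sense of why it's an afternoon project: Twirp routes every call as `POST /twirp/<package>.<Service>/<Method>` with a JSON or protobuf body. A minimal JSON-only sketch (the `Haberdasher` service and the handler registry are made up for illustration):

```ts
import { createServer } from "node:http";

// Hypothetical registry; a real generator would emit one entry per RPC method.
const handlers: Record<string, (input: any) => unknown> = {
  "example.Haberdasher/MakeHat": (req) => ({ size: req.inches, color: "brown" }),
};

createServer((req, res) => {
  const rpc = (req.url ?? "").replace(/^\/twirp\//, "");
  const handler = handlers[rpc];
  if (req.method !== "POST" || !handler) {
    res.writeHead(404, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ code: "bad_route", msg: "no such method" }));
    return;
  }
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify(handler(JSON.parse(body)))); // protobuf-encoded bodies omitted here
  });
}).listen(8080);
```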
The app: 20 req/sec
The app after optimizations: 20 req/sec (it waits for the db query anyway)
Otherwise, JSON is sufficient.
Plus - tooling.
JSON might not be simpler or strict but it gets the job done.
If JSON's size or performance is causing you to go out of business, you surely have bigger problems than JSON.
As an aside, like all things Google, their C++ library is massive (a 14 MB DLL) and painful to build (it takes nearly 10 minutes on my laptop).
> If you develop or use an API, there’s a 99% chance it exchanges data encoded in JSON.
Just wondering if the inherent deficiencies of JSON can be somewhat improved by CUE lang, since the former is very much pervasive and the latter understands JSON [1],[2].
[1] Configure Unify Execute (CUE): Validate, define, and use dynamic and text‑based data:
[2] Cue – A language for defining, generating, and validating data:
Batching with message pooling to a transaction payload size limit actually made it performant. =3
We might disagree on what "efficient" means. OP is focusing on computer efficiency, whereas, as you'll see, I tend to optimize for human efficiency (and, let's be clear, JSON is efficient _enough_ for 99% of computer cases).
I think the "human readable" part is a pro that hardcore protobuf fans often overlook. One of my fundamental philosophies of engineering historically has been "clarity over cleverness." Perhaps the corollary to this is "...and simplicity over complexity." And I think protobuf, generally speaking, falls in the cleverness part, and certainly into the complexity part (with regards to dependencies).
JSON, on the other hand, is ubiquitous, human readable (clear), and simple (little-to-no dependencies).
I've found in my career that there's tremendous value in not needing to execute code to see what a payload contains. I've seen a lot of engineers (including myself, once upon a time!) take shortcuts like using bitwise values and protobufs and things like that to make things faster or to be clever or whatever. And then I've seen those same engineers, or perhaps their successors, find great difficulty in navigating years-old protobufs, when a JSON payload is immediately clear and understandable to any human, technical or not, upon a glance.
I write MUDs for fun, and one of the things that older MUD codebases do is that they use bit flags to compress a lot of information into a tiny integer. To know what conditions a player has (hunger, thirst, cursed, etc), you do some bit manipulation and you wind up with something like 31 that represents the player being thirsty (1), hungry (2), cursed (4), with haste (8), and with shield (16). Which is great, if you're optimizing for integer compression, but it's really bad when you want a human to look at it. You have to do a bunch of math to sort of de-compress that integer into something meaningful for humans.
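A quick sketch of that de-compression step, using the same flag values:

```ts
// The compact-but-opaque encoding described above:
const THIRSTY = 1, HUNGRY = 2, CURSED = 4, HASTE = 8, SHIELD = 16;
const conditions = THIRSTY | HUNGRY | CURSED | HASTE | SHIELD; // 31

// A human (or a log line) needs this step to make sense of "31":
function describe(flags: number): string[] {
  const names: [number, string][] = [
    [THIRSTY, "thirsty"], [HUNGRY, "hungry"], [CURSED, "cursed"],
    [HASTE, "haste"], [SHIELD, "shield"],
  ];
  return names.filter(([bit]) => (flags & bit) !== 0).map(([, name]) => name);
}

console.log(describe(31)); // ["thirsty", "hungry", "cursed", "haste", "shield"]
```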
Similarly with protobuf, I find that it usually optimizes for the wrong thing. To be clear, one of my other fundamental philosophies about engineering is that performance is king and that you should try to make things fast, but there are certainly diminishing returns, especially in codebases where humans interact frequently with the data. Protobufs make things fast at a cost, and that cost is typically clarity and human readability. Versioning also creates more friction. I've seen teams spend an inordinate amount of effort trying to ensure that both the producer and consumer are using the same versions.
This is not to say that protobufs are useless. It's great for enforcing API contracts at the code level, and it provides those speed improvements OP mentions. There are certain high-throughput use-cases where this complexity and relative opaqueness is not only an acceptable trade off, but the right one to make. But I've found that it's not particularly common, and people reaching for protobufs are often optimizing for the wrong things. Again, clarity over cleverness and simplicity over complexity.
I know one of the arguments is "it's better for situations where you control both sides," but if you're in any kind of team with more than a couple of engineers, this stops being true. Even if your internal API is controlled by "us," that "us" can sometimes span 100+ engineers, and you might as well consider it a public API.
I'm not a protobuf hater, I just think that the vast majority of engineers could go through their careers without ever touching protobufs, never miss them, never need them, and never find themselves in a situation where eking out that extra performance is truly worth the hassle.
Also the “us” is ever-changing in a large enough system. There are always people joining and leaving the team. Always, many people are approximately new, and JSON lets them discover the data more easily.
In terms of human effort, a strongly typed schema rather than one where you have to sanity check everything saves far more time in the long run.
There is a really interesting discussion underneath this as to the limitations of JSON along with potential alternatives, but I can't help but distrust this writing due to how much it sounds like an LLM.
Seems like the author just wanted to talk about Protobuf without bothering too much about the issues with JSON (though some are mentioned).
I didn't find it confusing.
I found it unconvincing, but the argument itself was pretty clear. I just disagreed with it.
I promise you cannot tell LLM-generated content from non-LLM-generated content. What you think you’re detecting is poor quality, which is orthogonal to the tooling used.
I am not dismissing this as being slop and actually have no beef with using LLMs to write but yes, as you call out, I think it's just poorly written or perhaps I'm not the specific audience for this.
Sorry if this is bad energy, I appreciate the write up regardless.
I was pushing at one point for us to have some code in our protobuf parsers that would essentially allow reading messages in either JSON or binary format. To be fair, there's some overhead that way from doing some kind of try/catch, but for some use cases I think it's worth it...
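Something like this rough sketch, where `decodeBinary` stands in for whatever generated binary decoder you'd use (it's not a real API):

```ts
declare function decodeBinary(bytes: Uint8Array): unknown; // placeholder for a generated decoder

function readMessage(payload: Uint8Array): unknown {
  try {
    // Try the human-readable JSON form first...
    return JSON.parse(new TextDecoder().decode(payload));
  } catch {
    // ...and fall back to binary (this is the try/catch overhead mentioned above).
    return decodeBinary(payload);
  }
}
```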
https://en.wikipedia.org/wiki/DCE/RPC
DCE/RPC worked in 1993, and still does today.
Protocol buffers is just another IDL.
Now, there is a serde_protobuf (I haven't used it) that I assume allows you to enforce nullability constraints, but one of the article's points is that you can use the generated code directly and:
> No manual validation. No JSON parsing. No risk of type errors.
But this is not true: nullability errors are type errors. Manual validation is still required (except you should parse, not validate) to make sure that all of the fields your app expects are there in the response. And "manual" validation (again, parse, don't validate) is not necessary with any good JSON parsing library; the library handles it.