https://github.com/centralci/go-benchmarks/tree/b647c45272c7...
So it seems both are operating at the edge of Go's capabilities.
Personally, I think JSON should be in Go's core as highly optimised SIMD C code, not in Go's standard library as standard Go code. As JSON is such an important part of the web nowadays, it deserves to be treated with more care.
Edit: See: https://go.dev/wiki/AssemblyPolicy
There is nothing special about C, other than its historical availability after UNIX's free beer came to be.
Any combination of high level + Assembly is enough.
I think when it's introduced it might be worth discussing that again. Otherwise providing assembly for JSON of all packages seems like a huge maintenance burden for very little benefit for end users (since faster alternatives are readily available)
Sonic may be different but I'm feeling once bitten twice shy on "faster" JSON parsers at this point. A highly optimised SIMD version might be nice but the stdlib json package needs to work for everything out there, not just the cases the author decided to test on, and I'd be a lot more nervous about something like that being sufficiently well tested given the extra complexity.
There is a case to be made here, but CORBA, SOAP and XML-RPC likely looked similarly sticky and eternal in the past.
We had no plans to change to something else.
By late 00s, even the talk was more along the lines of it being legacy tech.
Eventually migrated to Java EE, also taking advantage of CORBA compatibility.
If you're pushing data around on disk where the serialization library is your bottleneck, pick a better format.
But in that case your last point still stands: pick a better format
Yes, if you are looking at a single request-response interaction over the Internet in isolation and observing against wall clock time, the time spent on JSON (de-)serialization (unless egregiously atrocious) will usually be insignificant.
But that's just one perspective. If we look at CPU time instead of wall clock time, the JSON may dominate over the network calls. Moreover, in a language like Go, which can easily handle tens to hundreds of thousands of parked green threads waiting for network activity, the time spent on JSON can actually be a significant factor in request throughput. Even "just" doubling RPS from 10k to 20k would mean using half as much energy (or half as much cloud compute spend etc.) per request.
Changing formats (esp to a low-overhead binary one) might yield better performance still, but it will also have costs, both in time spent making the change (which could take months) and adapting to it (new tools, new log formats, new training, etc.).
second of all, sonic apparently uses unsafe to (unsafe-ly) cast byte slices to strings, which of course is gonna be faster than doing things correctly, but is also of course incomparable to doing things correctly
like almost all benchmark data posted to hn -- unsound, ignore
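For context, the kind of zero-copy conversion being alluded to looks roughly like this (a sketch, not sonic's actual code): the string aliases the decoder's byte buffer, so if that buffer is later reused or mutated, the supposedly immutable string changes underneath you.

package main

import (
	"fmt"
	"unsafe"
)

func main() {
	buf := []byte(`hello`)

	safe := string(buf)                           // copies the bytes
	unsafeStr := unsafe.String(&buf[0], len(buf)) // aliases the same memory (Go 1.20+)

	buf[0] = 'j' // e.g. the decoder reuses its buffer for the next message

	fmt.Println(safe)      // "hello"
	fmt.Println(unsafeStr) // "jello" - the string silently changed
}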
I just ran our full suite of a few thousand unit tests with GOEXPERIMENT=jsonv2 and they all passed. (well, one test failed because an error message was changed, but that's on us)
I'm especially a fan of breaking out the syntactic part into its own jsontext package. It makes a ton of sense, and I could see us implementing a couple parsers on top of that to get better performance where it really matters.
I wish they would take this chance to ditch omitempty in favor of just the newly-added omitzero (which is customizable with IsZero()), to which we'll be switching all our code over Real Soon Now. The two tags are so similar that it takes effort to decide between them.
- "omitempty" will omit an object field after encoding it, according to its JSON value
- "omitzero" will omit an object field before encoding it, according to its Go value
The former is particularly useful when you are dealing with foreign types that don't implement IsZero (yet) or implement it in an inappropriate way for how you're using it. You could, of course, write a wrapper type, but even when you can use struct embedding to make the wrapper less painful, you still have to duplicate all of the constructors/factories for that type, and you have to write the tedious code to do the conversions somewhere.
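A minimal sketch of the distinction, assuming the experimental encoding/json/v2 package (enabled via GOEXPERIMENT=jsonv2); the struct and field names are made up:

package main

import (
	"fmt"

	json "encoding/json/v2" // experimental; requires GOEXPERIMENT=jsonv2
)

type Payment struct {
	// omitempty (v2) looks at the encoded JSON value: 0 encodes to the JSON
	// number 0, which is not an "empty" JSON value, so the field stays.
	Amount int `json:"amount,omitempty"`
	// omitzero looks at the Go value before encoding: 0 is the zero value
	// (or IsZero() reports true, if implemented), so the field is dropped.
	Fee int `json:"fee,omitzero"`
}

func main() {
	out, err := json.Marshal(Payment{})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"amount":0}
}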
No, it's an exception. It was badly designed from the start - it's not just that people's json needs (which hardly changed) outgrew it.
Over time, it became evident that the JSON package didn't meet the needs of its users, and the package has evolved as a result. The size of the evolution doesn't matter.
The latter is a weaselly way to put the blame on changing needs, as if it was fine initially, but user needs grew and it no longer covers them. Truth is, user needs are the same; we haven't had any magical change in JSON use patterns over the last 10 years. The design was just flawed to begin with.
I'd argue it didn't "become evident over time" either. It was evident on day one, and many people pointed it out 10 and 13 years ago.
It's also true that a built-in JSON library typically wouldn't be so poorly designed in the first release of a language that it would immediately be in need of maintenance.
The JSON library was released with Go 1, in 2012. This makes the library 13 years old [0].
If that's immediate, I'm fine with that kind of immediate.
Obviously encoding/json doesn't "work perfectly"; TFA lists several problems it has, and the need for a new version, and that's straight from the horse's mouth. Is the Go team "trolling" as well?
Second, we're not talking whether it "does the job", which is what you might mean by "works perfectly great".
We're talking about whether it's a good design for the problem domain, or whether it has footguns, bad API choices, performance problems, and other such issues.
That does look a lot cleaner. I was just grumbling about this in golang yesterday (yaml, not json, but effectively the same problem).
The lack of tagged unions of some sort in go makes things like polymorphic json difficult to handle. It's possible, but requires a ton of overhead. Rust with enums mixed with serde makes this trivial. Insanely trivial.
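For comparison, the usual Go workaround (a sketch; type and field names are invented) is a two-pass decode: pull out a discriminator field first, then unmarshal the raw message into the matching concrete type.

package main

import (
	"encoding/json"
	"fmt"
)

type Circle struct {
	Radius float64 `json:"radius"`
}

type Rect struct {
	W, H float64
}

// decodeShape is the hand-rolled stand-in for a tagged union.
func decodeShape(data []byte) (any, error) {
	var head struct {
		Kind string          `json:"kind"`
		Data json.RawMessage `json:"data"`
	}
	if err := json.Unmarshal(data, &head); err != nil {
		return nil, err
	}
	switch head.Kind {
	case "circle":
		var c Circle
		return &c, json.Unmarshal(head.Data, &c)
	case "rect":
		var r Rect
		return &r, json.Unmarshal(head.Data, &r)
	default:
		return nil, fmt.Errorf("unknown kind %q", head.Kind)
	}
}

func main() {
	s, err := decodeShape([]byte(`{"kind":"circle","data":{"radius":2}}`))
	fmt.Println(s, err) // &{2} <nil>
}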
How did that make it into the v1 design?
var slice []string
fmt.Println(slice == nil) // true

slice = []string{}
fmt.Println(slice == nil) // false
The language just has a bad habit of confusing them some of the time, but not consistently, so you can still occasionally get bit by the difference
As someone who uses Go a lot, it's just one of those things...
var m map[string]int
println(m == nil) // true
There’s a simple reason: most JavaScript parsers reject null. At least in the slice case.
It's more that when building an API that adheres to a specification, whether formal or informal, if the field is supposed to be a JSON array then it should be a JSON array. Not _sometimes_ a JSON array and _sometimes_ null, but always an array. That way clients consuming the JSON output can write code consuming that array without needing to be overly defensive
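With v1 that typically means the normalisation happens on the producer side instead (a sketch; the types and field names are made up):

package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative response type.
type Response struct {
	Items []string `json:"items"`
}

func main() {
	resp := Response{} // Items is nil

	// v1 would emit {"items":null}; normalise so clients always see an array.
	if resp.Items == nil {
		resp.Items = []string{}
	}

	out, _ := json.Marshal(resp)
	fmt.Println(string(out)) // {"items":[]}
}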
Also, now that a nil map is an empty object, shouldn't that extend to every nil struct pointer that doesn't have a custom marshaller? It would be an object if it weren't nil, after all...
var m map[string]int = nil
fmt.Println(m["foo"]) // 0: reading from a nil map yields the zero value
The language spec is also pretty clear on this; https://go.dev/ref/spec#Map_types:
> A nil map is equivalent to an empty map except that no elements may be added.
The types themselves have a way to customize their own JSON conversion code. You could have a struct serialize itself to a string, an array, do weird gymnastics, whatever. The JSON module calls these custom implementations when available.
The current way of doing it is shit, though. If you want to customize serialization, you basically need to return the JSON as a byte slice. Then the serializer has to check whether you actually managed to return something sane. You also have no idea whether there were any JSON options in play. Maybe there is an indentation setting or whatever. No, you just return a byte array.
Deserialization is also shit because a) again, no options. b) the parser has to send you a byte array to parse. Hey, I have this JSON string, parse it. If that JSON string is 100MB long, too bad, it has to be read completely and allocated again for you to work on because you can only accept a byte array to parse.
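For reference, the v1 hooks being described look like this: both directions go through a fully materialised byte slice, and no encoder or decoder options ever reach you (the type is illustrative).

package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

type Celsius float64

// MarshalJSON must hand back the complete JSON text as bytes;
// it never sees indentation or any other encoder settings.
func (c Celsius) MarshalJSON() ([]byte, error) {
	return []byte(fmt.Sprintf(`"%.1fC"`, float64(c))), nil
}

// UnmarshalJSON receives the complete JSON value as bytes, however large.
func (c *Celsius) UnmarshalJSON(data []byte) error {
	s := strings.Trim(string(data), `"`)
	v, err := strconv.ParseFloat(strings.TrimSuffix(s, "C"), 64)
	if err != nil {
		return err
	}
	*c = Celsius(v)
	return nil
}

func main() {
	out, _ := json.Marshal(Celsius(21.5))
	fmt.Println(string(out)) // "21.5C"

	var c Celsius
	_ = json.Unmarshal(out, &c)
	fmt.Println(c) // 21.5
}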
The new API fixes these. It provides a Decoder or Encoder to you. These carry any options from the top, and they can also stream data. So you can serialize your 10GB array value by value while the underlying writer writes it to disk, for example, instead of allocating it all in memory first, as the older API forces you to.
There are other improvements too, but the post mainly focuses on these, so that's what I got from it. (I haven't tried the new API btw, this is all from the post, so maybe I'm wrong on some points.)
I was not sure whether this was the case, as `json.NewEncoder(io.Writer)` and `json.NewDecoder(io.Reader)` exist in v1, so I checked, and guess what, you're right! Decode() actually reads the value into an internal buffer before doing any unmarshalling in the first place. I had always assumed that it kept an internal stack of some kind, for matching-parenthesis and type-safety stuff within a streaming context, but no, it doesn't do any of that! Come to think of it, it does make sense, as a partial unmarshal would be potentially devastating for incrementally-updated data structures, since it would leave them in an inconsistent state.
The largest problems were around the behavior of nil in Go and what to convert it to in JSON, and vice versa.
- v2 will now throw an error for invalid characters outside of UTF-8 (before, they were silently accepted), which meant one had to preprocess or re-process the JSON before sending it off to the server.
- A Go nil will be converted to a JSON empty array or map (depending on the type). Previously it was converted to JSON null.
- JSON field names will be matched to Go names case-sensitively. Before, matching was case-insensitive and names would be lowercased, which caused lots of problems if fields collided (say there's bankName and bankname in the JSON); see the sketch below.
- omitempty was problematic: a Go amount: nil would mean the field is omitted from the JSON ({} instead of {"amount": null}), but it also meant that a Go amount: 0 would be omitted rather than appearing as {"amount": 0}, which was surprising. The new omitempty will only omit nil and empty arrays/maps, no longer 0 or false; there's a new omitzero tag for that.
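To illustrate the field-name collision point above, a small sketch of v1's case-insensitive matching (the struct and keys are invented):

package main

import (
	"encoding/json"
	"fmt"
)

type Account struct {
	BankName string `json:"bankName"`
}

func main() {
	var a Account
	// v1 matches keys case-insensitively, so both keys hit the same field
	// and one of them silently wins; v2 matches case-sensitively by default.
	_ = json.Unmarshal([]byte(`{"bankname":"Alpha","bankName":"Beta"}`), &a)
	fmt.Println(a.BankName)
}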
Now here’s a new implementation that addresses some of the architectural problems that made the old library structurally problematic for some use cases (streaming large JSON docs being the main one).
The new one solves some niche problems that I think were probably best left to third-party libraries.
I have two complaints. Its decoder is a little slow, PHP's decoder blows it out of the water. I also wish there was an easy "catch all" map you could add to a struct for items you didn't define but were passed.
None of the other things it "solves" have ever been a problem for me - and the "solution" here is a drastically more complicated API.
I frankly feel like doing a v2 is silly. Most of the things people want could be resolved with struct tags varying the behavior of the existing system while maintaining backwards compatibility.
My thoughts are basically as follows
The struct/slice merge issue? I don't think you should be decoding into a dirty struct or slice to begin with. Just declare it unsupported, undefined behavior and move on.
Durations as strings? Why? That's just gross.
Case sensitivity by default? Meh. Just add a case sensitivity struct tag. Easy to fix in v1
Partial decoding? This seems so niche it should just be a third party libraries job.
Basically everything could've been done in a backwards compatible way. I feel like Rob Pike would not be a fan of this at all, and it feels very un-Go.
It goes against Go's whole worse is better angle.
switch val.(type) {
case MarshalJSONV2: // new interface, always used if present
case MarshalJSON: // old interface, fallback
default: // type isn't self-marshaling
}
The jsontext package is what's really revolutionary and needed here. The poor performance of the existing API is due primarily to the lack of a proper streaming API as the foundation. Using this as the basis for the new interfaces makes sense, and I agree that once this exists, the need for an entirely separate v2 package largely vanishes.
Also, per https://antonz.org/go-json-v2/#marshalwrite-and-unmarshalrea... (not completely sure), maybe combining
dec := json.NewDecoder(in)
dec.Decode(&bob)
to just
json.UnmarshalRead(in, &bob)
is nicer...mostly the performance benefit though
> It goes against Go's whole worse is better angle
One could almost say that durations as strings is...worse.
It is good to see some partial solutions to this issue. It plagues most languages and introduces a nice little ambiguity that is just trouble waiting to happen.
Ironically, JavaScript with its hilarious `null` and `undefined` does not have this problem.
Most JSON parsers and emitters in most languages should use a special value for "JSON null".
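One way to get such a special value in Go today (a sketch; the Nullable type is invented, relying on v1's documented behaviour that UnmarshalJSON is only called for keys that are actually present, including when the value is null):

package main

import (
	"encoding/json"
	"fmt"
)

// Nullable distinguishes "absent", "JSON null", and "present with a value".
type Nullable[T any] struct {
	Present bool // the key appeared in the JSON object
	Valid   bool // the value was not null
	Value   T
}

func (n *Nullable[T]) UnmarshalJSON(data []byte) error {
	n.Present = true // only called when the key exists
	if string(data) == "null" {
		return nil // Valid stays false
	}
	n.Valid = true
	return json.Unmarshal(data, &n.Value)
}

func main() {
	var req struct {
		Bar Nullable[int] `json:"bar"`
	}
	_ = json.Unmarshal([]byte(`{"bar": null}`), &req)
	fmt.Println(req.Bar.Present, req.Bar.Valid) // true false
}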
obj['key']=undefined
was the same as
delete obj['key']
And then people complain that Rust doesn't have a batteries-included stdlib. It is done to avoid cases like this.
Both v1 packages continue to work; both are maintained. They get security updates, and both were improved by implementing them on top of v2 to the extent possible without breaking their respective APIs.
More importantly: the Go authors remain responsible for both the v1 and v2 packages.
What most people want to avoid with a "batteries included standard library" (and few additional dependencies) is the debacle we had just today with NPM.
Well maintained packages, from a handful of reputable sources, with predictable release schedules, a responsive security team and well specified security process.
You can't get that with 100s of independently developed dependencies.
Yes, we should definitely go with the Rust approach instead.
Anyway, I'd better get back to figuring out which crate am I meant to be using...
It's still better than the mess that is Node.js.
"In v1, a nil Go slice or Go map is marshaled as a JSON null. In contrast, v2 marshals a nil Go slice or Go map as an empty JSON array or JSON object, respectively. The jsonv2.FormatNilSliceAsNull and jsonv2.FormatNilMapAsNull options control this behavior difference. To explicitly specify a Go struct field to use a particular representation for nil, either the `format:emitempty` or `format:emitnull` field option can be specified. Field-specified options take precedence over caller-specified options."
For example, if my intent is to keep the present value, I will send {"foo": 1} or {"foo": 1, "bar": null}, since null and no value have the same meaning. On the other hand, I might want to change the existing value to an empty one and send {"foo": 1, "bar": []}.
The server must understand both the case where I am not mutating the field and the case where I am mutating the field and setting it to be empty.
On the other side, I never want to be sending JSON out with null values, as that is a waste of traffic and provides no information to the client, i.e. {"foo": 1, "bar": null} is the same as {"foo": 1}.
Protocol buffers have the exact same problem, but they tackle it in an even dumber way: they require you to list the fields you are mutating in a special field of the request, since they are unable to distinguish null from no value present and will otherwise default to an empty value like {} or [], which is not the intent of the sender and causes all sorts of data corruption.
PS: obviously this applies to pointers as a whole, so if I have type Request struct { Number *int `json:"number"` } then sending {} and {"number": null} must behave the same and result in Request{Number: nil}.
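A minimal demonstration of that last point with v1: with a plain pointer field, an absent key and an explicit null are indistinguishable after decoding.

package main

import (
	"encoding/json"
	"fmt"
)

type Request struct {
	Number *int `json:"number"`
}

func main() {
	var a, b Request
	_ = json.Unmarshal([]byte(`{}`), &a)
	_ = json.Unmarshal([]byte(`{"number": null}`), &b)
	fmt.Println(a.Number == nil, b.Number == nil) // true true
}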
https://pkg.go.dev/github.com/go-json-experiment/json#exampl...
Given that it is not even solved yet in its namesake language, JavaScript, that's not saying much.