It's not that JSON itself is bad, but it's obviously for machines to author, not for humans.
- No comment
- No trailing comma
- No multi-line string
It's a terrible format to type manually. However, we just shrugged and said "at least it's not XML" and started writing it manually anyway.
And later we finally realized comments are not optional, so we got JSON5, JSONC, etc...
15 years later, JSON is still very far from the standardization and tooling that XML had at the time.
It reminds me of the NoSQL thing from back then too; oversimplified, it was "what if we chuck JSON blobs in a key/value store?". It took years to realize that relational databases and SQL weren't actually that bad, and/or that NoSQL had a long-term cost.
1: Standardization was made by committee lovers and/or architecture astronauts leaving us with overly convoluted (sometimes to fit lacklustre object models in early OO languages) and complex ways of working.
2: Complexity that introduced security vulnerabilities
Sure, there was some great tooling available, but how much of it was needed because all complexity made it hopeless to work with without the tooling?
I used it for configuration and serialization in some projects, and it was actually great but I almost always diverged from the bloated norms and defaults for readability/writeability (that made it a bit annoying to specify serialization rules).
I mean, why did people prefer this?
<object><property name="somename"><int32>123</int32></property></object>
Over just this?
<object somename="123"/>
Yeah there is so much "flexibility" in those above designs but it wasn't needed 99% of the time.
JSON was and is so far the best popular compromise between "just plain data" and "some" structure to make automated processing non-painful.
Another improvement over XML collections (do you create a container element to specify the container target, leading to bloat, or just map some of the sub-elements to specific collections and hope you don't run into ambiguities?) is that collections are just lists assigned to a property.
The biggest drawback of JSON is that we never had a way to handle type specializations/subtyping but had we done that we might have not gotten the universal acceptance across languages.
Yes, there are no comments, but whenever you need them a single-line regexp can strip a useful subset of them without affecting anything covered by the standard: just remove matches of
/^\s*\/\/[^\r\n]*/mg
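For what it's worth, here is a minimal Python sketch of that regexp in use. The helper names (`strip_line_comments`, `loads_with_comments`) and the sample payload are made up for illustration. It only handles comments that occupy a whole line, which is safe because a JSON string can't contain a raw newline, so a line that starts with `//` can never be inside a string.

```python
import json
import re

# Same pattern as above, written for Python's re module (the /m flag becomes re.MULTILINE).
LINE_COMMENT = re.compile(r"^\s*//[^\r\n]*", re.MULTILINE)

def strip_line_comments(text: str) -> str:
    """Remove whole-line // comments; trailing comments after a value are NOT handled."""
    return LINE_COMMENT.sub("", text)

def loads_with_comments(text: str):
    """Hypothetical helper: strip comments, then hand off to the standard parser."""
    return json.loads(strip_line_comments(text))

sample = """{
  // the Slack channel that receives notifications
  "channel": "#abc",
  "url": "https://example.com"
}"""

config = loads_with_comments(sample)
```

Note that the `//` inside the URL survives, since it is not at the start of a line.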
Easier to write than JSON.
But weirdly harder to get right in other ways, especially for nesting things.
Almost made the mistake of becoming simple to parse.
Alas, defeat was again snatched from the jaws of victory on that one…
https://yaml.org/spec/1.2-old/spec.html#id2759572
However, I find TOML somehow the better tech, while I still think that configuration should not be a "format" but a language.
I think at least the recomposition of values with other values should be possible:
CONFIG_PATH="$MAIN_PATH/config"
And why would you configure:
DEV_PATH="dev"
PRD_PATH="prod"
when you could do:
MAIN_PATH=env.IS_PROD ? "prod" : "dev"
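A rough sketch of what "configuration as a language" could look like if the config were just a small Python function. `build_config` and the `"1"` convention for `IS_PROD` are assumptions for illustration, not any particular tool's API.

```python
def build_config(env):
    """Derive config values from one another instead of listing each variant.

    `env` is a plain dict standing in for the process environment; the IS_PROD
    key and its "1" value are hypothetical conventions.
    """
    main_path = "prod" if env.get("IS_PROD") == "1" else "dev"
    return {
        "MAIN_PATH": main_path,
        # Recomposition: CONFIG_PATH is built from MAIN_PATH.
        "CONFIG_PATH": f"{main_path}/config",
    }

prod = build_config({"IS_PROD": "1"})
dev = build_config({})
```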
JSON is rediscovering XML Schema, XML DTDs etc, when we had those a quarter century ago already.
It was so good when you could define the structure easily and validate it with standard tooling.
If I have a structure like this JSON:
{
  "foo": "bar",
  "nested": {
    "nested_foo": "nested_bar"
  }
}
Should I do this in XML?
<foo>
  bar
</foo>
<nested>
  <nested_foo>
    nested_bar
  </nested_foo>
</nested>
Or should I do this?
<foo>
  bar
</foo>
<nested nested_foo="nested_bar"/>
If your goal is to simply encode a data structure that looks more or less like a C-style struct (key/values, arrays, primitives), the concept of attributes on tags is superfluous, and introduces ambiguity in how to serialize something.
To me, JSON is nice because it's essentially the minimum viable way of encoding a C-style struct (floating point behavior notwithstanding.) XML has extras like attributes, schemas, and DTDs, which may be useful, but they come at the cost of having additional syntax (<!DOCTYPE ...>, etc), which is auxiliary to the goal of encoding data structures, and thus makes it no longer as minimal.
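To make the element-versus-attribute ambiguity concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. The `to_xml` helper, its `scalars_as_attributes` flag, and the `root` wrapper tag are invented for illustration; the point is only that one and the same dict has two equally plausible XML encodings.

```python
import xml.etree.ElementTree as ET

data = {"foo": "bar", "nested": {"nested_foo": "nested_bar"}}

def to_xml(mapping, scalars_as_attributes=False, root_tag="root"):
    """Serialize a nested dict to XML (hypothetical helper, two arbitrary conventions)."""
    root = ET.Element(root_tag)

    def fill(parent, items, top_level):
        for key, value in items.items():
            if isinstance(value, dict):
                # Nested dicts always become child elements.
                fill(ET.SubElement(parent, key), value, False)
            elif scalars_as_attributes and not top_level:
                # Convention B: scalars inside a nested dict become attributes.
                parent.set(key, str(value))
            else:
                # Convention A: every scalar becomes a child element with text.
                ET.SubElement(parent, key).text = str(value)

    fill(root, mapping, True)
    return ET.tostring(root, encoding="unicode")

elements_xml = to_xml(data)
attributes_xml = to_xml(data, scalars_as_attributes=True)
```

Both outputs carry the same information, but a consumer has to know which convention the producer picked. With `json.dumps(data)` there is only one shape to agree on.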
To me, JSON's approach of having a separate out-of-band definitions of schema (e.g. JSON schema) is the better approach because it's less opinionated, you only pay for it when you need it, and it doesn't require separate syntax. Leave validation to a validation step, my data is my data.
Where you'd do attributes is like this:
<foo type="address">
bar
</foo>
<nested type="resident_list">
<nested_foo type="resident">
nested_bar
</nested_foo>
</nested>
A shitty example (you'd define types with schema), but you get the point.
The data is in the tags, attributes are metadata, extra information that's needed during the process.
This is what JSON is lacking, you can't give metadata to data, like defining types. The type definition needs to live NEXT to the data in special keys you need to check every time when parsing.
I see some XML from other sources and it’s just awful. Everything that should be a nested tag is all jammed into an unreadable tag with attributes with no visual structure at all, spilling over the 80th column and word-wrapping in the terminal, it’s just like, what the hell. Why even use XML at that point?
I had been tasked at least several times in my past jobs to develop XSLT scripts to transform data into user-readable content. I don't know of anyone that uses XSLT today and I have no idea if there is a JSON equivalent.
Though as I understand it it's possible that this might not be the case for much longer: https://github.com/whatwg/html/issues/11523
I was building some kind of parser for them and got confused for a second when my Python script returned bare XML when fetching the front page :D
jq is the closest thing for json, but it’s far messier to write longer filters than XSLT, and that’s saying something!
But we haven't, otherwise we'd use all those better formats instead
Which is a good thing. This makes JSONL possible.
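A quick Python sketch of why that works: the standard `json.dumps` escapes newlines inside strings, so with default settings every record fits on one physical line. The function names and sample records are made up for illustration.

```python
import json

def to_jsonl(items):
    """One JSON document per line; newlines inside strings are escaped as \\n by json.dumps."""
    return "".join(json.dumps(item) + "\n" for item in items)

def from_jsonl(text):
    """Parse each non-empty line as an independent JSON document."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

records = [
    {"id": 1, "note": "line one\nline two"},  # the value contains a newline
    {"id": 2, "note": "single"},
]

encoded = to_jsonl(records)
decoded = from_jsonl(encoded)
```

This only holds as long as the writer doesn't pretty-print: `json.dumps(item, indent=2)` would spread a record over several lines and break the one-record-per-line assumption.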
{
"uhh": "nope"
}
JSONL would just dictate that string values must not be in the multi-line form.
Except they very much are. The place to explain your payload is in the API documentation, not alongside the payload. It's not code.
I'm more partial to YAML for readability, but I don't think JSON configs are an awful anti pattern.
> It's not code.
package.json has a field literally called 'scripts' where the values are shell one-liners.
Not to be mean, but this has the trifecta of amateur programming:
- JSON
- Games
- One solution for everything.
Pro tip: you can store variables to disk as they are in memory. Got 1 million 2D points representing units? Each point is a couple of floats? You can store each float as-is: write the number of floats as an int (4 bytes), then the first float is the X coord and the second the Y coord (4 bytes each), then repeat 1M times. Boom, you just solved that in 8MB, and in a couple of milliseconds of compute. Bonus point: no escaping, no import json, just a programmer programming.
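For reference, a minimal sketch of that layout in Python with the standard `struct` module: a 4-byte count of floats, then x and y as 4-byte floats, repeated. The function names are made up, and pinning the byte order to little-endian is my assumption (a raw memory dump would use whatever the machine uses).

```python
import struct

def pack_points(points):
    """Encode [(x, y), ...] as: uint32 float count, then that many float32 values."""
    flat = [coord for point in points for coord in point]
    # "<" = little-endian, no padding; "I" = 4-byte unsigned int; "f" = 4-byte float.
    return struct.pack(f"<I{len(flat)}f", len(flat), *flat)

def unpack_points(blob):
    """Inverse of pack_points: read the count, then pair the floats up again."""
    (count,) = struct.unpack_from("<I", blob, 0)
    flat = struct.unpack_from(f"<{count}f", blob, 4)
    return list(zip(flat[0::2], flat[1::2]))

points = [(1.5, -2.25), (0.0, 3.0)]  # values exactly representable as float32
blob = pack_points(points)
restored = unpack_points(blob)
```

At 1M points that is 4 bytes of header plus 2 × 4 bytes per point, so about 8 MB, matching the estimate above. The tradeoff is that values not exactly representable in 32 bits come back rounded, and the file is meaningless without knowing the layout.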
But you are being mean. Also, the guy who wrote the article is no amateur, but a seasoned veteran who's likely been storing floats in files for decades. Check him out, you might be surprised.
Programmers who had to do low level programming due to early hardware restrictions are hereby granted immunity from high-level elitism and gatekeeping, they can vibe code and npm install is-even if they so wish to.
Now I have 308 posts to read :)
I think that comes from separating content and style, indicating meaning rather than explicit style: it isn't really one asterisk for italic and two for bold, it is one for emphasis and two for strong emphasis and the renderer chooses how to display those levels. Like using HTML's “em” and “strong” tags instead of explicit “i” and “b” tags.
It still looks good even without any formatting! (And btw. I thought that was the intention of markdown …)
Many interpreters will accept underscores for italic, though they still generally (but not always) require two asterisks for bold.
I just accept it as a general idea, not a standard, and look up the local conventions for whatever tool I'm using at the time. Or if I'm writing a translator, I prioritise converting things written how I personally prefer to write plain text documentation.
Underline was pretty popular when I was younger. We'd underline the dates in our exercise books, it was used for emphasis in textbooks, &c. But when hyperlinks adopted underline as the default style then nobody would use them for anything else any more, as otherwise you'd have people clicking on things thinking they were links when they weren't. I'm guessing that's when people decided to render _this_ as italic rather than underline, perhaps web based IRC clients started it.
But then everyone stopped rendering links with underlines, so it died in vain. (I suppose it /is/ still used as a hover effect sometimes, inconsistently.)
And now nobody really uses underline for anything at all. It's kinda dormant waiting for its renaissance.
Italic can mean: emphasis, foreign word, word which is being defined for the first time, title of referenced work, mathematical variable, and many field-specific uses.
Bold can mean: strong emphasis, term that needs to stand out and be found easily while scanning, mathematical vector, and again, field-specific uses.
If a markup language only has tags for emphasis and strong emphasis, then you can't put bold or italics for any of the other reasons you might want to use them, so anyone wanting to do those things can only misuse the emphasis and strong-emphasis markup, so it de-facto starts to mean bold/italic anyway.
It's at least reasonable to propose a markup language where you have to say "this is emphasis, this is a foreign word, this is a title of a referenced work," etc. but not everybody is writing a document that needs that much metadata. At least HTML retained <i> and <b> when it introduced <em> and <strong>.
Styles, like words, can have several meanings, and forcing authors to separate them feels a bit like forcing them to write the word "set" differently for each of its 10+ meanings.
That is not as practical with markdown though, as you are working with a limited set of practical character combinations.
> Styles, like words, can have several meanings
This is one of the reasons why there are many markdown alternatives that behave slightly differently: not everyone writes plain text mark-up with the same intentions.
This isn't something you can solve with a single markdown version, so we have to accept that each could, and probably will, work a little differently.
{
"channel-comment": "This is the Slack channel that will receive notifications.",
"channel": "#abc"
}
var , var2 , var3 , var4
Watch a movie? Add a page to Obsidian with the movie title as the note title, run a python script and boom it has all of the metadata filled up along with everything relevant from TMDB and it's a pretty card with a cover image on my Movies Base.
If Obsidian turns Evil Corporate, my workflow will still be the same, the editor just changes. I'll miss Bases, but all of my own automation is a bunch of external scripts that modify markdown.
No trailing commas is great for enforcing consistency. I’m a huge fan of consistency in code. Same with required quotation marks, which also simplify writing (imagine having to wonder if something needs it, or be surprised when it does and things break).
You've also got it backwards on quotes, it complicates writing by forcing you to write more. And with "Especially when automation exists" wondering is a non-issue, you'd get the syntax hint/error right there while typing and see if you need quotes before anything breaks
> Having to add a comma when appending to the list doesn't seem that bad to me as a trade off.
Then do that! No one is forcing you to add trailing commas, they're optional and for all the other people who don't think they add value
> Commas exist mostly to help JSON be human-readable. They have no real syntactic purpose as one could make their own notation without the commas and it'd work just fine.
https://stackoverflow.com/a/36104693
Elsewhere such commas can be optional, e.g. in clojure: https://guide.clojure.style/#opt-commas-in-map-literals
{ foo :"bar" baz :"bak" quux :[ a,b,c,d ] lol :9.7E+42 }
Ref: https://www.json.org/json-en.html, but without commas. It's line noise. Commas allow a nice visual anchor.
my-abstr x y z foo: "bar" baz: "bak" "quak" quux: [a, b, c, d] lol: 9.7E+42
I don't think
my-abstr x y z, foo: "bar", baz: "bak" "quak", quux: [a, b, c, d], lol: 9.7E+42
would be better. Indentation and/or coloring my-abstr and the labels (foo:, baz:, quux:, lol:) are the right measures here.
Coloring things is a luxury, and from my understanding not many people understand that fact. When you work in the trenches you see a lot of files on a small viewport and without coloring most of the time. Being able to parse a squiggly file on a monochrome screen just by reading is a big plus for the file format in question.
As technology progresses we tend to forget where we came from and what the fallbacks are, but we shouldn't. Not everyone is seeing the same file on a 30" 4K/8K screen with wide color gamut and HDR. Sometimes an 80x24 over some management port is all we have.
[(a b) (x y)]
instead of
[a b, x y]
Personally, I like the second option better.
{ foo: "bar" baz: "bak" quux: [a b c d] lol: 9.7E+42 }
Thank you! I thought it was just me...
Surely _underline_ would make more sense than _italics_. Somewhere I have seen /italics/ in use, but that does look kind of regexpy.
>I dislike that trailing commas are not allowed...There is no need for this and it makes writing out valid JSON more complex.
Trailing commas as a trend emerged after JSON was standardised. And thank god JSON is as well and truly standard as it is.
The convenience it offers for diffing is just a manifestation of the positive interaction with grammars and language tools. The convention of humans using trailing commas in lists, along with one item per line, is relatively new, though. Stylistically, this used to be frowned upon as long definition lists made source files longer, slower to scroll through, and worsened code locality from the perspective of someone using, e.g., a 25 row terminal.
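To put a number on the diffing point, here is a small sketch with Python's standard `difflib`. It counts changed lines when one item is appended to a one-item-per-line list, with and without trailing commas. The `changed_lines` helper and the sample lists are made up, and the trailing-comma variant is of course not valid standard JSON (it would be fine in JSON5 or in most programming languages).

```python
import difflib

def changed_lines(before, after):
    """Count added and removed lines, ignoring ndiff's '? ' hint lines."""
    return sum(
        1
        for line in difflib.ndiff(before, after)
        if line.startswith(("+ ", "- "))
    )

# Without trailing commas: appending "c" also touches the "b" line.
plain_before = ["[", '  "a",', '  "b"', "]"]
plain_after = ["[", '  "a",', '  "b",', '  "c"', "]"]

# With trailing commas: appending "c" only adds one line.
trailing_before = ["[", '  "a",', '  "b",', "]"]
trailing_after = ["[", '  "a",', '  "b",', '  "c",', "]"]

plain_changes = changed_lines(plain_before, plain_after)
trailing_changes = changed_lines(trailing_before, trailing_after)
```

Without trailing commas the append shows up as one removed and two added lines; with them it is a single added line.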
I prefer leading commas to having a final comma with an empty clause, though some people hate that and they don't really solve all the final-entry issues (they address some of them, but others are just moved to being first-entry issues).
I've been recently using dyff [0] to diff YAMLs in an order invariant way, and it's been absolutely liberating. Couldn't help with version control, but it's still night and day.
[ "or"
, "alternatively"
, [ "the"
, "first"
, "item"
]
, "could"
, "behave"
, "differently"
]
I feel that, despite its repugnant appearance, this "comma-first" approach is the best tradeoff in languages like JSON where trailing commas are forbidden; the leading `[` is much harder to accidentally omit or insert than the subtle trailing `,`. In Emacs I use js3-mode, a hack of js2-mode to support comma-first syntax.
Comma-first syntax is especially convenient in SQL, which has the forbidden-trailing-comma problem and several analogous problems. In C if I have a long Boolean conjunction
    if (unpleasantly long boolean expression &&
        another unpleasantly long boolean expression &&
        yet another unpleasantly long boolean expression) {
there are several ways to fix it, such as nesting ifs or factoring the expressions into variables or functions. The "comma-first" approach is also visually unappealing for spacing reasons, requiring two extra spaces after the parenthesis:
    if (   unpleasantly long boolean expression
        && another unpleasantly long boolean expression
        && yet another unpleasantly long boolean expression
       ) {
In SQL, C's alternative approaches are not available, and the "comma-first" style is much more natural:
    where unpleasantly long boolean expression
      and another unpleasantly long boolean expression
      and yet another unpleasantly long boolean expression
I do agree, though, that it's better to design languages to avoid this problem, and I think the way to do that is by using item terminators or item initiators in a list rather than by using item separators. That's what C did for statements with `;`, which was a difference from the ALGOL tradition including Pascal, where `;` was a statement separator, with the unpleasant consequences described in https://www.cs.virginia.edu/~evans/cs655/readings/bwk-on-pas....
In Meta5ix http://www.canonical.org/~kragen/sw/dev3/meta5ixrun.py I experimented with using item initiators for rules in a grammar, like Markdown uses for bulleted lists. I'm not pleased with the rest of the syntactic decisions I tried in Meta5ix, but I do think that one was a good tradeoff; here's about a quarter of the Meta5ix compiler:
- terms: term ["," {continue $choice} term] @choice
- term: (factor {else $seq}, output) [factor {assert}, output] @seq
- factor: string {literal $it}
, "(" terms ")"
, "[" @many terms {continue $many} "]"
Note that, while comma-first layout feels like a gross abuse of a punctuation mark with `,`, it's quite common and natural with `|` in grammars and pattern-matches in languages like ML, where an initial `|` is also permitted; here's an excerpt from my port of μKanren to OCaml (http://canonical.org/~kragen/sw/dev3/mukanren.ml):
    let rec walk (s : env) = function
      | Vart (Var x) when Env.mem x s -> walk s (Env.find x s)
      | u -> u
I think that's what I should have used in Meta5ix, and I will if I get around to revising it.
FWIW with SQL multi-line booleans, I tend to do:
    WHERE TRUE
      AND something
      AND something_else
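The same trick pays off when the query is generated, because every condition gets the same `AND` prefix and no first-item special case is needed. A tiny Python sketch follows; `build_where` is a made-up helper, and it assumes the conditions are trusted SQL fragments (with placeholders for values), not raw user input.

```python
def build_where(conditions):
    """Render a WHERE clause where every condition is an 'AND ...' line.

    `conditions` must be trusted SQL fragments; this sketch does no escaping.
    """
    lines = ["WHERE TRUE"]
    lines += [f"  AND {condition}" for condition in conditions]
    return "\n".join(lines)

clause = build_where(["something", "something_else"])
empty_clause = build_where([])
```

With no conditions at all the result is still a valid `WHERE TRUE`, so the caller never has to decide whether to emit the keyword.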
Because that's _underline_, and /italics/ are slanted
> I also have issue with it’s creator
Just pick a different specification of markdown, it's not like there is only one :)
That imposter markdown came out of nowhere