CONL: "Markdown" for your config files (cirw.in)

70 points by kretaceous 2 months ago | 18 comments

nickm12 2 months ago
> it’s really hard to comment out a line in a JSON file, because you end up with an extra trailing , on the previous line
Every other language has figured this one out: just support trailing commas. JSON5 supports comments and trailing commas.
https://devblogs.microsoft.com/oldnewthing/20240209-00/?p=10... https://json5.org/
> The first version of CONL used # as a comment token, but I quickly ran into issues. URLs contain #, so my next version...
Every other language has figured this one out as well. Wrap strings in quotation marks.
> That led to a data-model where each value is one of scalar|list|map (Compared to JSON’s null|bool|number|string|object|array, this felt good).
I'm not sure what a "scalar" is in CONL (is it always a string?) but a config file format having fewer types than JSON does not feel good to me. Even JSON's hand-wavy "number" type is problematic (whether "1" is an integer or float or some other type is implementation-defined). TOML got it right to distinguish integers from floats.
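To make the "implementation-defined" point concrete, here is a small sketch using Python's standard `json` module. It is one implementation among many, and other parsers may choose differently. The same JSON number comes back as an `int` or a `float` depending on how it was spelled, a distinction the JSON grammar itself does not define:

```python
import json

# Python's json module maps a number without a fraction or exponent to int,
# and everything else to float. JSON itself only has a single "number" type.
as_int = json.loads("1")
as_float = json.loads("1.0")
as_exp = json.loads("1e0")

print(type(as_int).__name__, type(as_float).__name__, type(as_exp).__name__)

# The values compare equal, but the types differ, and serializing them
# again keeps the distinction that this particular parser introduced.
print(as_int == as_float)
print(json.dumps(as_int), json.dumps(as_float))
```

A JavaScript parser, by contrast, yields the same Number for all three spellings, which is the kind of ambiguity that separate integer and float types avoid.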
- jcelerier 2 months ago
  > Wrap strings in quotation marks.
  No one wants that in a config file
0xbadcafebee 2 months ago
Those who don't learn their history are doomed to find new and innovative ways to repeat history.
If you're older than 40, you remember that there did exist an aeon, long, long ago, when people did not use data object serialization formats as config files. When config files were written not to be easy to parse, but to make it easier for human beings to configure software. When nobody believed there was one single good way to do everything. When software was written not to aid computers, but to aid humans.
- bonzini 2 months ago
  > When config files were written not to be easy to parse, but to make it easier for human beings to configure software
  Config files have always been a variant of key-value or section-key-value, except that we used to have ad hoc (and probably buggy, inconsistent, incomplete or all three) rules for quoting; array items separated by a mix of spaces, commas or something else; comments (semicolon, percent, sharp) different for each program. Case sensitivity was also a crap shoot, sometimes different between keys and values.
  These days TOML (which more or less just works) just works. I have mixed feelings about YAML but certainly I would not swap it with endless variants of sendmail's m4 madness.
  - 0xbadcafebee 2 months ago
    There's a universe of config files out there that are not key-value. Most exist for specific applications. It can be hard to configure specific functionality, so developers gave users a particular way to express it.
    Again with the TOML vs YAML? Y'all can't come up with anything but another version of the same old thing? You don't need to do the same thing everyone else does with a tiny twist. Think outside the box. Expand your mind!
    Sure, Sendmail/M4 was a pain in the ass. Postfix was more along the lines of key-value. But Exim had its own rule format, and Qmail took simplicity to the extreme by creating a different file for all the different options.
    How about an X11 config file? Nginx? Puppet? Bind? Fstab? Vimrc? Rsyslog? Netrc? Cups? Pppd? Iptables? Apt? Cron? Sysctl? SSH? Just name a program on Linux that wasn't created in the last 15 years and it will have a different config file format, tailored to the users and use cases of that application. And none of them are JSON, YAML, or TOML.
    You don't have to make yours completely unique, but you also don't have to go "oh well, there's only 3 formats to choose from, I guess I will have to settle for one of those". DO YOUR OWN THING! It's your program! Don't be a slave to convention!
    bonzini 2 months ago
    > How about an X11 config file? Nginx? Puppet? Bind? Fstab? Vimrc? Rsyslog? Netrc? Cups? Pppd? Iptables? Apt? Cron? Sysctl? SSH?
    X11, ssh, Cups are 100% section-key-value or key-value and could be served by TOML easily.
    Some of these are just programming languages (vimrc, udev, nginx) or shell scripts (iptables) in disguise. By all means keep those.
    Some are tables (cron, fstab, apt, netrc, syslogd are the ones I recognize). I suppose that's a third category but in the end they're also section-key-value (see systemd timer and mount units) and the bespoke format for the user is just one possible tradeoff between readability and conciseness. A lot of the formats you mentioned do have quoting issues, that would go away with a standardized configuration format.
    0xbadcafebee 2 months ago
    You're missing the point, man. Can you replace these configs with TOML, or JSON, or YAML? Sure. With enough key-values you could replace anything.
    But what's the user experience like? Those generic formats are not designed for a great user experience, they're designed to be generic. So they end up being at best mildly irritating, and at worst wildly frustrating.
    Here's an SSH config:
      Include ~/.ssh/my-org/*
      Host *.co.uk
        ProxyCommand ssh bastion@my-uk-server.co.uk nc %h %p 2> /dev/null
      Host newServer
        HostName newServer.url
        User adminuser
        Port 2222
        IdentityFile ~/.ssh/id_rsa.key
      Host anotherServer.tld
        HostName anotherServer.url
        User mary
        Port 2222
    Now write that as TOML:
      [global]
        include-files = ~/.ssh/my-org/*
      [host.match-co-uk]
        host-match = *.co.uk
        proxy-command = ssh bastion@my-uk-server.co.uk nc %h %p 2> /dev/null
      [host.match-new-server]
        hostname-match = newServer
        hostname = newServer.url
        user = adminuser
        port = 2222
        identityfile = ~/.ssh/id_rsa.key
      [host.match-another-server]
        host-match = anotherServer.tld
        hostname = anotherServer.url
        user = mary
        port = 2222
    The SSH config can be read easily, written easily, is easy to understand, and the functionality and format are tied together so you can do more complex things easier. On top of that, the SSH file can be changed around to load includes before or after other lines, to change how they match.
    The TOML one not only takes longer to write, but it lacks the kind of functionality that the SSH config has to both declare a new block, define its internal name, and specify a config glob, all with the same string. And you can't change how or when includes are loaded or what they overload without adding some kind of "priority" key-value, and then having to read each entry, do some math, change all the values to load different things at different places. (and looking back, I actually screwed up the TOML config, because it was so confusing!)
    Don't choose a generic solution if it's going to give the user a pain in the ass. If you don't care about the user, then you're part of the enshittification of technology.
    networked 2 months ago
    > but [TOML] lacks the kind of functionality that the SSH config has to both declare a new block, define its internal name, and specify a config glob, all with the same string.
    This isn't true. TOML allows table names with periods and asterisks. It also supports top-level keys. This is what the TOML may look like for your SSH config:
      include = "~/.ssh/my-org/*"
      # We write `"*.co.uk"` rather than `host."*.co.uk"`.
      # `Host` in ssh_config(5) introduces sections,
      # and TOML has table headers for sections.
      # A more complete design would have something for `Match`.
      ["*.co.uk"]
        proxy-command = "ssh bastion@my-uk-server.co.uk nc %h %p 2> /dev/null"
      [newServer]
        hostname = "newServer.url"
        user = "adminuser"
        port = 2222
        identity-file = "~/.ssh/id_rsa.key"
      ["anotherServer.tld"]
        hostname = "anotherServer.url"
        user = "mary"
        port = 2222
    (Personally, I am not a fan of indenting TOML.)
    > And you can't change how or when includes are loaded or what they overload without adding some kind of "priority" key-value, and then having to read each entry, do some math, change all the values to load different things at different places. (and looking back, I actually screwed up the TOML config, because it was so confusing!)
    Right, your TOML is invalid. TOML requires you to quote strings. You might have it mixed up with another INI-derived format.
    It could be because TOML is inherently more confusing than ssh_config(5), but I doubt it is actually more confusing to a newcomer. For example, a newcomer might think that indentation in SSH config is semantic when it isn't. I know because I once made this mistake. A newcomer must also remember that `Host` and `Match` are special and introduce new sections despite looking like other declarations. What is more likely is that you learned SSH config by studying the man page or reading a book like SSH Mastery and forgot the effort it took, and now you have "the curse of knowledge" about it (https://en.wikipedia.org/wiki/Curse_of_knowledge), but you haven't studied the TOML spec the same way.
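A minimal sketch of that sectioning rule, in Python. This is not OpenSSH's parser: `parse_ssh_config` is a hypothetical helper that ignores quoting, `key=value` syntax, and `Include` expansion. It only illustrates that leading whitespace is discarded and that `Host` and `Match` are the keywords that open a new block:

```python
SECTION_KEYWORDS = {"host", "match"}  # keywords that open a new block


def parse_ssh_config(text):
    """Return (header, options) pairs; header is None for the global block."""
    blocks = [(None, [])]
    for raw in text.splitlines():
        line = raw.strip()  # indentation is thrown away here
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0]
        argument = parts[1].strip() if len(parts) > 1 else ""
        if keyword.lower() in SECTION_KEYWORDS:
            # A section keyword closes the previous block and starts a new one.
            blocks.append(((keyword, argument), []))
        else:
            blocks[-1][1].append((keyword, argument))
    return blocks


indented = """
Host newServer
    HostName newServer.url
    Port 2222
"""
flat = """
Host newServer
HostName newServer.url
Port 2222
"""

# Both spellings produce the same structure in this sketch.
print(parse_ssh_config(indented) == parse_ssh_config(flat))
print(parse_ssh_config(indented))
```

The structure comes entirely from the keyword, which is easy to miss when every example you have seen happens to be indented.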
    A numeric "priority" key would indeed make for a pretty miserable user experience. Don't implement one if you can help it. There are different, better ways to express how an include should only affect certain keys.
    The way I would do it in a TOML-based SSH config file is probably with dotted keys for tables. (I wouldn't necessarily choose TOML for this task, but TOML is your example.) For instance:
      [my-org]
        include = "~/.ssh/my-org/*"
      [my-org."*.example.com"]
        # Host inherits from `my-org`.
      ["*.example.net"]
        # Host doesn't inherit from `my-org`.
    > Don't choose a generic solution if it's going to give the user a pain in the ass.
    I agree, with a caveat. The caveat is that you probably overestimate how unique your configuration needs are and underestimate the value of picking something standard. Using a standard format means you tap into its network effects. The user gets "free" syntax highlighting and completion in their editor, automatic formatting, and tools like https://github.com/kislyuk/yq to query and modify config files. This is also an important part of user experience. If the format is custom, you will have to implement syntax highlighting for Vim, Emacs, VS Code, etc., and the user will have to install it.
    I think the right approach is to sort your concerns into the essential and the inessential, then choose a format based on the essential concerns. (Do you need includes to only apply to subsequent lines, or could you apply them by name or by nesting?) When you choose, prioritize standard formats. And if your needs are complex enough, consider embedding an interpreter like Lua or Starlark for configuration, or have the user write code in non-embedded Python or another language to generate JSON config that your software reads.
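The last option can be sketched briefly. In this hypothetical setup, the user writes ordinary Python to build the configuration and the application only ever reads the resulting JSON. The host names reuse the example above; the `defaults` dict and the key names are assumptions made for illustration:

```python
import json

# Shared settings, written once instead of being repeated per host.
defaults = {"user": "adminuser", "port": 2222}

# Per-host overrides are merged over the defaults with plain dict syntax.
hosts = {
    "newServer": {**defaults, "hostname": "newServer.url"},
    "anotherServer.tld": {**defaults, "hostname": "anotherServer.url", "user": "mary"},
}

# The application would read this JSON text; here it is just printed.
config_json = json.dumps({"hosts": hosts}, indent=2, sort_keys=True)
print(config_json)
```

The application then needs nothing more than a JSON parser, while variables and reuse come from the host language rather than from the config format.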
  - bonzini 2 months ago
    > These days TOML (which more or less just works) just works
    The second "just works" should have been "is almost always enough".
- wodenokoto 2 months ago
  40 year old chiming in to say, what the hell are you talking about?
  - throwaway150 2 months ago
    +1
    Yeah. I've got no idea what your parent comment is talking about.
    > When config files were written not to be easy to parse, but to make it easier for human beings to configure software.
    *eyes rolling*. All I can remember is the hundreds of hours I've spent trying to figure out how to configure something in Apache httpd, BIND, iptables, and god forbid, Sendmail!!
    Config files were written not to be easy to <anything>. There was no rhyme or reason. Every project had their own bespoke config. All from the whims and fancies of the devs of the project.
    Good thing that was all in the past and I had no job and no responsibilities. If software today made configuration like they did 40 years ago, I'd just give up!
  - Magma7404 2 months ago
    KEY=value, INI files?
    wodenokoto 2 months ago
    That was my initial thought too, but I just don't see how they fit the description.
simonask 2 months ago
I'm sorry, nothing beats KDL in terms of readability and friendliness. I've been using it in personal projects for a while, and it is just so pleasant. I wish it saw way more widespread usage.
https://kdl.dev/
- misiek08 2 months ago
  Similar to HCL which is way safer and clearer IMHO than all indent based craziness. It's lovely to see default values loaded thanks to some extra spaces. Brackets for the win!
  https://github.com/hashicorp/hcl
- shortrounddev2 2 months ago
  Looks like xml without < and >
- immibis 2 months ago
  A GUI beats it.
  - HdS84 2 months ago
    You caveman! Everybody knew that watching config through a 30*80 chars ssh display in black/white should be enough for everybody. Who needs help displays, validation or even sliders?!
networked 2 months ago
"An INI critique of TOML", which this is inspired by, was discussed in 2023: https://news.ycombinator.com/item?id=37595766. It received a lot of criticism, particularly for invoking Postel's law.
- nickm12 2 months ago
  As best as I can tell, "An INI critique of TOML" is a subtle parody, not something to take inspiration from.
  - arp242 2 months ago
    It reads like a parody, but the author is pretty serious; they've tried to fairly aggressively inject it in the Wikipedia article as well.
martypitt 2 months ago
HOCON is a worthy contender in this space - I wish it got more airtime. (We use it extensively).
JSON superset, optional quotes for keys, sensible string handling, comments, automatic env variable handling, variable references.
It's not perfect (every sufficiently powerful configuration language has quirks), but I love it.
- asimpletune 2 months ago
  We used hocon at a place that I once worked at and I more or less liked it. It did get a lot of abuse though. I think apple released a configuration language that seemed pretty good for the same things that we used hocon. I think it was pkl or something?
kiitos 2 months ago
> CONL uses indentation for structure.
Oops.
- cirwin 2 months ago
  Author here. Seemed like the least bad of the options.
  Being able to comment out sections of a config file easily is a prime use-case; and that really implies using newlines as delimiters, and well, you fall into this trap..
  - kiitos 2 months ago
    Why is the ability to comment-out entire sections of a config file a primary use case? What are the motivating requirements for this feature?
    That aside, you don't need semantically-meaningful indentation to support commenting-out whole sections, see e.g. any braces-based lexer/parser that supports `/* ... */` style comments.
rsyring 2 months ago
Seems better to run with something everyone basically already knows¹ than to invent a new format with relatively zero support?
1: https://github.com/crdoconnor/strictyaml
aburdulescu 2 months ago
Shameless plug :)
I've also been playing around with a configuration format, for similar reasons, although my approach is to make it easy (enough) to read/parse for both humans and machines.
HN post: https://news.ycombinator.com/item?id=42516608
Any feedback is welcome, but keep in mind it's just a toy project which has only one user in mind (me), no plans to conquer the world or solve the config format problems for all :)
qznc 2 months ago
I like https://nestedtext.org because it doesn’t try to be clever and everything is just a string.
Rucadi 2 months ago
Personally I've found great success using NIX as a programmable config file, and outputting json to be read by the application.
EasyMarion 2 months ago
Really like the philosophy here. Keeping config formats minimal and text-first (rather than trying to be 'clever' with types or logic) feels underrated these days. CONL looks like it hits a nice sweet spot between human-editable and machine-parseable without drifting into 'just use a programming language' territory.
stared 2 months ago
Be like:
- don’t mind the peculiarities of formats used for config
- create a format where semicolons denote comments (just… doesn’t look right)
- fph 2 months ago
  OP has a detailed rationale for going with semicolons. Feel free to counter those points, but you can't just dismiss the thing with a "doesn't look right" without any argument.
  - stared 2 months ago
    Rationale: in the most popular modern languages it is # or //.
    In JS (well, it's why we have JSON), it is //. In YAML, it is #.
    Moreover, the semicolon is a natural character used in comments (unlike // or #). It interferes with our human parsing.
    fph 2 months ago
    The first point is addressed in the article; you don't seem to address OP's counterpoint at all.
    I don't get the second point: why is that a problem if a semicolon appears in a comment? From what I understand, comments run until the end of the line, so a second semicolon after the first does nothing.
    NoahKAndrews 2 months ago
    The problem is when a config value includes a semicolon, and the rest of the line gets ignored unintentionally, especially because strings aren't quoted
    fph 2 months ago
    Ah, I see, so the problem is not a semicolon "used in comments", it's a semicolon used outside them. But then which character would you suggest instead? The article notes that there is the same problem with # (e.g. in `black = #000000`) and // (`url = https://en.wikipedia.com`). And these are arguably more common.
    Ringz 2 months ago
    What about three simple rules to define comments:
    1. If # is in the first column of the line.
    2. If # is followed by a space.
    3. The # only starts a comment when it’s outside of quotes.
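One possible reading of these rules, sketched in Python: rule 3 is treated as a precondition, and rules 1 and 2 as alternatives. That combination is an assumption, as is the `strip_comment` helper itself. Quote handling is deliberately naive (no escapes), and an apostrophe in an unquoted value would be mistaken for an opening quote:

```python
def strip_comment(line):
    """Return the line with any comment removed, under the three rules above."""
    quote = None  # the quote character we are currently inside, if any
    for i, ch in enumerate(line):
        if quote:
            if ch == quote:
                quote = None  # closing quote
        elif ch in "\"'":
            quote = ch  # opening quote; '#' is ignored until it closes (rule 3)
        elif ch == "#":
            at_line_start = i == 0                       # rule 1
            followed_by_space = line[i + 1:i + 2] == " "  # rule 2
            if at_line_start or followed_by_space:
                return line[:i].rstrip()
    return line.rstrip()


samples = [
    "#whole line",
    "black = #000000",
    "port = 22 # ssh",
    'title = "a # b" # note',
    "url = https://example.com/#top",
]
for sample in samples:
    print(repr(sample), "->", repr(strip_comment(sample)))
```

Under this reading, `black = #000000` and a URL fragment survive intact, while a `# ` after a value is still treated as a comment.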
- mtlmtlmtlmtl 2 months ago
  That part looks fine to me, but then again I'm a lisp guy.
  - saghm 2 months ago
    I think this is something in some assembly formats too? I remember seeing it once and wondering if maybe that's where the idea of ending lines in C with semicolons came from since at least in the examples I saw in school, a large number of lines had trailing comments with a description of what the operation was doing.
    rzzzt 2 months ago
    IDA uses ; for comments in its disassembler view, but it looks like C-style // single-line comments and /* comment blocks */ are also accepted by certain tools: https://en.wikibooks.org/wiki/X86_Assembly/Comments
Hyperlisk 2 months ago
Nice! I share a similar set of thoughts and ideals around configuration languages and I'm working on one as well. Mine has a very similar syntax, so you might be interested! You can find it if you dig through my comments.
throwaway150 2 months ago
So everyone now wants a configuration file format named after them, isn't it?
ChrisArchitect 2 months ago
Earlier: https://news.ycombinator.com/item?id=43804489
dcreater 2 months ago
Yes the existing formats have issues. Highly suspect that yet another format is the answer.
bmandale 2 months ago
json but:
comments and
commas are allowed at the end
thoughts?
- dwheeler 2 months ago
  You are re-inventing JSON5, another competitor in this space: https://json5.org/
suprjami 2 months ago
Sigh. https://xkcd.com/927/
- jsomedon 2 months ago
  Damn you fast, I was just about to link that one too :-)
- anon7000 2 months ago
  It’s linked in the opening paragraphs of the post lol.
- arp242 2 months ago
  The only "sigh" here is the boring and predictable act of someone linking that xkcd as the laziest and most unimaginative content-free put-down that you can do.
  Lots of people post ideas to Hacker News. Lots of good ideas, lots of bad ideas, and lots in-between. It's very much against the hacker spirit to be such a dismissive lazy jerk about it.