Outside of that, it's tough to compete with JSON in the "human readable unschematized serialization format" market, especially when targeting JavaScript:
Use in the browser requires some degree of bundle size increase, since the parser code needs to be loaded before your format can be used. WebAssembly libraries are usually quite large compared to a pure-JS implementation. According to [bundlejs](https://bundlejs.com/?q=%40duper-js%2Fwasm&treeshake=%5B*%5D), @duper-js/wasm weighs in at about 488 kB uncompressed, 159 kB gzip.
Use in any JavaScript runtime means you're competing against the runtime's native `JSON.parse` and `JSON.stringify`. In V8, these are very quick and have runtime-level tricks to go faster. For example, see [V8's recent post on making JSON.stringify 2x faster](https://v8.dev/blog/json-stringify) when serializing plain objects with no funny business: no `.toJSON` methods, no replacer, no indent formatting.
Besides those points, my major complaint about JSON is how expensive it is to encode binary data for transmission. In JSON I usually use base64; with your format, binary is turned into escape sequences that are less efficient than base64, right? `\xNN` is base16 with 2 extra bytes wasted on the `\` and `x`, and `\uNNNN` is also hex digits with 2 extra bytes for the `\` and `u`. Is there a way you can fit binary into the format with no expensive encode/decode step?
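To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the worst case where every byte has to be written as a `\xNN` escape (printable ASCII bytes presumably would not need one), and the function names are made up for illustration:

```typescript
// Encoded size, in ASCII characters, of an n-byte binary payload.

// Base64 emits 4 characters per 3 input bytes, padded up to a multiple of 4.
function base64Length(n: number): number {
  return 4 * Math.ceil(n / 3);
}

// Assumed worst case for \xNN escapes: every byte costs 4 characters
// (the backslash, the x, and two hex digits).
function hexEscapeLength(n: number): number {
  return 4 * n;
}

const payloadBytes = 3000;
const b64 = base64Length(payloadBytes);
const esc = hexEscapeLength(payloadBytes);
console.log(`base64: ${b64} chars (${(b64 / payloadBytes).toFixed(2)}x)`);
console.log(`\\xNN:   ${esc} chars (${(esc / payloadBytes).toFixed(2)}x)`);
```

So fully escaped binary is about 4x the raw size versus roughly 1.33x for base64, before any transport compression.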
So, for me this seems suitable as a config file format: there you get good benefit from comments, identifiers, easier string authoring. Not sure I need the binary raw string thingy in config files that much, but I guess it doesn't hurt.
This actually somewhat works right now. If you pass this JSON5 example through Prettier:
{
// comments
unquoted: 'and you can quote me on that',
singleQuotes: 'I can use "double quotes" here',
lineBreaks: "Look, Mom! \
No \\n's!",
hexadecimal: 0xdecaf,
leadingDecimalPoint: .8675309, andTrailing: 8675309.,
positiveSign: +1,
trailingComma: 'in objects', andIn: ['arrays',],
"backwardsCompatible": "with JSON",
}
You’ll get:
{
// comments
"unquoted": "and you can quote me on that",
"singleQuotes": "I can use \"double quotes\" here",
"lineBreaks": "Look, Mom! \
No \\n's!",
"hexadecimal": 0xdecaf,
"leadingDecimalPoint": 0.8675309,
"andTrailing": 8675309,
"positiveSign": +1,
"trailingComma": "in objects",
"andIn": ["arrays"],
"backwardsCompatible": "with JSON"
}
Which is still invalid JSON... but it does fix unquoted keys, floats, trailing commas, and single → double quote strings with correct escaping. So if you have “format on save” enabled in your editor, it might just work!

The config file transpilation to JSON idea is quite interesting. It's pretty similar to how I'm already defining the TextMate grammar used by the website's syntax highlighter, so I'll certainly try to incorporate that into the tooling.
// idea of implementing public duper.parse function to lean on
// runtime's JSON.parse
//
// downlevel to json, eg binary strings become base64 normal json strings
const { jsonString, enhancements } = duper.duperToJSON(data)
// let the runtime go fast when decoding
const rawObject = JSON.parse(jsonString)
// `enhance` knows the paths to all the binary base64 strings
// and replaces them with Uint8Arrays
const decoded = duper.enhance(rawObject, enhancements)
Here `enhancements` is something very easy / low cost to construct over the FFI bridge, like:
type Path = Array<string | number>
type TransformFn = (value: unknown) => unknown
type Transform = TransformFn | Enhancements
type Enhancements = Array<[path: Path, transform: Transform]>
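For what it's worth, here is a minimal sketch of what `enhance` could look like with those types. None of this is an existing duper API; `enhance` and `base64ToBytes` are hypothetical, and treating a nested `Enhancements` as paths relative to the value at the outer path is my own assumption:

```typescript
type Path = Array<string | number>;
type TransformFn = (value: unknown) => unknown;
type Transform = TransformFn | Enhancements;
type Enhancements = Array<[path: Path, transform: Transform]>;

// Hypothetical `enhance`: walk each path and replace the value found there.
function enhance(root: unknown, enhancements: Enhancements): unknown {
  // Wrap the root so that an empty path (replace the root itself) needs no special case.
  const holder: Record<string, unknown> = { root };
  for (const [path, transform] of enhancements) {
    let parent: any = holder;
    let key: string | number = "root";
    for (const segment of path) {
      parent = parent[key];
      key = segment;
    }
    const current = parent[key];
    parent[key] =
      typeof transform === "function"
        ? transform(current)
        : enhance(current, transform); // assumed: nested paths are relative to `current`
  }
  return holder.root;
}

// Decode a base64 string into bytes (atob exists in browsers and modern Node).
const base64ToBytes: TransformFn = (value) =>
  Uint8Array.from(atob(value as string), (c) => c.charCodeAt(0));

// Usage: the runtime parses plain JSON, then only the listed paths are touched.
const rawObject = JSON.parse('{"name":"logo","chunks":[{"data":"AQID"}]}');
const decoded = enhance(rawObject, [[["chunks", 0, "data"], base64ToBytes]]) as any;
console.log(decoded.chunks[0].data); // a Uint8Array holding the bytes 1, 2, 3
```

The object is mutated in place, so the only extra allocations are the decoded byte arrays themselves.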
Not sure if this would end up faster, it may allocate more, but it's probably better than unoptimized object/array construction from WASM/native -> runtime. You could also try with a `reviver` argument to `JSON.parse`, but I always find the lack of a full path to the key somewhat clunky.

Today JSON is winning, but for more complex structures, there are still syntax issues in the output. XML does reasonably well (given the deep React JSX/HTML in the training corpus), so perhaps that will make a comeback.
Are there benchmarks on this? I think the SOTA models are fine, they can work with most formats. The fun question is which output format works best for models that reach 90% of SOTA performance and cost 90% less. This is where the winner will be found.
TLDR: probably JSON or XML will remain the config format for a while.
Making them part of the language would increase the complexity of parsers - how would you validate that a date is actually valid? It's doable (YAML and TOML do it, after all) but requires extra steps.
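To give a feel for the "extra steps", here is a small sketch of the kind of check a parser would need after tokenizing. The bare `YYYY-MM-DD` literal and both function names are hypothetical, not part of any existing format:

```typescript
// True when year-month-day names a real date in the proleptic Gregorian calendar.
function isValidDate(year: number, month: number, day: number): boolean {
  if (![year, month, day].every(Number.isInteger)) return false;
  if (month < 1 || month > 12 || day < 1) return false;
  const isLeap = (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
  const daysInMonth = [31, isLeap ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
  return day <= daysInMonth[month - 1];
}

// Hypothetical literal syntax: a bare YYYY-MM-DD token.
// The regex only checks the shape; the calendar check is the extra step.
function parseDateLiteral(
  token: string
): { year: number; month: number; day: number } | null {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(token);
  if (!match) return null;
  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);
  return isValidDate(year, month, day) ? { year, month, day } : null;
}

console.log(parseDateLiteral("2024-02-29")); // accepted: 2024 is a leap year
console.log(parseDateLiteral("2023-02-29")); // null: 2023 is not
```

Times, offsets, and leap seconds would each add more cases on top of this.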
The MDN page on JavaScript's Temporal library gives a good overview of the difference between the two. Today's practice of encoding Instants as ISO 8601 strings in UTC (Z suffix) or at a UTC offset is okay for ephemeral data-in-motion that will be used right now, but it is not a good practice for persisted data, since time zones, DST rules, etc. change all the time. Temporal is the JS-specific API, but these concepts apply to all handling of date/time data in computer systems.
That said, v8 plans to use [temporal_rs][] as their Temporal backend.
Temporal: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
temporal_rs: https://crates.io/crates/temporal_rs
You can encode extended ZonedDateTime information as a string following RFC 9557, [Date and Time on the Internet: Timestamps with Additional Information](https://www.rfc-editor.org/rfc/rfc9557.txt).
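As a rough illustration of the string shape (an RFC 3339 timestamp followed by bracketed suffixes), here is a sketch that only splits such a string apart. It does not validate the timestamp or the time zone name, it accepts but ignores the `!` critical flag, and `splitExtendedTimestamp` is a made-up helper rather than anything from Temporal or temporal_rs:

```typescript
interface ExtendedTimestamp {
  timestamp: string;                   // the RFC 3339 part, left unvalidated
  timeZone?: string;                   // e.g. "Europe/Paris"
  annotations: Record<string, string>; // key=value suffixes, e.g. { "u-ca": "hebrew" }
}

function splitExtendedTimestamp(input: string): ExtendedTimestamp | null {
  // Everything up to the first bracket, then zero or more [...] suffixes.
  const match = /^([^\[\]]+)((?:\[[^\[\]]*\])*)$/.exec(input);
  if (!match) return null;

  const result: ExtendedTimestamp = { timestamp: match[1], annotations: {} };
  const suffix = /\[!?([^\[\]]*)\]/g;
  let m: RegExpExecArray | null;
  while ((m = suffix.exec(match[2])) !== null) {
    const body = m[1];
    const eq = body.indexOf("=");
    if (eq === -1) {
      result.timeZone = body; // a suffix without "=" is treated as the time zone
    } else {
      result.annotations[body.slice(0, eq)] = body.slice(eq + 1);
    }
  }
  return result;
}

console.log(splitExtendedTimestamp("2022-07-08T00:14:07+01:00[Europe/Paris]"));
console.log(
  splitExtendedTimestamp("1996-12-19T16:39:57-08:00[America/Los_Angeles][u-ca=hebrew]")
);
```

The point is that the zone name travels with the value, so a reader can re-resolve the local time if the zone's rules change later.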