JSIR: A High-Level IR for JavaScript (discourse.llvm.org)

50 points by nnx 8 hours ago | 6 comments

sheepscreek 7 hours ago
This is exciting stuff!
My interpretation: if the JSIR project can successfully prove bidirectional source-to-MLIR transformation, it could lead to a new crop of source-to-source compilers across different languages (as long as they can be lowered to MLIR and back).
Imagine translating Rust to Swift and back. Of course you’d still need to implement or shim any libraries used in the source language. This might help a little with C++ to Rust conversions, as more optimizations and analyses would now be possible at the MLIR level. Though I wouldn’t expect unsafe code to magically become safe without some manual intervention.
- jeswin 31 minutes ago
  For tsonic (https://github.com/tsoniclang/tsonic), which is trying to convert TS to C# and then to a native binary via NativeAOT, I took almost the opposite tradeoff from JSIR.
  JSIR is optimizing for round-trips back to JavaScript source. But since in language-to-language conversion the consumer is a backend emitter (C# in my case), instead of preserving source structure perfectly, my IR preserves resolved semantic facts: types, generic substitutions, overload decisions, package/binding resolution, and other lowering-critical decisions.
  I could be wrong, but I suspect transpilers are easier to build if they are lowering-oriented (built for specific targets).
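To make "preserves resolved semantic facts" concrete, here is a minimal sketch of what a lowering-oriented IR node might look like. Every name in it (`ResolvedCall`, `emit_call`, the field names) is hypothetical and chosen for illustration; none of it is taken from tsonic or JSIR. The point is only that the emitter reads decisions already made by the front end and never re-runs type checking or overload resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolvedCall:
    """Hypothetical call node that stores decisions the front end already made."""
    target_symbol: str          # fully resolved package/binding, not a source name
    overload_index: int         # which overload the checker picked
    type_args: tuple[str, ...]  # generic substitutions, already concrete
    arg_exprs: tuple[str, ...]  # arguments, already lowered to target expressions
    result_type: str            # resolved static type of the call


def emit_call(node: ResolvedCall) -> str:
    """Emit target code from the resolved facts alone, with no re-analysis."""
    generics = f"<{', '.join(node.type_args)}>" if node.type_args else ""
    return f"{node.target_symbol}{generics}({', '.join(node.arg_exprs)})"


call = ResolvedCall(
    target_symbol="System.Linq.Enumerable.Select",
    overload_index=0,
    type_args=("int", "string"),
    arg_exprs=("xs", "x => x.ToString()"),
    result_type="IEnumerable<string>",
)
print(emit_call(call))
```

Nothing about the original source layout (comments, parentheses, formatting) survives in such a node, which is the tradeoff against a round-trip-oriented IR.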
jcuenod 7 hours ago
I came across this project in the last couple of days too. Being able to decompile from Hermes bytecode sounds awesome.
Here's the repo: https://github.com/google/jsir (it seems not everything is public).
Here's a presentation about it: https://www.youtube.com/watch?v=SY1ft5EXI3I (linked from the repo)
pizlonator 5 hours ago
> Industry trend of building high-level language-specific IRs
"Trend"?
This was always the best practice. It's not a "trend".
- sjrd 4 hours ago
  It seems to me that there's a certain "blindness" between two compiler worlds.
  Compiler engineers for mostly linear-memory languages tend to think only in terms of SSA, and assume it's the only reasonable way to perform optimizations. That shows in this particular article: the only difference between an AST and what they call IR is that the latter is SSA-based. So for them, it's as if something that's not SSA is not a "serious" data structure in which you can perform optimizations, i.e., it can't be an IR.
  On the other side, you actually have a bunch of languages, typically GC-based for some reason, whose compilers use expression-based structures, either in the form of an AST or a stack-based IR. These compilers don't lack any optimization opportunities compared to SSA-based ones. However, it often happens that compiler authors for those (I am one of them) don't implement the full set of optimizations that SSA compilers do, although those could very well be applied to their AST or stack-based IR as well.
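As a small illustration of the claim that expression-based structures can be optimized directly, here is a sketch of constant folding plus two algebraic identities performed on a plain expression tree. The node types (`Const`, `Var`, `BinOp`) and the `fold` function are made up for this example and only cover integer `+` and `*`; a real compiler would handle far more cases.

```python
import operator
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # "+" or "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Var, BinOp]

APPLY = {"+": operator.add, "*": operator.mul}
NEUTRAL = {"+": 0, "*": 1}  # x + 0 == x, x * 1 == x


def fold(e: Expr) -> Expr:
    """Bottom-up constant folding performed directly on the tree."""
    if not isinstance(e, BinOp):
        return e
    left, right = fold(e.left), fold(e.right)
    if isinstance(left, Const) and isinstance(right, Const):
        return Const(APPLY[e.op](left.value, right.value))
    if right == Const(NEUTRAL[e.op]):
        return left
    if left == Const(NEUTRAL[e.op]):
        return right
    return BinOp(e.op, left, right)


# x + (3 * 0) folds to x without ever building SSA form.
tree = BinOp("+", Var("x"), BinOp("*", Const(3), Const(0)))
print(fold(tree))  # Var(name='x')
```

The same rewrite in an SSA IR would be expressed over named values instead of nested nodes; the optimization itself is the same.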
hajile 6 hours ago
I want them to finish the official TC39 binary AST proposal. Nearly twice as fast to parse and a bit smaller than minified code makes it a pretty much universally useful proposal.
https://github.com/tc39/proposal-binary-ast
croes 6 hours ago
IR = Intermediate Representation
https://en.wikipedia.org/wiki/Intermediate_representation
- tamimio 5 hours ago
  Thank you, I was halfway through the article and still thinking infrared.
  - giorgioz an hour ago
    I also didn't know the acronym IR. A good solution is passing the URL to ChatGPT and asking "what does IR mean in this url: "
jhavera 3 hours ago
Interesting timing. We have been working on something that takes the opposite design philosophy. JSIR is designed for high-fidelity round-trips back to source, preserving all information a human author put in. That makes sense when the consumer is a human-facing tool like a deobfuscator or transpiler.
We have been exploring what an IR looks like when the author is an AI and the consumer is a compiler, and no human needs to read the output at all. ARIA (aria-ir.org) goes the other direction from JSIR. No source round-trip, no ergonomic abstractions, but first-class intent annotations, declared effects verified at compile time, and compile-time memory safety.
The use cases are orthogonal. JSIR is the right tool when you need to understand and transform code humans wrote. ARIA is the right tool when you want the AI to skip the human-readable layer entirely.
The JSIR paper on combining Gemini and JSIR for deobfuscation is a good example of where the two worlds might intersect. Curious whether you have thought about what properties an IR should have to make that LLM reasoning more reliable.
- oldmanhorton 2 hours ago
  > when the author is an AI and the consumer is a compiler, and no human needs to read the output at all.
  This seems like a big bet on the assumption that fully autonomous codegen without humans in the loop is imminent if not already present - frankly, I hope you are wrong.
  Even if that comes to pass in some cases, I also find it hard to believe that an LLM will ever be able to generate code in any new language at the same level at which it can generate Stack Overflow-shaped JavaScript and Python, because it’ll never have as robust a training set for new languages.
  - measurablefunc 21 minutes ago
    I recently wrote a simple interpreter for a stack-based virtual machine for a Firefox extension to do some basic runtime programming b/c extensions can't generate & evaluate JavaScript at runtime. None of the consumer AIs could generate any code of moderate complexity for the stack VM, even though the language specification could fit on a single page.
    We don't have real AI & no one is anywhere near anything that can consistently generate code of moderate complexity w/o bugs or accidental issues like deleting files during basic data processing (something I ran into recently while writing a local semantic search engine for some of my PDFs using open source neural networks).
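For readers unfamiliar with the idea, a stack-based VM interpreter of the general kind mentioned above can be sketched in a few lines. The instruction set here (`push`, `add`, `mul`, `dup`, `jz`, `jmp`) and the `run` function are hypothetical and are not the ones from the Firefox extension, whose design is not described in the thread.

```python
def run(program, stack=None):
    """Execute a list of (opcode, *args) tuples and return the final stack."""
    stack = [] if stack is None else list(stack)
    pc = 0  # index of the next instruction
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "jz":  # pop a value; jump to args[0] if it is zero
            if stack.pop() == 0:
                pc = args[0]
        elif op == "jmp":  # unconditional jump
            pc = args[0]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


# (2 + 3) * 4
arith = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
print(run(arith))  # [20]

# Count down from 3 to 0 with a loop.
countdown = [
    ("push", 3),
    ("dup",),       # 1: copy the counter for the test
    ("jz", 6),      # 2: leave the loop when the counter is zero
    ("push", -1),   # 3
    ("add",),       # 4: counter -= 1
    ("jmp", 1),     # 5
]
print(run(countdown))  # [0]
```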