https://www.reddit.com/r/rust/comments/1d3b356/my_new_favori...
[–]Timzhy0 3 points 7 months ago
Btw I think one can go a step further than the author: there is no need to keep two explicit ExprRefs baked into a binary node (lhs, rhs). You can exploit locality. The AST, seen the LISP way, is just an arbitrarily nestable list whose elements are atoms or other lists. Hence all you need to know is where each list ends (an atom can be assumed to span one node), and one bit marking the last entry in a list is quite ergonomic as well, because then you can tell whether moving to the next slot in the AST means there is a sibling. This is easier to keep in sync while constructing and takes up less memory per node. I pay 40 bits per node, stored interleaved for best cache locality (some unaligned accesses, but I think it's still worthwhile): 8 bits for the tag, 32 for the data; if the data is bigger, the 32 bits are an index into an auxiliary segment (basically a ptr).
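A minimal sketch of the scheme described above, with the packing simplified: each preorder slot holds a tag and a "last sibling" bit (here as plain tuples rather than 40 packed bits), and subtree skipping recovers sibling navigation without stored child pointers. The names `flatten` and `skip` are made up for illustration, and lists are assumed non-empty.

```python
# Hedged sketch: one flat preorder array, one "last sibling" bit per node.
ATOM, LIST = 0, 1

def flatten(sexpr, last=True, out=None):
    """Flatten a nested list into (tag, payload, is_last) triples in preorder."""
    if out is None:
        out = []
    if isinstance(sexpr, list):
        out.append((LIST, None, last))
        for i, child in enumerate(sexpr):
            flatten(child, last=(i == len(sexpr) - 1), out=out)
    else:
        out.append((ATOM, sexpr, last))
    return out

def skip(nodes, i):
    """Index just past the subtree rooted at nodes[i] (lists assumed non-empty)."""
    tag, _, _ = nodes[i]
    if tag == ATOM:
        return i + 1
    j = i + 1
    while True:
        _, _, child_last = nodes[j]
        j = skip(nodes, j)
        if child_last:
            return j

nodes = flatten(["+", 1, ["*", 2, 3]])
# the next sibling of the atom "+" (index 1) starts at index 2
assert skip(nodes, 1) == 2
# the nested list starts at index 3; skipping it lands past the end
assert skip(nodes, 3) == len(nodes)
```

Because the only per-node overhead is the tag and the last-sibling bit, construction order and storage order coincide, which is what makes it cheap to keep in sync while building.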
https://github.com/gritzko/librdx/blob/master/abc/B.md
Apart from locality and lifetimes, these flat data structures improve composability. When every data structure is a flat buffer, you can mmap them or zip them or send them by the network, all by the same routine. They are uniform like bricks, in a sense.
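To make the "uniform like bricks" point concrete, here is a sketch of round-tripping a flat node buffer through bytes; the (tag, data) record layout echoes the 8-bit tag + 32-bit data scheme mentioned above but is otherwise invented for the example. The same byte string could go to a file, a socket, or an mmap'd region.

```python
# Sketch: a flat node buffer round-trips through bytes unchanged, so one
# routine serves files, sockets, or mmap'd regions alike.
import struct

REC = struct.calcsize("<BI")             # 1-byte tag + 4-byte data, no padding

nodes = [(1, 42), (2, 7), (3, 0)]        # (tag, data) pairs, made-up encoding

def to_bytes(nodes):
    return b"".join(struct.pack("<BI", tag, data) for tag, data in nodes)

def from_bytes(buf):
    return [struct.unpack_from("<BI", buf, off) for off in range(0, len(buf), REC)]

assert from_bytes(to_bytes(nodes)) == nodes
```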
Some of my compilers export the AST as lisp trees. Much smaller and more readable than json, and it can be executed. Uniform like bricks
So not flat then. Prefix is not postfix. Forth, and most concatenative languages, are much closer to actually being flat.
Lisp is trivial to flatten, but that's not the same thing.
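The prefix/postfix distinction being drawn here can be shown with one expression flattened both ways: the flattened Lisp form still needs arity or lookahead to evaluate, while the Forth-style postfix form runs directly off a value stack.

```python
# Sketch: (+ 1 (* 2 3)) flattened two ways. Prefix order still implies
# nesting; postfix (RPN) is genuinely flat and stack-evaluable.
import operator

OPS = {"+": operator.add, "*": operator.mul}

prefix  = ["+", 1, ["*", 2, 3]]          # flattening this keeps prefix order
postfix = [1, 2, 3, "*", "+"]            # no nesting left to recover

def eval_rpn(tokens):
    stack = []
    for t in tokens:
        if t in OPS:
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[t](a, b))
        else:
            stack.append(t)
    return stack.pop()

assert eval_rpn(postfix) == 7
```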
Iverson's 1962 book also mentions tree representations, see pp45-62: https://archive.org/details/aprogramminglanguage1962/page/n6...
This was closest: https://www.dyalog.com/blog/2018/01/stackless-traversal/
Your links should still be useful for orienteering!
Does APL need a type system?
Guess it's time to reverse whatever else I can find
https://www.dyalog.com/uploads/conference/dyalog16/prerequis...
*Bad faith, or just a run o' the mill, aka compleatly forgiveable profiteering?
Just looking at refs for the moment: Henglein and Hinze's discriminators are interesting, whenever you come back up for air. (are they also amenable to sorting codata?)
The oft-cited R Bernecky is, IIRC, also known as "Boolean Bob" for his SWAR-style algos. EDIT: nope, confused him with R Smith: https://aplwiki.com/wiki/Bob_Smith
(I once asked Boolean Bob if any of his tricks went back to the card processing days —I could even believe the keyed tree* might?— but he was too young to know, and the people he'd have liked to ask are no longer available.)
EDIT: Aardappel also has some interesting languages: https://strlen.com/#programming-languages
* for manipulating Bills of Materials despite the linearity of card decks?
EDIT2: compare "§3.2.1 Constructing Node Coordinates" (p34) with [T in "Table 1.19 Full list matrices of the tree of Fig. 1.16" in A Programming Language (p50): https://archive.org/details/aprogramminglanguage1962/page/n6...
Fc
1 0 0 0 0
1 1 0 0 0
1 1 1 0 0
1 1 1 1 0
1 1 1 1 1
1 1 1 1 2
1 1 1 1 3
1 1 2 0 0
1 1 2 2 0
Ec
1 0 0
1 1 0
1 1 1
1 1 2
1 1 3
1 2 0
1 3 0
1 3 4
1 3 5
1 3 6
[all three appear to be 1-origin, probably due to the phenomenon mentioned in the footnote on p49 of APL][Too low global b/w to feel the adrenalin.. wow.. what was your deduction chain for figuring how BoMs was next item on my stack? Guessing you started from assumption that pure bits mongering are not on the boundary of my ikigais(yet)]
On the heap, I (re)surfaced "Typed Array Intermediate Language" slides, but too low (local) b/w to try to find out^W^W^W^W sus out if this or smth v similar is already in dyalog.com's workflow.
https://news.ycombinator.com/item?id=11974936
>Bernecky
https://news.ycombinator.com/item?id=11963548
What were those slides about formalizing Euclid ?
[Medium b/w vibing that multiplicity of edits is a precise estimate of flow-of-war, ~ webpage memory usage is an accurate estimate of just how well run the entire organization is..]
[BoMs were complete coincidence]
https://www.weizmann.ac.il/mcb/alon/sites/mcb.alon/files/use...
For a vulpine organ they speak very confidently of the f-o-w
This happens naturally if you bump-allocate them in a garbage-collected run-time, particularly under a copying collector. Free lists also tend to co-locate because they are produced during sweep phases of GC which run through heaps in order of address.
Don't make me bring out the L word for the billionth time.
> A flat array of Exprs can make it fun and easy to implement hash consing
OK, it's not a case of L-ignorance, just willful neglect.
> A sufficiently smart memory allocator might achieve the same thing, especially if you allocate the whole AST up front and never add to it
> Again, a really fast malloc might be hard to compete with—but you basically can’t beat bump allocation on sheer simplicity.
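The hash-consing point quoted above can be sketched like this: with a flat array, interning a node is just memoizing on its structural key, so identical subtrees collapse to one index. The names (`Pool`, `add`) are made up for the example.

```python
# Hedged sketch of hash consing over a flat expr array: structurally
# identical subtrees are stored once and shared by index.
class Pool:
    def __init__(self):
        self.nodes = []          # flat array of (op, lhs_index, rhs_index)
        self.memo = {}           # structural key -> index

    def add(self, op, lhs=None, rhs=None):
        key = (op, lhs, rhs)
        if key not in self.memo:
            self.memo[key] = len(self.nodes)
            self.nodes.append(key)
        return self.memo[key]

p = Pool()
a = p.add("var", "x")
b = p.add("var", "x")
assert a == b                            # same leaf interned once
mul  = p.add("*", a, p.add("lit", 2))
mul2 = p.add("*", b, p.add("lit", 2))
assert mul == mul2                       # whole subtree shared
assert len(p.nodes) == 3
```

Equality of subtrees then degenerates to integer comparison of indices, which is the payoff the article is pointing at.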
Later the theory behind such structures was revealed to me as the "Nested set model" [1]. The article doesn't seem to mention the internal representation, but I think the implementation should use something like my solution, i.e. a fixed number of references per node
Adding to that, it also makes editing the AST vastly more efficient.
I discovered that principle on my own when I worked on an editor that operated directly on the AST instead of text. I found manipulating the tree-style AST so painful, constantly traversing the tree and all. Once I made it flat, my life was a hell of a lot easier. You can just directly index into any part of the AST in constant time.
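The nested set model mentioned a couple of comments up can be sketched in a few lines: each node gets a (left, right) pair from one running counter, and ancestry becomes interval containment. Function names here are illustrative.

```python
# Sketch of the nested set model: u is an ancestor of v iff
# left_u < left_v and right_v < right_u.
def number(tree, out, n=0):
    """Assign nested-set (left, right) intervals; tree = (name, children)."""
    name, children = tree
    left = n
    n += 1
    for child in children:
        n = number(child, out, n)
    out[name] = (left, n)
    return n + 1

def contains(out, a, b):
    """True iff node a is a proper ancestor of node b."""
    (al, ar), (bl, br) = out[a], out[b]
    return al < bl and br < ar

spans = {}
number(("root", [("f", [("x", []), ("y", [])]), ("g", [])]), spans)
assert contains(spans, "root", "y")
assert contains(spans, "f", "x")
assert not contains(spans, "g", "x")
```

Descendant queries then need no pointer chasing at all, which fits the fixed-references-per-node representation the comment describes.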
Requires the language to have something equivalent to pattern synonyms to be as invisible as twee, though.
In twee a TermList is a slice of a bytearray (two ints for offset/length plus a pointer).
And a term is an int for the function symbol and an unpacked TermList for the arguments.
The pattern match synonyms load a flat representation from the array into a view type, and the allocation of the view type cancels out with the pattern matching so everything remains allocation free.
https://hackage.haskell.org/package/twee-lib-2.4.2/docs/Twee...
You get some traversals for free with this layout (preorder, reverse post order). Can search for subtrees with string searching algorithms or more complex things with regex.
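A small illustration of the subtree-search point above: in preorder storage a subtree occupies a contiguous slice, so finding all occurrences of a subtree is plain substring search over the node array. The naive scan here stands in for KMP, `memmem`, or a regex engine.

```python
# Sketch: preorder of (+ x (* 2 y) (* 2 y)); a subtree is a contiguous
# slice, so subtree search is substring search.
expr    = ["+", "x", "*", "2", "y", "*", "2", "y"]
pattern = ["*", "2", "y"]

def find_subtree(nodes, pat):
    """Naive substring search; real code could use e.g. KMP or memmem."""
    hits = []
    for i in range(len(nodes) - len(pat) + 1):
        if nodes[i:i + len(pat)] == pat:
            hits.append(i)
    return hits

assert find_subtree(expr, pattern) == [2, 5]
```

Scanning the same array forward is exactly a preorder walk, which is the "traversal for free" part of the claim.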
Suppose you want to attach additional information to some of the nodes of the AST. Different algorithms on the AST will attach different information; you don't necessarily need them all at the same time or know ahead of time what you'll need.
With nodes, you have to have some sort of node/value hash table, or hang a key/value map off each node. But with this flattened representation, each datum gets its own flat array as well, which can be independently allocated and deallocated.
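The side-table idea above is almost trivial to show: node indices key parallel arrays, so each analysis owns its own array and can be allocated or thrown away independently of the nodes. Contents here are invented for illustration.

```python
# Sketch: parallel "side table" arrays keyed by node index, one per
# analysis, independent of the node storage itself.
nodes = ["lit", "lit", "add"]            # flat AST: node 2 = add(node 0, node 1)

# a type-checking pass allocates its own side array...
types = ["int"] * len(nodes)
# ...and a later pass its own, without touching the nodes
depth = [0, 0, 1]

assert types[2] == "int" and depth[2] == 1
del depth                                # drop one analysis; the rest unaffected
```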
One other thing I noticed about this flat representation is that it throws static typing into a cocked hat! All you have to refer to other nodes with is indices, and all the different kinds of nodes are stored in the same array.
In practice I think there are more differences. E.g. AST interpreters tend to pass environments around while bytecode interpreters often store these on a stack (though I guess there's nothing stopping you from doing this with an AST either). I wonder if there's some goldilocks zone for ease of implementation with decent performance.
I seem to recall that the Red Dragon Book (Compilers: Principles, Techniques and Tools, Aho, Sethi, Ullman [1988]) describes a technique whereby intermediate code is represented in RPN, and transformations are performed by pattern matches on it.
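As I read the technique being recalled here, a sketch of it looks like a peephole pass over the RPN token stream, rewriting by matching small windows. The rules below (`x 0 + -> x`, `x 1 * -> x`) are my own illustrative choices, not the book's examples.

```python
# Sketch of RPN-based transformation: pattern-match windows of the
# postfix token stream and rewrite in place.
def simplify(rpn):
    """One peephole pass: x 0 + -> x, x 1 * -> x (rules illustrative)."""
    out = []
    for tok in rpn:
        out.append(tok)
        if out[-2:] == [0, "+"] or out[-2:] == [1, "*"]:
            del out[-2:]                 # drop the useless operand and op
    return out

# a b 0 + * 1 *  ==>  a b *
assert simplify(["a", "b", 0, "+", "*", 1, "*"]) == ["a", "b", "*"]
```

Because operands always precede their operator, the tokens just before a matched window are guaranteed to form the complete subexpression the rule rewrites, which is what makes flat pattern matching sound here.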
Flattening ASTs and other compiler data structures (2023) - https://news.ycombinator.com/item?id=42181603 - Nov 2024 (2 comments)
Flattening ASTs and other compiler data structures - https://news.ycombinator.com/item?id=36559346 - July 2023 (81 comments)