This could mean you take the “temperament” of the person posting (like “estj”) and map their 2nd and 4th letters to the other person's 4th and 2nd (the 2nd is regarded as the “input” and the 4th the “output”), so “S” is compatible with “P” and “N” is compatible with “J”.
And then give a bonus modifier for the opposite of the others, so “E” likes “I” and vice versa (always a quiet dude hanging out with a talkative dude, though not exclusively). And “T” prefers the company of “F”, though not exclusively (see these as technical and creative).
This gives you compatible interfaces (input/output) and diverging (thus “more interesting”) social dispositions.
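A rough sketch of that scoring rule in Python, just to make the idea concrete. The function name `compatibility`, the `INPUT_TO_OUTPUT` table, and the weights (1.0 per interface match, 0.5 per bonus) are all made up for illustration; only the letter pairings come from the description above.

```python
# Pairing taken from the comment: "S" goes with "P", "N" goes with "J".
INPUT_TO_OUTPUT = {"S": "P", "N": "J"}

# Hypothetical weights; the comment only says "bonus", not how much.
INTERFACE_WEIGHT = 1.0
BONUS_WEIGHT = 0.5


def compatibility(a: str, b: str) -> float:
    """Score two four-letter types such as "estj" and "infp"."""
    a, b = a.upper(), b.upper()
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected four-letter types like 'ESTJ'")

    score = 0.0
    # a's "input" (2nd letter) against b's "output" (4th letter), and the reverse.
    if INPUT_TO_OUTPUT.get(a[1]) == b[3]:
        score += INTERFACE_WEIGHT
    if INPUT_TO_OUTPUT.get(b[1]) == a[3]:
        score += INTERFACE_WEIGHT
    # Bonus for diverging social dispositions: E with I, T with F.
    if a[0] != b[0]:
        score += BONUS_WEIGHT
    if a[2] != b[2]:
        score += BONUS_WEIGHT
    return score


if __name__ == "__main__":
    print(compatibility("estj", "infp"))
    print(compatibility("estj", "estj"))
```

Since the E/I and T/F preferences are "not exclusive", they are treated here as a smaller bonus rather than a hard requirement.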
You could probably turn that into a good dating algorithm if it isn’t already, though it works for “pals” too!
The data comes from a daily-updated public BigQuery dataset: https://news.ycombinator.com/item?id=40644563
Quick glance: TF-IDF, cosine-similarity, the only thing missing is a nice UMAP :-)
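For anyone unfamiliar with those two terms, here is a minimal stdlib-only sketch of TF-IDF plus cosine similarity. It is not the project's actual pipeline; the whitespace tokenizer, the smoothed IDF formula, and the names `tfidf_vectors` and `cosine` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # Smoothed IDF, one common variant among several.
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    docs = [
        "rust compiler borrow checker",
        "rust borrow checker lifetimes",
        "sourdough bread recipe",
    ]
    vecs = tfidf_vectors(docs)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Authors whose comments share rare words end up with a higher score; authors with no vocabulary in common score 0.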
Most of the authors are actually missing. Full processing would have yielded a multi-trillion-row dataset. I didn't really have that kind of compute with me.
I even tried running the cross-join on BigQuery. After one hour, only about 3% was done, so I had to cancel it.