5 points by freakynit 3 hours ago | 4 comments
  • enragebait 2 hours ago
    This looks interesting, though you should know that “buddies” aren’t the ones who are too much alike; buddies are the ones different enough to complement each other, yet alike enough not to disagree too often. That’s why they say “opposites attract”, even in “platonic” relationships.

    This could mean you take the “temperament” of the person posting (like “ESTJ”) and map the 2nd and 4th letters to the 4th and 2nd (the 2nd is regarded as the “input” and the 4th the “output”), so “S” is compatible with “P” and “N” is compatible with “J”.

    Then give a bonus for opposites on the other two letters, so “E” likes “I” and vice versa (there is always a quiet dude hanging out with a talkative dude, though not exclusively), and “T” prefers the company of “F”, though not exclusively (see these as technical and creative).

    This gives you compatible interfaces (input/output) and diverging (thus “more interesting”) social dispositions.

    You could probably turn that into a good dating algorithm if it isn’t already, though it works for “pals” too!
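
    A minimal sketch of that scoring in Python. The function name `buddy_score` and the weights (2 for each input/output match, 1 for each opposite letter) are assumptions for illustration, not anything the project uses:

```python
def buddy_score(a: str, b: str) -> int:
    """Score two four-letter types such as "ESTJ" and "INFP".

    Weights are illustrative assumptions: 2 points per input/output
    match, 1 point per opposite social letter.
    """
    a, b = a.upper(), b.upper()
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected four-letter types like 'ESTJ'")

    # The 2nd letter ("input") of one person is matched against the
    # 4th letter ("output") of the other: S pairs with P, N with J.
    cross = {"S": "P", "N": "J"}

    score = 0
    if cross.get(a[1]) == b[3]:
        score += 2
    if cross.get(b[1]) == a[3]:
        score += 2

    # Bonus for opposites on the 1st letter (E/I) and 3rd letter (T/F).
    if a[0] != b[0]:
        score += 1
    if a[2] != b[2]:
        score += 1
    return score
```

    Under these assumed weights, `buddy_score("ESTJ", "INFP")` scores the maximum and `buddy_score("ESTJ", "ESTJ")` scores zero, which matches the “not too much alike” idea.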

    • freakynit 39 minutes ago
      These seem fun to explore. Will definitely check out. Thanks!
  • malandin 2 hours ago
    Great project! I was thinking of building something similar with not only search but analytics as well. Could you hint at where the dataset comes from? I'd really like to have a look.
    • freakynit 44 minutes ago
      Thank you. This has been on my mind for the past year. I wanted to do it with vector-embedding similarity matching, but due to cost and compute requirements I had to resort to a keyword-based approach.

      The data comes from a daily-updated public BigQuery dataset: https://news.ycombinator.com/item?id=40644563

  • tetris11 2 hours ago
    I like it! I'm just missing from the corpus for some reason...

    Quick glance: TF-IDF, cosine-similarity, the only thing missing is a nice UMAP :-)
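
    For anyone unfamiliar with the combination, here is a rough pure-Python sketch of TF-IDF plus cosine similarity. It assumes whitespace tokenization and a smoothed IDF, and the sample author texts are made up; the project's actual pipeline may differ:

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per input document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical per-author keyword text, for illustration only.
alice, bob, carol = tfidf_vectors([
    "rust compilers borrow checker rust",
    "rust borrow checker lifetimes",
    "sourdough baking flour",
])
```

    Here `cosine(alice, bob)` comes out higher than `cosine(alice, carol)`, since the first two share terms and the third shares none.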

    • freakynit 42 minutes ago
      Thanks for the UMAP suggestion. Will add.

      Most of the authors are actually missing. Full processing would have yielded a multi-trillion-row dataset, and I didn't really have that kind of compute with me.

      I even tried running the cross-join on BigQuery. After one hour only about 3% was done, so I had to cancel it.
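
      To show why the row count explodes, here is the back-of-the-envelope arithmetic for an all-pairs cross-join. The function name `cross_join_rows` and the author counts are hypothetical; the real number of authors is not stated here:

```python
def cross_join_rows(n_authors: int, ordered: bool = True) -> int:
    """Rows produced by pairing every author with every other author.

    ordered=True counts (a, b) and (b, a) separately, as a plain
    self cross-join would; ordered=False counts each pair once.
    """
    if n_authors < 0:
        raise ValueError("n_authors must be non-negative")
    pairs = n_authors * (n_authors - 1)
    return pairs if ordered else pairs // 2


# Hypothetical author counts, chosen only to illustrate the growth.
for n in (1_000, 1_000_000, 2_000_000):
    print(f"{n:>9,} authors -> {cross_join_rows(n):,} rows")
```

      Growth is quadratic, so doubling the authors roughly quadruples the rows. Separately, if the 3%-per-hour rate had stayed constant, the full run would have taken about 33 hours.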

  • holg an hour ago
    interesting, to me it seems "buddies" is the wrong term anyhow...
    • freakynit 41 minutes ago
      Yeah, even if not wrong, it's definitely not correct. I couldn't think of another one, so I stuck with it for now.