Hey all, I didn't actually use Claude for the embeddings on this (because they don't offer embeddings), but I think we can extrapolate: if a tiny open-source model has this level of recognition, the SOTA models almost certainly do too. It's very interesting to me that we can predict things like an author's gender and college from a sample of their writing. Also interesting is how much stronger the correlation may be between coders and their code than between authors and their books. Code seems to have a more "unique" fingerprint than a typical book. (Although, looking at it again, that may just be a sample-size effect. I'd need to research more to know for sure.)
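
For anyone who wants to sanity-check this kind of claim, here's a minimal sketch, assuming you already have one embedding vector per writing sample and a label (author, gender, college, whatever) for each. It is not the pipeline used for the results above: the nearest-centroid classifier, the function names, and the toy vectors are all made up for illustration. The idea is just to compare leave-one-out accuracy against a majority-class baseline, since with a small sample a high score can be mostly luck.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroids(embeddings, labels):
    """Average the embedding vectors of each label."""
    sums, counts = {}, Counter()
    for vec, label in zip(embeddings, labels):
        prev = sums.get(label, [0.0] * len(vec))
        sums[label] = [s + v for s, v in zip(prev, vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict(vec, cents):
    """Return the label whose centroid is most similar to vec."""
    return max(cents, key=lambda lab: cosine(vec, cents[lab]))


def leave_one_out_accuracy(embeddings, labels):
    """Hold out each sample in turn and classify it from the rest."""
    correct = 0
    for i in range(len(embeddings)):
        train_x = embeddings[:i] + embeddings[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        correct += predict(embeddings[i], centroids(train_x, train_y)) == labels[i]
    return correct / len(embeddings)


def majority_baseline(labels):
    """Accuracy you'd get by always guessing the most common label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


# Toy 2-D "embeddings" (hypothetical data, not real model output).
toy_x = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
toy_y = ["a", "a", "b", "b"]

accuracy = leave_one_out_accuracy(toy_x, toy_y)
baseline = majority_baseline(toy_y)
print(f"accuracy={accuracy:.2f} baseline={baseline:.2f}")
```

If the accuracy is only a little above the baseline, or the gap shrinks as you add more samples per label, that would point toward the sample-size explanation rather than a real fingerprint.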