For founder bios, I scraped the individual company pages on YC's site; each one lists the founders with short bios. That gave me 1,625 founder profiles to work with.
Analysis: a mix of techniques, all in Python:
-> Basic aggregations (counts by industry, tag, batch, geography)
-> Trend analysis across batches (what's rising/falling)
-> NLP clustering (TF-IDF + KMeans on company descriptions to find hidden themes)
-> Cosine similarity between company descriptions to find competitive overlap ("crowding")
-> Cross-correlations between features (is_ai × is_b2b, founder count × hiring, etc.)
-> Founder bio keyword extraction to map backgrounds (ex-FAANG, PhD, repeat YC, etc.)
-> A simple heuristic classifier for the AI-wrapper vs. deep-tech breakdown
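A minimal pandas sketch of the aggregation, batch-trend and cross-correlation steps. The post doesn't show its schema, so the column names (`batch`, `industry`, `is_ai`, `is_b2b`) and the six toy rows are hypothetical stand-ins for the scraped data.

```python
import pandas as pd

# Hypothetical toy rows; column names are assumptions, not the post's schema.
companies = pd.DataFrame(
    {
        "batch": ["W23", "W23", "S23", "S23", "W24", "W24"],
        "industry": ["Fintech", "Healthcare", "Fintech", "DevTools", "DevTools", "DevTools"],
        "is_ai": [False, True, True, True, True, True],
        "is_b2b": [True, False, True, True, True, True],
    }
)

# Basic aggregation: number of companies per industry (sorted descending).
by_industry = companies["industry"].value_counts()

# Trend across batches: share of AI companies per batch.
# String order would put S23 before W23, so fix the chronological order explicitly.
batch_order = ["W23", "S23", "W24"]
ai_share = companies.groupby("batch")["is_ai"].mean().reindex(batch_order)
ai_change = ai_share.diff()  # batch-over-batch change; positive means rising

# Cross-correlation of two boolean features as a contingency table.
ai_vs_b2b = pd.crosstab(companies["is_ai"], companies["is_b2b"])

print(by_industry, ai_share, ai_vs_b2b, sep="\n\n")
```

YC batch labels like W23/S23 don't sort chronologically as plain strings, hence the explicit `batch_order`; with real data you'd build that list from each batch's year and season.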
Nothing fancy ML-wise — mostly pandas, scikit-learn, and some regex.
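A minimal sketch of the regex side: tagging founder bios by background, plus a keyword-based wrapper vs. deep-tech rule. The patterns, tag names and precedence rule (deep-tech terms beat LLM-app terms) are all hypothetical; the post doesn't share its keyword lists or classification rules.

```python
import re

# Hypothetical keyword patterns for founder backgrounds; the post doesn't publish its regexes.
BACKGROUND_PATTERNS = {
    "ex_faang": re.compile(r"\b(google|meta|facebook|amazon|apple|netflix)\b", re.IGNORECASE),
    "phd": re.compile(r"\bph\.?\s?d\b", re.IGNORECASE),
    "repeat_yc": re.compile(r"\(yc\s+[ws]\d{2}\)", re.IGNORECASE),  # e.g. "Acme (YC W19)"
}

# Hypothetical keyword lists for the wrapper vs. deep-tech heuristic.
DEEP_TECH_TERMS = re.compile(
    r"\b(robot\w*|chips?|semiconductors?|fusion|biotech|drug discovery|hardware)\b",
    re.IGNORECASE,
)
WRAPPER_TERMS = re.compile(r"\b(gpt\w*|chatgpt|llm|copilot|chatbot|assistant)\b", re.IGNORECASE)


def tag_background(bio: str) -> set[str]:
    """Return every background tag whose pattern appears anywhere in the bio."""
    return {tag for tag, pattern in BACKGROUND_PATTERNS.items() if pattern.search(bio)}


def classify_ai_company(description: str) -> str:
    """Rough rule: deep-tech keywords win; otherwise LLM-app keywords mean 'wrapper'."""
    if DEEP_TECH_TERMS.search(description):
        return "deep_tech"
    if WRAPPER_TERMS.search(description):
        return "wrapper"
    return "unclear"


print(tag_background("Ex-Google engineer with a PhD in ML"))
print(classify_ai_company("ChatGPT-powered assistant for recruiters"))
```

Keyword rules like these are cheap and transparent but miss paraphrases ("ex-Googler" won't match `\bgoogle\b`), so hand-checking a sample of the labels is worth the few minutes.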
Built in less than 30 mins using Claude.
Pick a boring, high-value industry. Build AI agents that replace manual workflows. Make it deep enough that it's not a wrapper. Have two founders: one technical, one with domain expertise.