11 points by Krishnaa_ 6 hours ago | 6 comments
  • vivzkestrel 3 hours ago
    As a data science beginner, I am always curious when I see these kinds of posts. Where did you get this dataset from (Kaggle, somewhere else)? What type of analysis did you actually run on the raw data? Is there a repo somewhere where I can take a look (I don't see a GitHub link on the website)?
    • Krishnaa_ 3 hours ago
      Data source: I scraped it directly from YC's public API. If you go to the YC company directory (ycombinator.com/companies) and inspect the network tab, you'll see it hits an Algolia search endpoint. That gives you structured JSON for every company: name, batch, one-liner, tags, industry, team size, location, etc. I pulled all companies from the last 5 batches (W25 through W26), which gave me 793 companies.

      For founder bios, I scraped the individual company pages on YC's site; each one lists the founders with short bios. That gave me 1,625 founder profiles to work with.

      Analysis: A mix of things, all in Python:

      -> Basic aggregations (counts by industry, tag, batch, geography)

      -> Trend analysis across batches (what's rising/falling)

      -> NLP clustering (TF-IDF + KMeans on company descriptions to find hidden themes)

      -> Cosine similarity between company descriptions to find competitive overlap ("crowding")

      -> Cross-correlations between features (is_ai × is_b2b, founder count × hiring, etc.)

      -> Founder bio keyword extraction to map backgrounds (ex-FAANG, PhD, repeat YC, etc.)

      -> A simple heuristic classifier for the AI wrapper vs deep tech breakdown
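
      To make the "crowding" idea concrete, here is a minimal sketch of TF-IDF weighting plus cosine similarity between descriptions. It is a pure-Python stand-in for what scikit-learn's TfidfVectorizer and cosine_similarity would do, not the code used for the analysis. The function names and the sample one-liners are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so a term found in every document keeps a positive weight.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_pairs(docs, top_n=3):
    """Return the top_n (similarity, i, j) tuples, most similar pair first."""
    vectors = tfidf_vectors(docs)
    pairs = [
        (cosine(vectors[i], vectors[j]), i, j)
        for i in range(len(docs))
        for j in range(i + 1, len(docs))
    ]
    return sorted(pairs, reverse=True)[:top_n]


# Made-up one-liners for illustration only, not rows from the scraped dataset.
descriptions = [
    "AI agents that automate insurance claims processing",
    "AI agents for automating insurance claims and underwriting",
    "Open-source database for time series data",
]

if __name__ == "__main__":
    for score, i, j in most_similar_pairs(descriptions):
        print(f"{score:.2f}  {descriptions[i]!r} <-> {descriptions[j]!r}")
```

      Pairs with a high score share a lot of distinctive vocabulary, which is one rough way to read "competitive overlap". Where the cutoff for "crowded" sits is a judgment call that this sketch does not settle.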

      Nothing fancy ML-wise: mostly pandas, scikit-learn, and some regex.
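
      As a rough idea of what the regex part could look like for founder backgrounds, here is a small sketch that tags bios by keyword. The tag names, the patterns, and the sample bios are all hypothetical; the patterns actually used were not shared.

```python
import re
from collections import Counter

# Hypothetical tag -> pattern map, for illustration only.
BACKGROUND_PATTERNS = {
    "ex_faang": re.compile(r"\b(?:google|meta|facebook|amazon|apple|netflix)\b", re.I),
    "phd": re.compile(r"\bph\.?\s?d\b", re.I),
    "repeat_yc": re.compile(r"\byc\s+(?:alum\w*|[wsf]\d{2})\b", re.I),
}


def tag_bio(bio):
    """Return the sorted list of background tags whose pattern matches the bio."""
    return sorted(tag for tag, pattern in BACKGROUND_PATTERNS.items() if pattern.search(bio))


def tag_counts(bios):
    """Count how many bios carry each tag."""
    counts = Counter()
    for bio in bios:
        counts.update(tag_bio(bio))
    return counts


# Made-up bios, not real founder profiles.
sample_bios = [
    "Previously at Google. PhD in ML from MIT.",
    "YC W21 alum, sold last company.",
    "Ex-Stripe engineer.",
]

if __name__ == "__main__":
    for bio in sample_bios:
        print(tag_bio(bio), "<-", bio)
    print(dict(tag_counts(sample_bios)))
```

      Keyword matching like this is cheap but noisy: it misses paraphrases and can mislabel a bio that only mentions a company in passing, so the counts are best read as rough estimates.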

      Built in less than 30 mins using Claude.

  • allinonetools_ 3 hours ago
    This matches what I have been seeing too. The bar feels much higher now: just wrapping an API is not enough unless there is real usefulness behind it. The teams solving specific, practical problems seem to stand out more.
    • Krishnaa_ 3 hours ago
      The Formula

      Pick a boring, high-value industry. Build AI agents that replace manual workflows. Make it deep enough that it's not a wrapper. Have 2 founders - one technical, one with domain expertise.

  • mtmail 6 hours ago
  • reducesuffering 3 hours ago
    LLM generated
  • Krishnaa_ 3 hours ago
    [dead]
  • Krishnaa_ 3 hours ago
    [dead]