About that data though: just publish it. Throw the data and tooling up on github, or huggingface if it's a massive dataset. Would be interested in comparing methodologies for deriving sentiment.
159 stories that hit score 100 in my tracking, with HN points, comments, and first-seen timestamp.
Methodology:

- Snapshots every 30 minutes (1,576 total)
- Filtered to score=100 (my tracking cap)
- Deduped by URL, kept first occurrence (dedup sketch below)
- Date range: Dec 2025 - Jan 2026
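The filter/dedup step is small enough to show inline. A minimal sketch with pandas; the file name and column names (url, score, seen_at) are illustrative, not my actual tooling:

```python
import pandas as pd

# Assumed snapshot schema: story_id, url, score, comments, seen_at.
snaps = pd.read_csv("hn_snapshots.csv", parse_dates=["seen_at"])

# Keep only rows at the tracking cap (score caps out at 100 in my tracker).
hits = snaps[snaps["score"] == 100]

# Dedupe by URL, keeping the first time each story hit the cap.
hits = hits.sort_values("seen_at").drop_duplicates(subset="url", keep="first")

print(len(hits), "unique stories")  # 159 in my data
```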
For sentiment, I ran GPT-4 on the full article text with a simple positive/negative/neutral classification. Not perfect but consistent enough to see the 2:1 pattern.
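The call itself was nothing fancy. Roughly this shape (the prompt here is a paraphrase of what I ran, and the truncation limit is arbitrary):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify(article_text: str) -> str:
    # One-word label keeps parsing trivial; temperature=0 keeps labels consistent.
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content":
                "Classify the sentiment of this article. Reply with exactly "
                "one word: positive, negative, or neutral."},
            {"role": "user", "content": article_text[:12000]},  # crude truncation
        ],
    )
    return resp.choices[0].message.content.strip().lower()
```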
If this dies in /new, at least I proved my own point.
65% of Hacker News posts have negative sentiment, and they outperform
People will downvote a positive headline about something they don't like.
But what do they do with a negative headline about something they don't like? I guess they upvote it to show they also don't like it.
So negative wins.
"ran GPT-4 sentiment analysis on full article text." I think most people vote based on headlines, not on article text.
I went with full article text because I wanted to capture what the content actually delivers, not just what the headline promises. A clickbait negative headline with a balanced article would skew results if I only looked at titles.
That said, you've got me thinking. It would be interesting to run sentiment on headlines separately and compare. If headline sentiment correlates strongly with article sentiment, your point stands. If they diverge, there might be something interesting about the gap between promise and delivery.
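Roughly, that comparison would just be an agreement rate plus a cross-tab. A sketch, assuming both labels are already computed (file and column names are placeholders):

```python
import pandas as pd

# Assumed columns: headline_sentiment and article_sentiment, each one of
# {"positive", "negative", "neutral"}.
df = pd.read_csv("sentiment_both.csv")

# How often do headline and article get the same label?
agreement = (df["headline_sentiment"] == df["article_sentiment"]).mean()
print(f"headline/article agreement: {agreement:.0%}")

# Where they diverge, e.g. a negative headline over a neutral article.
print(pd.crosstab(df["headline_sentiment"], df["article_sentiment"]))
```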
Might be a good follow-up analysis. Thanks for pushing on this.