I'm not sure why I keep reading HN. 99% of the content is uninteresting, probably 99.9% now that every article is about AI. Maybe I just like clicking on things.
Systems and agents both need to monitor and extract public web content into fresh structured data for their ingestion, intelligence workflows and analysis.
* Shameless plug * Our data infrastructure layer for businesses and AI turns continuously updated websites into a stream of structured data.
Protesilaos: https://protesilaos.com/codelog.xml and https://protesilaos.com/commentary.xml
HN: https://hnrss.org/frontpage
Sacha Chua: https://sachachua.com/blog/feed/index.xml
David Revoy: https://www.davidrevoy.com/feed/rss
Davep: https://blog.davep.org/feeds/all.atom.xml
xkcd: https://xkcd.com/atom.xml
YouTube - Michelle Khare: https://www.youtube.com/feeds/videos.xml?channel_id=UCGGZ_PO...
YouTube - TmarTn2: https://www.youtube.com/feeds/videos.xml?channel_id=UC36MGPf...
But I also extract topics automatically from the content with LLMs, which allows for dynamic topic pages that users can subscribe to separately to tune their feeds.
Haven't promoted it much, but it's pretty amazing what you can do for a couple bucks a month. And my main thesis with this site is that by locking the content to only RSS feeds of known blogs, you dramatically reduce the spam submission risk (basically eliminate it). Doesn't handle the spam comment side of things, but that's a different problem.
EDIT: I also open sourced a Rails engine I made to power this site if anyone is interested: https://github.com/dchuk/source_monitor
I went to a topic and then clicked on the header of something I was interested in, expecting to be brought to the blog post directly. Needing to click on that same title again to be brought to the post was unintuitive to me. I searched around the page, went back and forth a few times and eventually figured it out.
As a user I would love to be able to click directly through to the article FROM the topic feed. I would expect the comments link to go to the page that the header currently brings me to. This would match my expectations from using sites like reddit/HN.
A one- or two-line summary directly on the topics feed would be really great I think.
I do want to ask though (and I should make this clear in a FAQ or something): the way I check RSS feeds uses adaptive scheduling, so I intentionally don’t check feeds of sites too rapidly. Then the summarization is based on the full article content but I never render that full content on the site (to avoid traffic hijacking concerns). Given that: what’s the concern?
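For anyone wondering what adaptive scheduling can look like, here is a minimal sketch. It is not the actual logic of this site or of source_monitor; the function name, the multipliers and the bounds are all assumptions made up for illustration. The idea is simply to poll sooner when a feed just published something and to back off when it did not.

```python
from datetime import timedelta

# Hypothetical bounds; not taken from any real project.
MIN_INTERVAL = timedelta(minutes=15)
MAX_INTERVAL = timedelta(hours=24)


def next_poll_interval(current: timedelta, had_new_items: bool) -> timedelta:
    """Return how long to wait before polling a feed again.

    A feed that just published is checked sooner next time (interval halved);
    a quiet feed is checked less often (interval grows by 50%).
    The result is clamped to [MIN_INTERVAL, MAX_INTERVAL].
    """
    factor = 0.5 if had_new_items else 1.5
    proposed = current * factor
    return max(MIN_INTERVAL, min(MAX_INTERVAL, proposed))
```

With these assumed numbers, a feed polled hourly that keeps coming back unchanged drifts toward one check per day, while an active feed never gets hit more often than every 15 minutes.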
Your browser is not supported.
Please upgrade your browser to continue.
Can't even view your site with Firefox.
EDIT: just checked in Firefox, I don't see an issue. Can you email me at me@dchuk.com and maybe I can debug with you?
UA being blocked for example:
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:140.0) Gecko/20100101 Firefox/140.0
Did mess with it some more:
Allowed:
Opera/9.80 (Windows NT 6.1; U; zh-tw) Presto/2.7.62 Version/11.01
Opera/9.80 (Windows NT 5.1; U; cs) Presto/2.7.62 Version/11.01
406: Mozilla/5.0 (Windows NT 5.1) Gecko/20100101 Firefox/14.0 Opera/12.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 14; rv:140.0) Gecko/20110101 Firefox/140.0
Maybe just remove it? 406 browser not supported for ESR Firefox 140.
If I set my UA to "FUCKIT" I can use the site perfectly fine. Why is there a User Agent filter that disables the whole website? This should maybe be a warning, not a complete block.
https://rachelbythebay.com/w/2024/05/27/feed/
but coming from an aggressively anti-commercial world view. She collects evidence that real-world feed readers don't implement RSS correctly:
https://rachelbythebay.com/w/2026/02/23/readers/
Her problems are the problems of a polling-based protocol, and really, if she does not like the RSS protocol she should stop publishing it and stand up an ActivityPub or PubSubHubbub service instead.
A big part of the value of Google Reader and the ecosystem around it was that Google could poll your RSS feed once and everyone could read it... A huge win for the Rachels!
Bit odd to take potshots at a third-party blog in this discussion. Why single out Rachel?
And more to the point, the dynamics here might be due to RSS being polling-based, but if feed readers implemented the RSS logic correctly it wouldn't matter nearly as much, would it?
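For context, "implementing it correctly" in this kind of discussion usually comes down to HTTP conditional requests: the reader stores the ETag and Last-Modified values from the previous response and sends them back as If-None-Match and If-Modified-Since, so an unchanged feed costs a bodyless 304 instead of a full download. A minimal sketch using only the Python standard library; the function names and the cache dict are assumptions for illustration, not any particular reader's code:

```python
import urllib.error
import urllib.request


def conditional_headers(cache: dict) -> dict:
    """Build request headers from validators saved on the previous fetch."""
    headers = {}
    if cache.get("etag"):
        headers["If-None-Match"] = cache["etag"]
    if cache.get("last_modified"):
        headers["If-Modified-Since"] = cache["last_modified"]
    return headers


def fetch_feed(url: str, cache: dict):
    """Fetch a feed; return the body bytes, or None if it is unchanged (304)."""
    request = urllib.request.Request(url, headers=conditional_headers(cache))
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            # Save the validators verbatim so the next poll can send them back.
            cache["etag"] = response.headers.get("ETag")
            cache["last_modified"] = response.headers.get("Last-Modified")
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep using the stored copy
            return None
        raise
```

The important detail is echoing the server's validator strings back unchanged rather than generating a timestamp on the client side.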
(2) You can use a cache or be correct, pick one! I think of all the lame cache busting methods that are still in use because it took web browsers more than 15 years to get caching mostly right.
(3) If you'd been reading Rachel as opposed to asking why I pointed Rachel out, your questions would be answered!
(4) Polling-based systems come in two speeds: too fast and too slow, and it is possible to be both at the same time.
https://github.com/hparadiz/technexus/blob/release/src/Contr...
I would enjoy a JSON-based refresh of the format.
Putting just the post intro in the feed and linking to the website feels like a safer approach, assuming you have bot protections on the website, but that's a poor experience for people who want to read in their feed reader.
Edit:
Longer term, the approach might be to provide a separate RSS feed with full content but gated by a query parameter, then only give that URL to known-good consumers via email verification, a Patreon subscription, etc.
It would suck that people would have to pay more to consume content in their preferred way, but depending on your needs it might be a reasonable compromise.
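One way the gated-feed idea could work, as a rough sketch: derive a per-subscriber token with an HMAC so the server can verify it without storing every token it has handed out. The secret, the `sub` and `token` query parameter names and the example URL are all made up for illustration.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical server-side secret; in practice load it from configuration.
SECRET = b"change-me"


def feed_token(subscriber_id: str) -> str:
    """Derive a per-subscriber token that the server can recompute later."""
    return hmac.new(SECRET, subscriber_id.encode(), hashlib.sha256).hexdigest()


def full_feed_url(base_url: str, subscriber_id: str) -> str:
    """Build the private full-content feed URL for a verified subscriber."""
    query = urlencode({"sub": subscriber_id, "token": feed_token(subscriber_id)})
    return f"{base_url}?{query}"


def is_authorized(url: str) -> bool:
    """Check the query parameters of an incoming feed request."""
    params = parse_qs(urlparse(url).query)
    subscriber_id = params.get("sub", [""])[0]
    token = params.get("token", [""])[0]
    # compare_digest avoids leaking information through timing differences.
    expected = feed_token(subscriber_id)
    return hmac.compare_digest(token.encode(), expected.encode())


url = full_feed_url("https://example.com/feed-full.xml", "reader@example.com")
print(is_authorized(url))
```

Since the token is tied to a subscriber ID, a leaked URL can be traced back to one subscriber, though actually revoking it would still need a deny list or a rotated secret.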
Unless someone has a fix for whatever settings I've been using.
Where? Not within the homelab space.
https://trends.google.com/explore?q=%2Fm%2F0n5tx&date=all&ge...
With US tech companies harvesting people's data, the subscription mess, cars that are no longer cars but computers on wheels, and now AI, even folks with bare minimum knowledge are self-hosting things.
All you need is a second-hand, dirt-cheap Dell SFF computer from eBay. Install Proxmox on it, and even if it comes with only 8GB, you can still spin up a few Proxmox LXC containers (small like Docker but far better).
People are going back to buying physical media and older models of things; wired headphones are at an all-time high.
MP3 players are at an all-time high: no phone, no subscription, just music.
The 90s and early 2000s are so back, and that is a good thing. People themselves are putting a hard brake on technology.
RSS makes life so much easier. Some feeds only provide the bare minimum, while others provide the whole post so I can read everything right there without opening a website.
Also, some podcasts support it, so I have a list of podcasts that I listen to and can go back to without having to go from website to website.
One place to govern them all, RSS is still king.
[0] https://www.reddit.com/r/modnews/comments/1tq9vxo/protecting...
Get your rapacious hands away from my website please.
> and actively degrades programmatic access.
That's your problem. You choose these tools. If they can't function without ripping everyone else off then why do you persist in using them?