71 points by sea-gold 3 days ago | 8 comments
  • hombre_fatal 3 days ago
    It could use a section on high level justification / inspiration.

    For example, what inspired this over a typical paginated API that lets you sort old to new with an afterId parameter?

  • philsnow 2 days ago
    Because the client requests pagination by lastEventId (a UUID), the server needs to remember every event forever in order to correctly catch up clients.

    If instead the client paginated by lastEventTimestamp, then a server that for any reason no longer had a particular event UUID could at least start at the following one.

    • tinodb 2 days ago
      That’s why the article suggests using a UUIDv6, which is time-orderable, or prefixing with an incrementing DB id. So indeed, if you intend to delete events, you might want to make sure you have orderable IDs of some sort.
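One way to get such orderable IDs is a time-prefixed string; this is a minimal sketch (the timestamp-plus-counter-plus-UUID format is an illustrative assumption, not the article's exact scheme):

```python
import time
import uuid
from itertools import count

_seq = count()  # tiebreaker for IDs minted in the same nanosecond

def make_event_id() -> str:
    # Zero-padded nanosecond timestamp first, so plain string comparison
    # (or a B-tree index on the column) yields chronological order; the
    # UUID suffix keeps IDs globally unique across producers.
    return f"{time.time_ns():020d}-{next(_seq):06d}-{uuid.uuid4()}"

a = make_event_id()
b = make_event_id()
assert a < b  # a later event always sorts after an earlier one
```

With IDs like these, a server that has compacted away the exact `lastEventId` a client sends can still resume at the first ID that sorts after it.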
  • sea-gold 3 days ago
    Previously discussed (April 2022; 95 comments): https://news.ycombinator.com/item?id=30904220
  • zzo38computer 2 days ago
    I think that HTTP is not the best way to do it, and that JSON is also not the best way to do it. (HTTP may work reasonably when you only want to download existing events and do not intend to continue polling.)

    I also think using UUID alone isn't the best way to make the ID number. If events only come from one source, then just using autoincrementing will work (like NNTP does for article numbers within a group); being able to request by time might also work (which is also something that NNTP does).

  • lud_lite 3 days ago
    What happens if you need to catch up? You keep calling in a loop with a new lastEventId?

    What is the intention there, though? Is this for social-media-type feeds, or is it meant for synchronising data (at the extreme, DB replication, for example)?

    What if anything is expected of the producer in terms of how long to store events?
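On the first question: yes, you loop with the last ID of each batch until the server returns an empty batch. A rough sketch of that catch-up loop, where `fetch_page` stands in for the HTTP GET and the empty-batch-means-caught-up convention is taken from the spec:

```python
from typing import Callable, Optional

def catch_up(fetch_page: Callable[[Optional[str]], list],
             last_event_id: Optional[str] = None) -> list:
    # Keep requesting ?lastEventId=<id> until a batch comes back empty,
    # which is the feed's signal that the client has caught up.
    events = []
    while True:
        batch = fetch_page(last_event_id)
        if not batch:          # empty batch: caught up, switch to polling
            return events
        events.extend(batch)
        last_event_id = batch[-1]["id"]
```

After `catch_up` returns, the client keeps the final `last_event_id` and polls (or long-polls) with it to pick up new events.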

    • dgoldstein0 3 days ago
      Sounds like it. But the compaction section has more details - basically you can discard events that are overwritten by later ones
  • DidYaWipe 2 days ago
    Never heard of "CloudEvents" before. How do people feel about those?
  • wackget 3 days ago
    Did someone just reinvent a GET API with cursor-based pagination?
    • hdjrudni 2 days ago
      Sure looks like it. I'm not getting what's new or interesting here.

      Cursors are actually better because you can put any kind of sort order in there. This "lastEventId" seems to be strictly chronological.

  • fefe23 3 days ago
    This is an astonishingly bad idea. Don't do this.

    Use HTTP server-sent events instead. Those can keep the connection open so you don't have to poll to get real-time updates and they will also let you resume from the last entry you saw previously.

    https://developer.mozilla.org/en-US/docs/Web/API/Server-sent...

    • montroser 3 days ago
      Yeah, but in real life, SSE error events are not robust, so you still have to do manual heartbeat messages and tear down and reestablish the connection when the user changes networks, etc. In the end, long-polling with batched events is not actually all that different from SSE with ping/pong heartbeats, and with long-polling you get the benefit of normal load balancing and other standard HTTP things.
      • andersmurphy a day ago
        Never had to use ping/pong with SSE. The reconnect is reliable. What you probably had happen was your proxy or server returning a 4XX or 5XX, which cancels the retry. Don't do that and you'll be fine.

        SSE works with normal load balancing the same as regular request/response. It's only stateful if you make your server stateful.

      • mikojan 2 days ago
        But SSE is a standard HTTP thing. Why would you not be able to do "normal load balancing"?

        I would also rather not have a handful of long-polling loops pollute the network tab.

        • jpc0 2 days ago
          “Normal load balancing” means “Request A goes to server A”, “Request B goes to server B”, and there is no state held in the server; if there is a session, it’s stored in a KV store or database which persists.

          With SSE the server has to be stateful; for load balancing to work you need to be able to migrate connections between servers. Some proxies / load balancers don’t like long-lasting connections and will tear them down if there has been no traffic, so you need to constantly send a heartbeat.

          I have deployed SSE, I love the technology, but I wouldn’t deploy it unless I controlled the end devices and everything in between; otherwise I would just do long polling.

          • kiitos 2 days ago
            Your description of "normal load balancing" is certainly one way to do load balancing, but in no way is it the presumptive default. Keeping session data in a shared source of truth like a KV store or DB, and expecting (stateless) application servers to do all their session stuff thru that single source of truth, is a fine approach for some use cases, but certainly not a general-purpose solution.

            > With SSE the server has to be stateful, for load balancing to work you need to be able to migrate connections between servers.

            Weird take. SSE is inherently stateful, sure, in the sense that it generally expects there to be a single long-lived connection between the client and the server, thru which events are emitted. Purpose of that being that it's a more efficient way to stream data from server to client -- for specific use cases -- than having the client long-poll on an endpoint.

            • jpc0 2 days ago
              > Keeping session data in a shared source of truth like a KV store or DB, and expecting (stateless) application servers to do all their session stuff thru that single source of truth

              What would be a scalable alternative?

              Simple edge-case why this is a reasonable approach: the load balancer sends a request to server A, server A sends the response and goes offline, and now the load balancer has to send all requests to servers B->Z until server A comes back online. If the session data was stored on server A, all users who were previously communicating with server A have now lost their session data, probably reprompting a sign-in, etc.

              There’s some state you can store in a cookie; hopefully said state isn’t in any way meant to be trusted, since rule 1 of the web is you don’t trust the client. Simple case of a JWT for auth: you still need to validate that the JWT was issued by you and hasn’t been invalidated, i.e. a DB lookup.

              • andersmurphy a day ago
                This is the same with request response. You need to auth on each request (unless you use a cookie).
                • jpc0 19 hours ago
                  Exactly: you use a cookie which stores an ID pointing to a session stored in the KV/DB.

                  Moving the session data to a JWT stores some of it in the token, but then you need to validate the JWT on each request, which depending on your architecture might be less overhead. It still means you need some state stored in a KV/DB that cannot live on the server, same as with a session. This might legitimately be less state (just a JWT ID of some sort and whether it has been revoked), but it cannot exist on the server; it needs to be persistent.

          • andersmurphy a day ago
            This take that SSE is stateful is so strange. If the server dies, the client reconnects to another server automatically (and no, you don't need ping/pong). It's only stateful if you make it stateful. It works with load balancing the same as anything else.
            • jpc0 19 hours ago
              The SSE spec has an event ID, and the spec states the client sends the last event ID on reconnection. That is by its nature stateful. Now you could store that in a DB/KV itself, but presumably you are already storing session data for auth and rate limiting, so now you have to implement a separate store for events.

              And I too naively believed there won’t be a need for ping/pong; then my code hit the real world, and ping/pong with aliveness checks was in the very next commit. Not only do load balancers and proxies decide to kill your connection, they will do it without actually closing the socket for some timeout, so your server and client are still blissfully unaware the connection is dead. This may be a bug, but it’s in some random device on the internet, which means I have to work around it.
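The aliveness check itself is small. A client-side sketch (the timeout value and the class shape are illustrative, not from any spec):

```python
import time

class Heartbeat:
    # Watchdog for silently-dead connections: the server sends a ping
    # every few seconds; if the client sees neither events nor pings for
    # `timeout` seconds, it assumes the socket is dead and reconnects.
    def __init__(self, timeout: float, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = clock()

    def seen(self) -> None:        # call on every event or ping frame
        self.last_seen = self.clock()

    def is_dead(self) -> bool:
        return self.clock() - self.last_seen > self.timeout
```

The `clock` parameter is injected so the timeout logic can be tested without real sleeps; in production the default `time.monotonic` is the right choice because it never jumps backwards.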

              Long polling might run into the same issues but in my experience it hasn’t.

              I really do encourage you to actually implement this kind of pattern in production for a reasonable number of users and amount of time; there’s a reason so many people recommend just using long polling.

              This also assumes long running servers, long polling would fall back to just boring old polling, SSE would be more expensive if your architecture involves “serverless”.

              Realistically I still have SSE in production, on networks I can control all the devices in the chain because otherwise things just randomly break…

              • mikojan 4 hours ago
                > The SSE spec has an event id and the spec states sending last event id on reconnection.

                Last event ID is not mandatory. You may omit event IDs and not deal with last event ID headers at all.

                More importantly, the client is sending the last event ID header. Not the server. The only state in the server is a list of events somewhere which you would have to have anyway if you want clients to receive events that occurred when they were not connected or if you allowed clients to fetch a subset of them like with long-polling.

                So there is really no difference at all here with regard to long-polling.
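To make that concrete, here is a sketch of the server side of an SSE resume, assuming the server keeps an ordered event list (the framing follows the SSE wire format; the function and field names are made up):

```python
def sse_frames(events, last_event_id=None):
    # On reconnect the browser re-sends the Last-Event-ID header by
    # itself; the server just replays everything after that ID from the
    # same event list a long-polling endpoint would read.
    replaying = last_event_id is None
    for ev in events:
        if not replaying:
            # Skip up to and including the last ID the client saw.
            replaying = ev["id"] == last_event_id
            continue
        yield f"id: {ev['id']}\ndata: {ev['data']}\n\n"
```

Note that if `last_event_id` has been compacted away, this naive filter replays nothing, which is exactly the failure mode discussed upthread; time-orderable IDs let you replay from the first ID greater than the one the client sent instead.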

      • xyzzy_plugh 3 days ago
        Correct. In the end, mechanically, nothing beats long polling. Everything ends up converging at which point you may as well just long poll.
    • toomim 3 days ago
      Or use Braid-HTTP, which gives you both options.

      (Details in the previous thread on HTTP Feeds: https://news.ycombinator.com/item?id=30908492 )

    • Alifatisk 2 days ago
      Isn't SSE limited to like 12 tabs or something? I vividly remember reading about that hard limit being a huge limitation.
      • curzondax 2 days ago
        6 tabs is the limit on SSE. In my opinion, Server-Sent Events as a concept are therefore not usable in real-world scenarios because of this limitation and the lack of error detection around it. Just use WebSockets instead.