It's pretty much always worth it to have an API like `send(message).then(res => ...)` in a serious app.
But I agree. The upgrade request is confusing, and it's annoying how your websocket server is this embedded thing running inside your http server that never integrates cleanly.
Like instead of just reusing your middleware that reads headers['authorization'] from the websocket request, you access this weird `connectionParams` object that you pretend holds your request headers, heh.
But the idiosyncrasies aren't that big of a deal (ok, I've just gotten used to them). And the websocket browser API is nicer to work with than, say, EventSource.
For example, making a sandwich: you have to retrieve exactly two slices of bread after finding the loaf in the fridge. Apply butter uniformly after finding the appropriate knife; be sure to apply about a 2.1mm coating. After all of that you will still need to ensure you've calibrated the toaster!
It's a tough sell to convince me that a protocol designed primarily for resource transfer via a strict, stateless request-response mode of interaction, with server push tacked on top as an afterthought, is simpler than something built from the ground up to be bidirectional.
The RFC explains it: https://datatracker.ietf.org/doc/html/rfc6455#section-5.3
Also, their alternative is just a library. It's not like they're selling a SaaS, so we shouldn't be mean spirited.
Am I on the right website? checks URL
People find anything to be mean about on here.
What the author and similar web developers consider complex, awkward or difficult gives me pause. The best case scenario is that we've democratized programming to a point where it is no longer limited to people with highly algorithmic/stateful brains. Which would be a good thing. The worst case scenario is that the software engineering discipline has lost something in terms of rigor.
The real problem with the software engineering discipline is that we are too easily distracted from solving the actual business problem by pointless architecture astronautics. At best because of boredom associated with most business problems being uninteresting, at worst to maliciously increase billable hours.
There are two pervasive themes in software engineering:
- those who do not understand the problem domain complaining that systems are too complex.
- those who understand the problem domain arguing that the system needs to be refactored to shed crude unmaintainable hacks and further support requirements it doesn't support elegantly.
Your comment is in the first camp.
And Twitter is just a message board.
Those who do not understand the problem domain complain that systems are too complex.
...and yes, the stack overflow in the JS version was trivially fixable and then the JS version worked pretty well.
This article conflates a lot of different topics. If your WebSocket connection can be easily replaced with SSE+POST requests, then yeah you don’t need WebSockets. That doesn’t mean there aren’t a ton of very valid use cases (games, anything with real time two-way interactivity).
No need for WebSockets there either. Check out WebTransport.
https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_...
That's a pretty convincing use-case. Why use something standard if it can be non-standard custom instead!
Isn't WebTransport basically WebSockets reimplemented in HTTP/3? What point were you trying to make?
No.
Thanks for your insight.
It seems you need to urgently reach out to the people working on WebTransport. You seem to know better and their documentation contradicts and refutes your assertion.
If you take some time to learn about WebTransport, you will eventually notice that if you remove HTTP/3 from it, you remove each and every feature that WebTransport touts as a change/improvement over WebSockets.
To me the sticking point is: what if the "response" message never comes? There's nothing in the websocket protocol that dictates that messages need to be acknowledged. With request/response, the client knows how to handle that case natively.
> And the websocket browser API is nicer to work with than, say, EventSource.
What in particular would you say?
Kind of like how you also need to implement app-layer ping/pong over websockets for keepalive even though TCP already has its own keepalive mechanism. -_-
As for EventSource, I don't remember exactly, something always comes up. That said, you could say the same for websockets, since even implementing non-buggy reconnect/backoff logic is annoying (sketch below).
I'll admit, time for me to try the thing you pitch in the article.
Ref: https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_...
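A minimal sketch of the reconnect/backoff dance I mean, assuming a plain browser WebSocket (URL and handler names are made up):

  function connect(url, onMessage, attempt = 0) {
    const ws = new WebSocket(url)
    ws.onopen = () => { attempt = 0 } // a healthy connection resets the backoff
    ws.onmessage = e => onMessage(e.data)
    ws.onclose = () => {
      // Exponential backoff, capped, with jitter so a recovering server
      // isn't hit by every client at the same instant
      const delay = Math.min(30_000, 1000 * 2 ** attempt) * (0.5 + Math.random())
      setTimeout(() => connect(url, onMessage, attempt + 1), delay)
    }
  }

  connect('wss://example.com/socket', msg => console.log(msg))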
There's even a whole spec for that: JSON-RPC, and it's quite popular.
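For reference, a JSON-RPC 2.0 exchange is just two messages matched by id (the method name here is made up):

  // client → server
  {"jsonrpc": "2.0", "method": "echo", "params": ["hi"], "id": 1}
  // server → client
  {"jsonrpc": "2.0", "result": "hi", "id": 1}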
Perhaps I'm wrong, but I believe HTTP streaming is for chunking large blobs. I worry that if you use this pattern and treat streaming like a pub/sub mechanism, you'll regret it. HTTP intermediaries don't expect this traffic pattern (e.g., NGINX, CloudFlare, etc.). And I suspect every time your WiFi connection drops while the stream is open, the fetch API will raise an error as if the request failed.
However, I agree you probably don't need WebSockets for many of the ways they're used—server-sent events are a simpler solution for many situations where people reach for WebSockets... It's a shame SSEs never received the same fanfare.
> server-sent events are a simpler solution
Fwiw Server-Sent Events are a protocol on top of HTTP Streaming.
In fact I'm somewhat surprised that the article doesn't mention it, instead rolling their own SSE alternative that looks (to my non-expert eyes) like a lower-level version of the same thing. It seems a bit weird to me to use chunks as a message boundary; I'd worry that that has weird edge cases (e.g. won't large responses be split into multiple chunks?)
Websockets require almost a completely new L7 stack and tons of special configuration to handle the Upgrade request, text or data frames, etc. And once you're out of "HTTP mode" you now have to implement the primitive mechanics of basically everything yourself, like auth, redirects, sessions, etc.
It's why I originally made Tiny SSE which is a purpose-built SSE server written in Rust and programmable with Lua.
IMO 'just works' means Apache supports it out of the box with a simple config file and you can just start sending messages to client IPs.
while true; do
curl example.com/sse | handle-messages.sh
done
Because it's just text-over-http. This isn't possible with websockets without some kind of custom client and layer 7 protocol stack.

In this way SSE and WebSockets are exactly the same. They are HTTP requests that you keep open. To firewalls and other network equipment both look the same. They look like long-lived http requests, because that is what they are.
> once you're out of "HTTP mode" you now have to implement the primitive mechanics of basically everything yourself, like auth, redirects, sessions, etc.
WebSockets do support authentication via cookies or custom headers, don't they?
i feel like clients sending requests to servers is a pretty well-solved problem with regular http? i can't imagine how that could be the difficult part of the equation.
It will depend on how the websocket architecture is implemented. A lot of systems will terminate the HTTP connection at the CDN or API gateway and just forward the upgraded TCP socket to the backend without any of the HTTP semantics intact.
Authenticating a websocket is just as easy as authenticating a regular http request. Because it is exactly the same.
I think websockets certainly have their uses. Mostly in systems where SSE isn't available quickly and easily, or when sending a bunch of quick communications one after another as there's no way to know if the browser will pipeline the requests automatically or if it'll set up a whole bunch of requests.
It is a bit of a shame though, that in order to do most useful things with SSEs you have to resort to doing non-spec-compliant things (e.g. send initial payload with POST).
Arguably it’s also because of serverless architecture where SSE can be used more easily than WS or streaming. If you want any of that on Lambda and API Gateway, for example, and didn’t anticipate it right off the bat, you’re in for quite a bit of pain.
OpenAI uses SSE for callbacks. That works fine for chat and other "medium" duration interactions but when it comes to fine tuning (which can take a very long time), SSE always breaks and requires client side retries to get it to work.
So why not instead use something like long polling + http streaming (a slight tweak on SSE)? Here is the idea:
1) Make a standard GET call /api/v1/events (using standard auth, etc)
2) If anything is in the buffer / queue return it immediately
3) Stream any new events for up to 60s. Each event has a sequence id (similar to the article). Include keep alive messages at 10s intervals if there are no messages.
4) After 60s close the connection - gracefully ending the interaction on the client
5) Client makes another GET request using the last received sequence
What I like about this is it is very simple to understand (like SSE - it basically is SSE), has low latency, is just a standard GET with standard auth and works regardless of how load balancers, etc., are configured. Of course, there will be errors from time to time, but dealing with timeouts / errors will not be the norm.
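Something like this on the client, to sketch it out (the query param and handleEvent are hypothetical; only the endpoint comes from the steps above):

  let lastSeq = 0

  async function pollEvents() {
    while (true) {
      try {
        // steps 1 and 5: a standard GET carrying the last received sequence id
        const res = await fetch(`/api/v1/events?after=${lastSeq}`)
        const reader = res.body.getReader()
        const decoder = new TextDecoder()
        while (true) {
          const { done, value } = await reader.read()
          if (done) break // step 4: server closed after ~60s, loop and reconnect
          // assumes one JSON event (or keep-alive) per chunk; real code would
          // buffer and split on a delimiter such as a newline
          const event = JSON.parse(decoder.decode(value))
          if (event.seq) {
            lastSeq = event.seq
            handleEvent(event)
          }
        }
      } catch (_) {
        await new Promise(r => setTimeout(r, 5000)) // back off on real errors
      }
    }
  }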
> SSE always breaks and requires client side retries to get it to work
Yeah, but these are automatic (the browser handles it). SSE is really easy to get started with.
I'm curious though, what is your solution to this?
Secondly, not every client is a browser (my OpenAI / fine tune example is non-browser based).
Finally, I just don't like the idea of things failing all time with something working behind the scenes to resolve issues. I'd like errors / warnings in logs to mean something, personally.
>> I don't understand the advantages of recreating SSE yourself like this vs just using SSE
This is more of a strawman and I don't plan to implement it. It is based on experiences consuming SSE endpoints as well as creating them.
Cookies work fine, and are the usual way auth is handled in browsers.
> Secondly, not every client is a browser (my OpenAI / fine tune example is non-browser based).
That's fair. It still seems easier, to me, to save any browser-based clients some work (and avoid writing your own spec) by using existing technologies. In fact, what you described isn't even incompatible with SSE - all you have to do is have the server close the connection every 60 seconds on an otherwise normal SSE connection, and all of your points are covered except for the auth one (I've never actually seen bearer tokens used in a browser context, to be fair - you'd have to allow cookies like every other web app).
I'm not sure what this means, because it supports the withCredentials option to send auth headers if allowed by CORS.
You are wrong in the case of Chrome and Firefox. I have tried it, and streamed elements (e.g. an unordered list) are displayed instantly.
But for Safari, "text/html" streaming happens in 512 byte chunks[1].
That said, I'd be surprised if proxy software or services like Cloudflare didn't have logic to automatically opt out of "CDN mode" and switch to something more transparent when they see "text/event-stream". It's not that uncommon, all things considered.
I don't believe this is correct. To my knowledge, video stream requests chunks by range and is largely client controlled. It isn't a single, long lived http connection.
Yes, the statement is patently wrong. There are a few very popular video formats whose main feature is chunking through HTTP, like HTTP Live Streaming or MPEG-DASH.
Browsers talking to static web servers use HTTP byte ranges requests to get chunks of videos and can use the same mechanism to seek to any point in the file.
Streaming that way is fast and simple. No fancy technology required.
For MP4 to work that way you need to render it as fragmented MP4.
Probably because byte range is required for seeking, and playing from the beginning is equivalent to seeking at 0.
> Wouldn't that be additional overhead/round-trips?
No because the range of the initial byte range request is the whole file (`bytes=0-`).
> To my knowledge, video stream requests chunks by range and is largely client controlled. It isn't a single, long lived http connection.
Wouldn't a byte range request for the whole file fall under the "single, long lived http connection"? Sure it could be terminated early and another request made for seeking, but regardless the video can start before the whole file is downloaded, assuming it's encoded correctly?
Yes, it would (though a better description would be "a single, long lived http request" because this doesn't have anything to do with connections), and wewewedxfgdf also replied Yes.
> Sure it could be terminated early and another request made for seeking, but regardless the video can start before the whole file is downloaded, assuming it's encoded correctly?
Yes.
It is possible if you are in control of the client, but no browser would stream an mp4 file request by request.
> with an almost always full TCP send buffer at the OS level
This shouldn't be a problem because there is flow control. Also the data would probably be sent to the kernel in small chunks, not the whole file at once.
I believe most browsers do it like that, these days: https://developer.mozilla.org/en-US/docs/Web/Media/Guides/Au...
> This shouldn't be a problem because there is flow control.
It's leveraging flow control, but as I mentioned this might be less efficient (in terms of server memory usage and concurrent open connections, depending on client buffer size and other variables) than downloading larger chunks and closing the HTTP connection in between them.
Many wireless protocols also prefer large, infrequent bursts of transmissions over a constant trickle.
Nope. Browsers send a byte range request for the whole file (`0-`), and the corresponding time range grows as the file is being downloaded. If the user decides to seek to a different part of the file, say at byte offset 10_000, the browser sends a second byte range request, this time `10000-`, and a second time range is created (if this part of the file has not already been downloaded). So there is no evidence there that any browser would stream files in small chunks, request by request.
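You can reproduce a seek by hand, too; a sketch you can paste into the console (made-up URL):

  // Ask for everything from byte 10_000 onward, the way a seeking browser does
  const res = await fetch('/video/sample.mp4', {
    headers: { Range: 'bytes=10000-' }
  })
  console.log(res.status) // 206 Partial Content, if the server supports ranges
  console.log(res.headers.get('Content-Range')) // e.g. "bytes 10000-1048575/1048576"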
> in terms of server memory usage
It's not less efficient in terms of memory usage because the server wouldn't read more data from the filesystem than it can send with respect to the flow control.
> concurrent open connections
Maybe if you're on HTTP/1, but we live in the age of HTTP/2-3.
> Many wireless protocols also prefer large, infrequent bursts of transmissions over a constant trickle.
AFAIK browsers don't throttle download speed, if that's what you mean.
> AFAIK browsers don't throttle download speed, if that's what you mean.
Yeah, I suppose by implementing a relatively large client-application-side buffer and reading from that in larger chunks rather than as small as the media codec allows, the same outcome can be achieved.
Reading e.g. one MP3 frame at a time from the TCP buffer would effectively throttle the download, limited only by Nagle's Algorithm, but that's probably still much too small to be efficient for radios that prefer to sleep most of the time and then receive large bursts of data.
Just putting a url in my Chromium based browser’s address bar to an mp4 file we have hosted on CloudFlare R2 “just works” (I expect a video tag would be the same), supporting skipping ahead in the video without having to download the whole thing.
Initially skipping ahead didn’t work until I disabled caching on CloudFlare CDN as that breaks the “accept-range” capability on videos. For now we have negligible amount of viewership of these mp4s, but if it becomes an issue we’ll use CloudFlare’s video serving product.
No. When you play a file in the browser with a video tag, it requests the file. It doesn't ask for a range. It does use ranges if you seek, or if you write the JavaScript to fetch based on a range. That's why if you press play and pause it buffers the whole video. Only if you write the code yourself can you partially buffer a file like YouTube does.
> That’s why if you press play and pause it buffers the whole video.
Browsers don't do that.
I suppose if you watch it from start to finish without seeking it might cache the entire file, but it may alternatively keep a limited amount cached of the video and if you go back to an earlier time it may need to re-request that part.
Your confidence seems very high on something which more than one person has corrected you on now, perhaps you need to reassess the current state of video serving, keeping in mind it does require HTTP servers to allow range requests.
https://www.zeng.dev/post/2023-http-range-and-play-mp4-in-br...
You can also watch it happen - the Chrome developer tools network tab will show you the traffic that goes to and from the web browser to the server and you can see this process in action.
> It’s not true because throwing a video file as a source on video tag has no information about the file being requested until the headers are pushed down.
And yet, if you stick a web server in front of a video and load it in chrome, you’ll see just that happening.
If you put

  <video controls>
    <source src="/video/sample.mp4" type="video/mp4">
    Your browser does not support the video tag.
  </video>
into an HTML file, and run it against this pastebin [0], you'll see that Chrome (and Safari) both do range requests out of the box if the file is big enough.

When you create a video from a device, the header is actually at the end of the file. Understandable: it's where the file pointer was, and MP4 allows this, so your recording device writes it at the end. You must re-encode with faststart (puts the moov atom at the start) to make it load reasonably on a webpage though.
Yet formats like WAVE, which use a similar "chunked" encoding, just use a fixed-length header and a single seek() to get back to it when finalizing the file. QuickTime and WAVE were released around the same time in the early 90s.
MP2 was so much better I cringe every time I have to deal with MP4 in some context.
MPEG-2 transport streams seem more optimized for a broadcast context, with their small frame structure and everything – as far as I know, framing overhead is at least 2%, and is arguably not needed when delivered over a reliable unicast pipe such as TCP.
Still, being able to essentially chop a single, progressively written MPEG TS file into various chunks via HTTP range requests or very simple file copy operations without having to do more than count bytes, and with self-synchronization if things go wrong, is undoubtedly nicer to work with than MP4 objects. I suppose that's why HLS started out with transport streams and only gained fMP4 support later on.
So much content ended up being delivered this way, but there was a brief moment where we thought multicast UDP would be much more prevalent than it ended up being. In that context it's perfect.
> why HLS started out with transport streams and only gained fMP4 support later on.
Which I actually think was the motivation to add fMP4 to base MP4 in the first place. In any case I think MPEG also did a better job with DASH technically but borked it all up with patents. They were really stupid with that in the early 2010s.
We often forget there are networks other than the Internet. Understandable, since the Internet is most open. The Internet is just an overlay network over ISPs' private networks.
SCTP is used in cellphone networks and the interface between them and legacy POTS networks. And multicast UDP is used to stream TV and/or radio throughout a network or building. If you have a "cable TV" box that plugs into your fiber internet connection, it's probably receiving multicast UDP. The TV/internet company has end-to-end control of this network, so they use QoS to make sure these packets never get dropped. There was a write-up posted on Hacker News once about someone at a hotel discovering a multicast UDP stream of the elevator music.
That's a good point: I suppose it's a big advantage being able to serve the same, unmodified MPEG transport stream from a CDN, as IP multicast over DOCSIS/GPON, and as DVB-C (although I’m not sure that works like that, as DVB usually has multiple programs per transponder/transport stream).
The moov atom is what enables targeted range requests, but the browser has to find it first. That's why it looks like it's going to download the whole file at first: it doesn't know the offset. Once it reads the atom, the request will be cancelled and targeted range requests will begin.
It's usually written at the end since it's not a fixed size, and it's a pain for recording and processing tools to rewrite the whole file on completion just to move the header to the start. You should always re-encode to move the header to the start for the web though.
It's something you see too much of online once you know about it, but MP4 can absolutely have the header at the start.
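(Worth noting: moving the moov atom to the front doesn't actually require a full re-encode, a remux is enough. If I remember the flags right, `ffmpeg -i in.mp4 -c copy -movflags +faststart out.mp4` rewrites only the container and leaves the codec data untouched.)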
It does work for simpler codecs/containers though: Shoutcast/Icecast web radio streams are essentially just endless MP3 downloads, optionally with some non-MP3 metadata interspersed at known intervals.
[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/Ran...
Why not just use SSE? https://developer.mozilla.org/en-US/docs/Web/API/Server-sent...
A lot of times, what people need is a bidirectional connection yet somehow they convince themselves that SSE is better for the job... But they end up with two different types of streams; HTTP for writes and responses and SSE for passively consuming real-time data... Two different stream types with different lifecycles; one connection could fail while the other is fine... There is no way to correctly identify what is the current connection status of the app because there are multiple connections/statuses and data comes from multiple streams... Figuring out how to merge data coming from HTTP responses with data coming in passively from the SSE is messy and you have no control over the order in which the events are triggered across two different connections...
You can't enforce a serial, sequential, ordered flow of data over multiple connections as easily, it gets messy.
With WebSockets, you can easily assign an ID to requests and match it with a response. There are plenty of WebSocket frameworks which allow you to process messages in order. The reason they work and are simple is that all messages pass over a single connection with a single state. Recovering from lost connections is much more straightforward.
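A minimal sketch of that pattern, with a timeout so the "what if the response never comes" case upthread is handled too (URL and message shape are made up):

  const ws = new WebSocket('wss://example.com/socket')
  const pending = new Map()
  let nextId = 0

  function request(payload, timeoutMs = 5000) {
    return new Promise((resolve, reject) => {
      const id = ++nextId
      pending.set(id, resolve)
      ws.send(JSON.stringify({ id, payload }))
      // Nothing in the protocol guarantees a reply, so enforce one ourselves
      setTimeout(() => {
        if (pending.delete(id)) reject(new Error(`request ${id} timed out`))
      }, timeoutMs)
    })
  }

  ws.onmessage = e => {
    const msg = JSON.parse(e.data)
    const resolve = pending.get(msg.id)
    if (resolve) {
      pending.delete(msg.id)
      resolve(msg.payload) // completes the matching request(...) promise
    }
  }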
These are tools, not religions.
Websockets have some real downsides if you don't need bidirectional comms.
https://en.wikipedia.org/wiki/Command%E2%80%93query_separati...
Unless you mean on HTTP2? But aren't WS connections also multiplexed over HTTP2 in that case?
It should say "When used over HTTP/1" instead of "When not used over HTTP/2" because nowadays we also have HTTP/3, and browsers barely even use HTTP/1, so I would say it's pretty safe to ignore that warning.
> Unless you mean on HTTP2?
Any version of HTTP that supports multiplexing.
> But aren't WS connections also multiplexed over HTTP2 in that case?
There is RFC 8441 but I don't think it's actually implemented in the browsers.
Found this: https://github.com/mattermost/mattermost/issues/30285
It looks like it's supported in Chrome and Firefox but not in Safari.
The SSE protocol is actually just a long-running stream like I mentioned, but with specific formatting for each chunk (id, event, and data fields).
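Concretely, the response body is just lines like these, each event terminated by a blank line (field values are made up; `retry:` tunes the browser's auto-reconnect delay):

  id: 42
  event: price
  data: {"symbol": "ABC", "last": 1.23}

  retry: 10000
  data: plain text works too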
as a side note, eventkit actually exports utilities to support SSE both on client and server. The reason you'd want to use eventkit in either case is because it ships with some extra transformation and observability goodies. https://hntrl.github.io/eventkit/guide/examples/http-streami...
Also I don't see it being much easier here than a few primitives and learning about generator functions if you haven't had experience with them. I appreciate the helper, but the API is pretty reasonable as-is IMO
The only problem is, if you want to customize the request (e.g. send a POST or add a header), you have to use a third-party implementation (e.g. one from Microsoft [1]), but I hope this can be fixed in the standards later.
[1]: https://www.npmjs.com/package/@microsoft/fetch-event-source
I do like the reactive approach (in fact, I’ve reinvented something similar over SSE). I feel a standards-based solution is just ever so slightly more robust/universal.
No server traffic for 100+ sec officially results in a 524, so you could possibly make that keepalive interval longer, but I haven't tested it.
Make sure to have the new style cache rule with Bypass cache selected and absolutely make sure you are using HTTP/2 all the way to the origin.
The 6 connections per browser limit of HTTP/1.1 SSE was painful, and I am pretty sure auto negotiation breaks, often in unexpected ways with a HTTP/1.1 origin.
(it's not SSE in particular, but it demonstrates that you can have a long running stream like SSE)
That said, I like SSE for unidirectional string-encoded events.
WebSockets is a simpler protocol built from the ground up for bidirectional communication. It provides a lot more control over the flow of data as everything passes over a single connection which has a single lifecycle. It makes it a lot easier to manage state and to recover cleanly from a lost connection when you only have one logical connection. It makes it easier to process messages in a specific order and to do serial processing of messages. Having just one connection also greatly simplifies things in terms of authentication and access control.
I considered the possibility of switching the transport to HTTP2 for https://socketcluster.io/ years ago, but it's a fundamentally more complex protocol which adds unnecessary overheads and introduces new security challenges so it wasn't worth it.
No, it was not. The primary goal of HTTP/2 was to get over traditional connection limits through connection multiplexing because browsers treat TCP connections as an extremely scarce resource. Multiplexing massively improves the ability to issue many asynchronous calls, which are very common -- and H2 went on to make the traditional HTTP stack more efficient across the board (i.e. header compression.) Some of the original HTTP/2 demo sites that popped up after Google first supported it in Chrome were of loading many images over HTTP/1 vs HTTP/2, which is very common. In one case of my own (fetching lots of small < 1kb files recursively from S3, outside the browser) HTTP/2 was like a 100x performance boost over HTTP/1 or something.
You're correct Server Push was tacked on and known to be flawed very early on, and it took a while before everyone pulled the plug on it, but people fixated on it because it just seemed really cool, from what I can tell. But it was never the lynchpin of the thing, just a (failed and experimental) boondoggle.
The primary purpose of HTTP2 was to allow multiple simultaneous asynchronous http calls, which is a massive loading-performance boost for most websites. Server push was very much a tacked-on afterthought.
Architect: But it must have websockets!
Me: Literally nothing in this POC needs XHR, much less websockets. It's a sequential buy flow with nothing else going on.
Architect: But it has to have websockets, I put them on the slide!
(Ok he didn't say the part about putting it on the slide, but it was pretty obvious that's what happened. Ultimately I caved of course and gave him completely unnecessary websockets.)
I would say:
> Once we have a working MVP without websockets we can talk again about using websockets.

Most times, once something is working, they stop caring, or we have other priorities by then.
I've found that, if you could typecast those people, they would be a tech architect who only uses "web scale" items. (Relevant link: https://www.youtube.com/watch?v=5GpOfwbFRcs )
Chrome no longer fires Close or Error events when a websocket disconnects (well, at least not when they happen, they get fired about 10 minutes later!). So, your application won't know for 10 minutes that the connection has been severed (unless the internet connection is also lost, but that isn't always the case when a websocket is disconnected).
Here's the chrome bug:
https://issuetracker.google.com/issues/362210027?pli=1
From that bug report it looks like the Chrome bug is less than a year old, but it was originally mentioned in April 2023, in a thread about a similar bug in iOS (the iOS bug has been resolved):
https://stackoverflow.com/questions/75869629/ios-websocket-c...
I kind of suspect Chrome is actually doing this intentionally. I believe they do this so a tab can recover from background sleep without firing a websocket close event. That's helpful in some cases, but it's a disaster in other cases, and it doesn't matter either way... it breaks the specification for how websockets are expected to work. WebSockets should always fire Close and Error events immediately when they occur.
This one is pretty simple and pretty great: https://github.com/lukeed/sockette
I did my own which provides rpc functionality and type safety: https://github.com/samal-rasmussen/smolrpc
https://issuetracker.google.com/issues/362210027?pli=1
You can add a recurring ping/pong between the client/server so you can know with some recency that the connection has been lost. You shouldn't have to do that, but you probably want to until this bug is fixed.
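A rough sketch of that client-side watchdog, assuming the server sends a "ping" message every ~15s (all names made up):

  const ws = new WebSocket('wss://example.com/socket')
  const HEARTBEAT_TIMEOUT = 30_000 // tolerate roughly two missed server pings
  let watchdog

  function resetWatchdog() {
    clearTimeout(watchdog)
    watchdog = setTimeout(() => {
      // No traffic for too long: treat the socket as dead ourselves, since
      // Chrome may not fire close/error for another ~10 minutes
      ws.close()
      reconnect() // hypothetical reconnect routine
    }, HEARTBEAT_TIMEOUT)
  }

  ws.onopen = resetWatchdog
  ws.onmessage = e => {
    resetWatchdog() // any message counts as proof of life, including pings
    if (e.data !== 'ping') handleMessage(e.data) // hypothetical handler
  }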
We've got multiple internal apps using WebSockets in production, for years. I have to say I don't really get all the concern in the article about upgrading the connection - any decent backend framework should handle this for you without a problem.
Hacker News articles on new libraries generally live in the 1% of the 1%. Lots of websites don't need a websocket because they are just doing CRUD. For the 1% doing live updates, websockets are great and straightforward. For whatever specialised use case the article has, sure, there's something even less well supported you can pivot to.
Web applications were created because people were averse to creating native applications, for fear of the pain involved with creating and distributing native applications. They were so averse to this perceived pain that they've done incredibly complex, even bizarre things, just so they don't have to leave the web browser. WebSockets are one of those things: taking a stateless client-server protocol (HTTP) and literally forcing it to turn into an entirely new protocol (WebSockets) just so people could continue to do things in a web browser that would have been easy in a native application (bidirectional stateful sockets, aka a tcp connection).
I suppose this is a normal human thing. Like how we created cars to essentially have a horseless buggy. Then we created paved roads to make that work easier. Then we built cities around paved roads to keep using the cars. Then we built air-scrubbers into the cars and changed the fuel formula when we realized we were poisoning everyone. Then we built electric cars (again!) to try to keep using the cars without all the internal combustion issues. Then we built self-driving cars because it would be easier than expanding regional or national public transportation.
We keep doing the easy thing, to avoid the thing we know we should be doing. And avoiding it just becomes a bigger pain in the ass.
Thus we see the flaws in the world, and shrug. When someone else does this, we get angry, and indignant. How dare someone leave things like this! Yet when we do it, we don't make a peep.
I can't tell why you think WebSockets are so bizarre.
Web apps are just the easiest way to do anything (rarely the good way), so many people build them without real engineering or algorithmic knowledge, producing trash every day. The article speaks in the same voice: it paints one protocol as completely bad, mentions only the issues that both approaches share, and silently omits those issues when describing "the only way": a craft, holistic, Rust-and-WASM-based solution, without a plug.
On iOS web apps get suspended very aggressively, and there is no way for a web app to signal to the browser to not suspend it. I never developed native mobile apps, but I assume it’s less aggressive for native apps and/or native apps have a way to prevent themselves from being suspended. This doesn’t seem to be an issue on desktop though.
Which is not "easy" to do over the internet, so the native app folks ended-up using HTTP anyway. (Plus they invented things like SOAP.)
There are no nuances to understand. It’s as simple as fire and forget.
The only downside to WebSockets is that they are session oriented. Conversely, compared to WebSockets the only upside to HTTP is that it's sessionless.
The author throws away their own suggestion but it clearly works, works well, and scales well into "supermassive" size. They don't even mention the real downside to web sockets which is that they're stateful and necessarily tied to a particular server which makes them not mesh at all with your stateless share-nothing http servers.
Websockets sound great on paper. But, operationally they are a nightmare. I have had the misfortune of having to use them at scale (the author of Datastar had a similar experience). To list some of the challenges:
- firewalls and proxies, blocked ports
- unlimited, non-multiplexed connections (so bugs lead to DDoS)
- load balancing nightmare
- no compression
- no automatic handling of disconnect/reconnect
- no cross-site hijacking protection
- worse tooling (you can inspect SSE in the browser)
- nukes mobile battery because it hammers the duplex antenna
You can fix some of these problems with websockets, but these fixes mostly boil down to sending more data... to send more data... to get you back to your own implementation of HTTP.
SSE on the other hand, by virtue of being regular HTTP, work out of the box with, headers, multiplexing, compression, disconnect/reconnect handling, h2/h3, etc.
If SSE is not performant enough for you then you should probably be rolling your own protocol on UDP rather than using websockets. Or wait until WebTransport is supported in Safari (any day now...).
Here's the article's real-time multiplayer Game of Life, which uses SSE and compression:
https://example.andersmurphy.com
It's doing a lot of other dumb stuff explained a bit more here, but the point is you really really don't need websockets (and operationally you really don't want them):
https://andersmurphy.com/2025/04/07/clojure-realtime-collabo...
- What makes load balancing easier with SSE? I imagine that balancing reconnects would work similar to WS.
- Compression might be a disadvantage for binary data, which WS specializes in.
- Browser inspection of SSE does sound amazing.
- Mobile duplex antenna is way outside my wheelhouse, sounds interesting.
Can you see any situation in which websockets would be advantageous? I know that SSE has some gotchas itself, such as limited connections (6) per browser. I also wonder about the nature of memory and CPU usage for serving many clients on WS vs SSE.
I have a browser game (few players) using vanilla WS.
- Load balancing is easier because your connection is stateless. You don't have to connect to the same server when you reconnect. Your up traffic doesn't have to go to the same server as your down traffic. Websockets tend to come with a lot of connection context. With SSE you can easily kill nodes, and clients will reconnect to other nodes automatically.
- The compression is entirely optional. So when you don't need it don't use it. What's great about it though is it's built into the browser so you're not having to ship it to the client first.
- The connection limit of 6 only applies to HTTP/1.1, not HTTP/2/3. If you are using SSE you'll want HTTP/2/3. But generally you want HTTP/2/3 from your proxy/server to the browser anyway, as it has a lot of performance/latency benefits (you'll want it for multiplexing your connection anyway).
- In my experience CPU/memory usage is lower than with websockets. Obviously, some languages make this more ergonomic via virtual/green threads (Go, Java, Clojure). But a decent async implementation can scale well too.
Honestly, and this is just an opinion, no I can't see when I would ever want to use websockets. Their reconnect mechanisms are just not reliable enough and their operational complexity isn't worth it. For me at least it's SSE or a proper gaming net code protocol over UDP. If your browser game works with websockets it will work with SSE.
In my research I recall some potential tradeoffs with SSE [1], but even there I concluded they were minor enough to consider SSE vs WS a wash[2] even for my uses. Looking back at my bookmarks, I see that you were present in the threads I was reading, how cool. A couple WS advantages I am now recalling:
SSE is one-way, so for situations with lots of client-sent data, a second connection will have to be opened (with overhead). I think this came up for me since if a player is sending many events per second, you end up needing WS. I guess you're saying to use UDP, which makes sense, but has its own downsides (firewalls, WebRTC, WebTransport not ready).
Compression in SSE would be negotiated during the initial connection, I have to assume, so it wouldn't be possible to switch modes or mix in pre-compressed binary data without reconnecting or base64-ing binary. (My game sends a mix of custom binary data, JSON, and gzipped data which the browser can decompress natively.)
Edit: Another thing I'm remembering now is order of events. Because WS is a single connection and data stream, it avoids network related race conditions; data is sent and received in the programmatically defined sequence.
1: https://news.ycombinator.com/item?id=43657717
2: https://rxdb.info/articles/websockets-sse-polling-webrtc-web...
With HTTP/2/3 it's all multiplexed over the same connection, and as far as your server is concerned that up request/connection is very short lived.
Yeah, mixed formats are probably a use case for compression (like you said, once you commit to compression with SSE there's no switching during the connection). But then you still need to configure compression yourself with websockets. The main compression advantage of SSE is that it's not per message, it's for the whole stream. The implementations of compression with websockets I've seen have mostly been per-message compression, which is much less of a win (I'd get around 6:1, maybe 10:1 with the game example, not 200:1, and pay a much higher server/client CPU cost).
Websockets have similar issues with firewalls and TCP. So in my mind if I'm already dealing with that I might as well go UDP.
As for ordering, that's part of the problem that makes websockets messy (with reconnects etc). I prefer to build resilience into the system, so in the case of that demo I shared, if you disconnect/reconnect or lose your connection you automatically get the latest view (there's no playback of events that needs to happen). SSE will automatically send the last received event id on reconnect (so you can play back missed events if you want, not my thing personally). I mainly use the event ID as a hash of content: if the hash is the same, don't send any data, the client already has the latest state.
By design: that's the way I build things with CQRS. Up events never have to be ordered with down events. Think about a game loop: my down events are basically a render loop. They just return the latest state of the view.
If you want to order up events (rarely necessary), I can batch on the client to preserve order. I can use a client timestamp/hash of the last event (if you want to get fancy), and the server orders and batches those events in sync with the loop, i.e. everything you got in the last X time (like blockchains/trading systems). This is only per-client ordering, no distributed client ordering, otherwise you get into Lamport clocks etc.
I've been burnt too many times by thinking websockets will solve the network/race conditions for me (and then failing spectacularly), so I'd rather build the system to handle disconnects rather than rely on ordering guarantees that sometimes break.
Again, though my experience has made me biased. This is just my take.
Many of the other issues mentioned are also trivial to solve (reconnects, cross-origin protection).
Also, doesn't WebTransport have many of the same issues? (e.g. with proxies and firewalls). And do you have any data for the mobile battery claim? (assuming this is for an application in foreground with the screen on)
Unfortunately, I can't go into much detail on the mobile battery stuff, but I can give you some hints. If you do some reading on how the antennas on phones work, combined with websocket heartbeat ping/pong, you should get the idea.
The implication is that the ping/pong keeps the system active when it wouldn't otherwise be necessary, but how else are you receiving data or detecting a lost connection with the other mechanisms? The lower layers have their own keepalives, so what's different?
I looked into it a little since it didn't make sense to me, unless you're comparing apples and oranges, but the only research I could find either didn't seem to support your stance or compared WebSockets to the alternative of just simply not being able to receive data in a timely manner.
Server

  const LONG_POLL_SERVER_TIMEOUT = 8_000

  function longPollHandler(req, response) {
    // e.g. client can be out of sync if the browser tab was hidden while a new event was triggered
    const clientIsOutOfSync = parseInt(req.headers.last_received_event, 10) !== myEvents.count
    if (clientIsOutOfSync) {
      sendJSON(response, myEvents.count)
      return
    }

    function onMyEvent() {
      myEvents.unsubscribe(onMyEvent)
      sendJSON(response, myEvents.count)
    }

    response.setTimeout(LONG_POLL_SERVER_TIMEOUT, onMyEvent)
    req.on('error', () => {
      myEvents.unsubscribe(onMyEvent)
      response.destroy()
    })
    myEvents.subscribe(onMyEvent)
  }
Client (polls when tab is visible)

  pollMyEvents()

  document.addEventListener('visibilitychange', () => {
    if (!document.hidden)
      pollMyEvents()
  })

  pollMyEvents.isPolling = false
  pollMyEvents.oldCount = 0

  async function pollMyEvents() {
    if (pollMyEvents.isPolling || document.hidden)
      return
    try {
      pollMyEvents.isPolling = true
      const response = await fetch('/api/my-events', {
        signal: AbortSignal.timeout(LONG_POLL_SERVER_TIMEOUT + 1000),
        headers: { last_received_event: pollMyEvents.oldCount }
      })
      if (response.ok) {
        const nMyEvents = await response.json()
        if (pollMyEvents.oldCount !== nMyEvents) { // because it could be < or >
          pollMyEvents.oldCount = nMyEvents
          setUIState('eventsCount', nMyEvents)
        }
        pollMyEvents.isPolling = false
        pollMyEvents()
      }
      else
        throw response.status
    }
    catch (_) {
      pollMyEvents.isPolling = false
      setTimeout(pollMyEvents, 5000)
    }
  }
Working example at Mockaton:
https://github.com/ericfortis/mockaton/blob/6b7f8eb5fe9d3baf...

Doing so is a protocol decision though, isn't it?
If the protocol specifies that the server either clearly identifies responses as such, or only ever sends responses, and further doesn't send responses out of order, I don't see any difference to pipelined HTTP: The client just has to count, nothing more. (Then again, if that's the use case, long-lived HTTP connections would do the trick just as well.)
Sounds very tricky to me to get right even at scale.
As for the tendency described, this seems to be an instance of the law of the instrument [2], combined with some instruments being more trendy than others. Which comes up all the time, but raising awareness of more tools should indeed be useful.
> You have to manage the socket lifecycle
You have to do the very same thing with HTTP keep-alive or use a separate socket for each and every HTTP request, which is much slower. Fortunately the browser makes this stupid simple in regards to WebSockets with only a few well named events.
> When a new WebSocket connection is initiated, your server has to handle the HTTP “upgrade” request handshake.
If the author cannot split a tiny string on CRLF sequences they likely shouldn't be programming and absolutely shouldn't be writing an article about transmission. There is only 1 line of data you really need from that handshake request: Sec-WebSocket-Key.
Despite the upgrade header in the handshake, the handshake is not actually HTTP. According to RFC6455 it is a tiny bit of text conforming to the syntax of RFC2616, which is basically just: lines separated by CRLF, terminated by two CRLFs, and headers separated from values with a colon. Really it's just RFC822, according to RFC2616.
This is not challenging.
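For the record, the entire "hard part" of the server's handshake response is this (Node; the GUID is a constant pinned in RFC 6455):

  const crypto = require('node:crypto')

  // Sec-WebSocket-Accept = base64(SHA-1(Sec-WebSocket-Key + fixed GUID))
  function acceptKey(secWebSocketKey) {
    return crypto.createHash('sha1')
      .update(secWebSocketKey + '258EAFA5-E914-47DA-95CA-C5AB0DC85B11')
      .digest('base64')
  }

  acceptKey('dGhlIHNhbXBsZSBub25jZQ==') // 's3pPLMBiTxaQ9kYGzzhZRbK+xOo=', the example from the RFC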
I take it this article is written by a JavaScript framework junkie that cannot program, because there is so much in the article that is just wrong.
EDITED: because people get sad.
What the author means with "transactional" is that WebSockets have no built-in request-response mechanism, where you can tell which response belongs to which request. It's a weird word choice, but alas.
I do agree that the bit about "handshakes are hard" feels a bit ill-advised btw, but it's not the core argument nor the core idea of this post. The core idea is "do request-response via HTTP, and then use some sort of single-direction stream (maybe over WS, doesn't matter) to keep client state in sync". This is a pretty good idea regardless of how well or how badly you know the WebSocket RFCs by heart.
(I say this as someone who built a request-response protocol on top of websockets and finds it to work pretty well)
It's not HTTP and does not want to be HTTP. In WebSockets the request/response mechanism is: one side sends a message and then the other side sends a message. If you want to associate a message from one side with a message from the other side, put a unique identifier in the messages.
If you really want the request/response round trip then don't use WebSockets. I would rather messages just transmit as each side is ready, completely irrespective of any round trip or response, because then everything is fully event oriented and free from directionality.
Yes! That's the whole point of the article! You agree with the author!
Too many technology fads make things needlessly complicated, and complexity makes systems unreliable.
You might not need Kubernetes
You might not need The Cloud
You might not need more than SQLite
...and so on.
You can still back up your SQLite database file. You shouldn't do it in the middle of a write, or you should use the SQLite backup API to manage concurrency for you, or you can back it up in SQL dump format. This isn't one of the usual reasons you shouldn't use SQLite. If you need synchronous replication, then you shouldn't use SQLite.
SQLite is robust against process crashes and even operating system crashes if fsync works as it should (big if, if your data is important), but not against disk failure.
In most of the cases when you shouldn't use SQLite, you should still just upgrade one step to Postgres, not some random NoSQL thing or Google-scale thing.
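Both safe options mentioned above are one-liners. A sketch assuming the better-sqlite3 package (and SQLite >= 3.27 for VACUUM INTO):

  const Database = require('better-sqlite3')
  const db = new Database('data.db')

  // Option 1: the online backup API, safe while writers are active
  db.backup('backup-1.db') // asynchronous; returns a promise in better-sqlite3

  // Option 2: plain SQL, produces a compacted, consistent copy
  db.exec("VACUUM INTO 'backup-2.db'")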
https://litestream.io https://github.com/benbjohnson/litestream
cp data.db <backup location>
On modern cloud systems you shouldn't have data loss anyway.

After futzing with silly things like file transfers and communication protocols I chucked it out and rewrote it so the client does HTTP long polling of the server and uploads its renders via HTTP POST.
So much easier.
Did you try using an established library like socket.io, connectRPC etc? They handle a lot of the complexity.
I was asking since Socket.io, for example, takes care of file uploads, reconnection, the whole HTTP upgrade flow, and is extremely easy to use, both on client and server. On top of that it can fall back to long-polling if WS is not available.
Here's a link for educational purposes: https://en.wikipedia.org/wiki/Comet_(programming)
And that is why we have frameworks that, at least in the case of WebSockets, make things as easy as regular old REST.
- "messages aren’t transactional": You can process request and return a value to sender in socket.io application layer. Is that transactional enough?
- "If you’re sending messages that don’t necessarily need to be acknowledged (like a heartbeat or keyboard inputs), then Websockets make a great fit". But socket.io has acknowledgements.
- "When a new WebSocket connection is initiated, your server has to handle the HTTP “upgrade” request handshake.". You can bypass handshake and go straight to WS even in Websockets, and if you don't socket.io handles upgrade for you pretty nicely so you not parsing HTTP header ..
Websockets are a web standard, socket.io is a userland framework
I see the shiny thing and I'm not delusional enough to think I need it.