That is, if your web service struggles to handle single-digit millions of requests per day, not counting static "assets", CGI process startup is not the bottleneck.
A few years ago I would have said, "and of course it's boring technology that's been supported in the Python standard library forever," but apparently the remaining Python maintainers are the ones who think that code stability and backwards compatibility with boring technology are actively harmful things, so they've been removing modules from the standard library if they are too boring and stable. I swear I am not making this up. The cgi module is removed in 3.13.
I'm still in the habit of using Python for prototyping, since I've been using it daily for most of the past 25 years, but now I regret that. I'm kind of torn between JS and Lua.
Amusingly that links to https://peps.python.org/pep-0206/ from 14th July 2000 (25 years ago!) which, even back then, counted the cgi package among the modules "designed poorly and are now near-impossible to fix".
Looks like the https://github.com/jackrosenthal/legacy-cgi package provides a drop-in replacement for the standard library module.
There are certainly some suboptimal design choices in the cgi module's calling interface, things you did a much better job of in Django, but what made them "near-impossible to fix" was that at the time everyone reading and writing PEPs considered backwards compatibility to be not a bad thing, or even a mildly good thing, but an essential thing that was worth putting up with pain for. Fixing a badly designed interface is easy if you know what it should look like and aren't constrained by backwards compatibility.
As a side note, the most popular databases are getting only a tiny fraction of the available performance on current hardware. I wrote a couple of comments with more details about this a week ago: https://news.ycombinator.com/item?id=44408654
In the manycore world, Python's GIL makes some approaches to scaling across cores unavailable, though that is changing. But I don't think those are usually relevant to web server throughput, just (potentially) latency.
I haven't tried it yet but I do wonder about the feasibility of writing code in Python and then having an LLM transcode it to something like C, especially since I know C well enough to do what I want in that directly so I could check the resultant code by hand.
I've had much better luck with LLMs translating code from one language to another than with writing it from scratch.
I have bash CGI scripts too, though Shellshock and bash's general bug-proneness make me doubt that this was wise.
There are some advantages of having the CGI protocol implemented in a library. There are common input-handling bugs the library can avoid, it means that simple CGI programs can be really simple, and it lets you switch away from CGI when desired.
That said, XSS was a huge problem with really simple CGI programs, and an HTML output library that avoids that by default is more important than the input parsing—another thing absent from Python's standard library but done right by Django.
As mentioned elsewhere in the thread, the query parameter parsing is still in the Python standard library, just invoked differently.
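Concretely, here is a minimal sketch of the replacement invocation, assuming URL-encoded data (multipart uploads, which cgi.FieldStorage also handled, need something else, e.g. the email package or a third-party parser):

    #!/usr/bin/env python3
    # Sketch: CGI form/query parsing without the removed cgi module.
    import os, sys
    from urllib.parse import parse_qs

    # GET parameters arrive in the QUERY_STRING environment variable.
    query = parse_qs(os.environ.get("QUERY_STRING", ""))

    # A urlencoded POST body arrives on stdin, CONTENT_LENGTH bytes long.
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    form = parse_qs(sys.stdin.read(length)) if length else {}

    sys.stdout.write("Content-Type: text/plain\r\n\r\n")
    sys.stdout.write(f"query: {query}\nform: {form}\n")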
That policy, and the heinous character assassination the PSF carried out against Tim Peters, mean I can no longer recommend in good conscience that anyone adopt Python.
But I also understand that the world is not perfect. We all need to prioritize all the time. As they write in the rationale: "The team has limited resources, reduced maintenance cost frees development time for other improvements". And the cgi module is apparently even unmaintained.
I guess reality sooner or later catches up with a "batteries included" philosophy.
What do you mean by "character assassination" carried out against Tim Peters? Not anything in the linked article I presume?
https://www.theregister.com/2024/08/09/core_python_developer...
https://tim-one.github.io/psf/ban
https://chrismcdonough.substack.com/p/the-shameful-defenestr...
It does make me wonder whether Python is still the best choice for what I use it for, and whether I should be moving to something else.
So the remaining people periodically launch some deprecation PEPs or other bureaucratic things in order to give the appearance of active development.
As for prioritizing, I think the right choice is to deprioritize Python.
No, not everyone. I've been using Python as my primary language since 2000 (that's 1.5.2 days). It has been the least troublesome language that I work with, and I work with (or have worked with) a bunch (shell, perl, python, ruby, lua, tcl, c, objective-c, swift, java, javascript, groovy, go and probably others I'm forgetting).
Even all the complaints about the Python packaging ecosystem over the years... I just don't get it. Like, have you ever tried working with CPAN or Maven or Gradle or even, FFS, Ruby Gems/bundler? The Python virtual environment concept is easy to understand and pip mostly does its job just fine, and these days, uv makes all that even faster and easier.
Anywho, just dropping a contrarian comment here because maybe I'm part of the generally silent majority that is just able to use Python day in and day out to get their job done.
I've used CPAN, Maven, gem, and bundler, so I'm also always a little puzzled when people complain about Python's packaging system. However, I've also used npm, so I can kind of understand it.
Python was great in 02000, but some of the things that made it great then are gone now. Stability was one of those; simplicity another; reasonable performance a third; but the biggest issue is really social rather than technical. And I feel like we have alternatives now that we didn't have then.
I have not yet had major problems with breaking changes, but they do happen more often than they used to, and it makes me nervous.
The maintenance burden of Python projects is just so much higher than it has any right to be. The language is neat, but not that neat. I think too often we think of performance as a sort of "tradeoff" for having a bad, unergonomic language, but that's not necessarily true. Plenty of languages have poor performance and are also a pain in the ass. We no longer live in a world where our options are C++ or scripting languages. We have mature environments with fantastic tooling. We have fast compilers with amazing error messages. We have great runtimes with more than adequate performance.
I do think there are some inherent tradeoffs in the space.
My main issue with prototyping as a concept is that it doesn't exist in most workplaces. Prototypes quickly devolve into applications. Discarding code is risky. Your best bet IMO is choosing a language that's ergonomic in the long run, because odds are you're in for the long run.
dotnet is another great choice because of the tooling and batteries included, although you do have to deal with a fairly slow compiler. Java is okay too, but Java is very restrictive and high-friction, which might not lend itself to prototyping.
In the world of scripting languages, ironically PHP is a decent choice. It has better progressive typing than Python and it's reasonably safe these days. We've sort of come full circle on PHP. The downside is that PHP programmers tend to throw everything in an array, especially when going fast. That hurts readability and the IDE a lot.
And then, of course, typescript and node. I don't like typescript. There's something about scripting languages with build steps that pisses me off. But, it's got a wide developer pool and it's not the worst language ever. Although there's a bit too much teeth-pulling IMO with typescript.
You guys are all really getting worked up over very little.
This is a bit like Apple firing Steve Jobs for wearing sneakers to work because it violates some dress code.
Also I used Python way before JS, and I still like JS's syntax better. Especially not using whitespace for scope, which makes even less sense in a scripting language since it's hard to type that into a REPL.
What Node.js had from the start was concurrency via asynchronous IO. And before Node.js was around to be JavaScript's async IO framework, there was a robust async IO framework for Python called Twisted. Node.js was influenced by Twisted[0], and this is particularly evident in the design of its Promise abstraction (which you had to use directly when it was first released, because JavaScript didn't add the async/await keywords until years later).
But that's beside the point. Performant web backends are way easier to deal with in NodeJS than in Python. I'm not comparing to Twisted because, even though it looks good, every Python backend I've ever seen was either plain Python or Django, which was also a mess compared to Express.
Not saying that Python is great, but Node is even worse.
Jupyter fixes the REPL problem, and it's a major advance in REPLs in a number of other ways, but it has real problems of its own.
I do not think JS got it right. Node did, by doing async, but the reason for that was that JS did not do threads! It was making a virtue of a deficiency.
I love whitespace for scope.
JS didn't do threads because threads are an error-prone way to write concurrent software. Crockford was a huge influence on its development in the early 02000s, and he had been at Electric Communities; he was part of the capabilities cabal, centered around Mark S. Miller, who literally wrote his dissertation on why you shouldn't use threads and how to structure async code comprehensibly. Promises came from his work, by way of Twisted. Unfortunately, a lot of that work didn't get into JS until well after Node had defined a lot of its APIs.
But this wasn't "making a virtue of a deficiency". JS was intentionally exploring a different approach to structuring concurrent software. You might argue that that approach didn't pay off, but it wasn't some kind of an accident.
TCL, which promoted the same approach for the same reasons long before Node, eventually added threading.
We clearly need some way to take advantage of manycore, but I'm still not convinced that threading with implicitly shared mutable state is the right default. It isn't even what the hardware implements! Every core has its own cache! It's a better fit to the hardware than a single giant single-threaded event loop is, and I think that accounts for its current dominance, but there are a lot of other possibilities out there, like transactional memory, explicit asynchronous message-passing interfaces, or (similarly) lots of tiny single-threaded event loops like Erlang (or, maybe, like Web Workers).
Green threading like in Golang is even better because you get the advantages of OS threads, but that requires a runtime, which is why Rust didn't do that. And I guess it's hard to implement, because Java didn't have it until very recently.
Depends what you are doing, how you are doing it, and how careful you need to be with resources.
The article is about the fact that it is often OK even when done in a particularly inefficient way.
https://github.com/python/cpython/commits/3.12/Lib/cgi.py
Turns out most of the maintenance this thing received was the various attempts at removing it.
All that was in the cgi module was a few functions for parsing HTML form data.
As a side note, though, CGIHTTPRequestHandler is for launching CGI programs (perhaps written in Rust) from a Python web server, not for writing CGI programs in Python, which is what the cgi module is for. And CGIHTTPRequestHandler is slated for removal in Python 3.15.
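For illustration, a minimal sketch of that launching side while it's still in the standard library; by default it serves executables out of ./cgi-bin/ relative to the working directory:

    # Sketch: launch CGI programs (written in any language) from a
    # pure-Python server using the soon-to-be-removed stdlib handler.
    from http.server import CGIHTTPRequestHandler, HTTPServer

    HTTPServer(("127.0.0.1", 8000), CGIHTTPRequestHandler).serve_forever()

(`python3 -m http.server --cgi` does the same thing from the command line.)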
The problem is gratuitous changes that break existing code, so you have to debug your code base and fix the new problems introduced by each new Python release. It's usually fairly straightforward and quick, but it means you can't ship the code to someone who has Python installed but doesn't know it (they're dependent on you for continued fixes), and you can't count on being able to run code you wrote yourself on an earlier Python version without a half-hour interruption to fix it. Which may break it on the older Python version.
The support for writing CGI programs in Python is in wsgiref.handlers.CGIHandler.
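For completeness, a minimal sketch of a CGI program written as a bare WSGI callable; the same `app` could later be handed to any WSGI server without touching the logic:

    #!/usr/bin/env python3
    # Sketch: a CGI program written against WSGI, no framework needed.
    import wsgiref.handlers

    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        name = environ.get("REMOTE_ADDR", "unknown")
        return [f"hello, {name}\n".encode()]

    if __name__ == "__main__":
        wsgiref.handlers.CGIHandler().run(app)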
You can't carry everything to very long term horizons, especially for categories of "everything" whose user base is 2 people and 1 squirrel.
People who want otherwise should volunteer to maintain what they want kept.
{ Oh, and before anyone jumps on me, this is only an analogy as it relates to freshman moral philosophy courses, not an attempt by me to over-dramatize - that is more the fault of said courses trying to engage 18 year olds. :-) I'm mostly interested in the active-passive details of the pledge campaign. }
Secondly, you have to find a reliable maintainer or several.
A lot of people want stuff to be maintained indefinitely for them by unspecified "others".
Not updating the system is usually a solution to such problems.
At best there is an nginx or an API in front that acts as a reverse proxy to clean up/normalize the incoming requests and avoid directly exposing the service.
Example: banks, airlines, hospitals, air traffic controllers, electricity companies, etc
All critical services that nobody wants to touch, since they more or less work.
a) make the system air gapped
b) pay a Python consulting company to back port security fixes
c) hire a Python core dev to do the system, directly
OOOOR, they can just update to Python 3.13 and migrate to the equivalent Python package that's not part of the core. For sure they already use other Python packages.
We're making a mountain out of a molehill, also on behalf of places that have plenty of money to spend if push comes to shove.
Jython no longer works with basically any current Python libraries because it never made the leap to Python 3, and the Python community stigmatizes maintaining Python 2 compatibility in your libraries. This basically killed Jython, and from my point of view, Jython was one of the best things about Java.
Most rational people are ok with code being removed that 99.99% of users have absolutely no use for, especially if it is unmaintained, a burden, or potentially contains security issues. If you are serious about cgi you’ll probably be looking at 3rd party modules anyway.
EDIT: So, you get threads like this https://stackoverflow.com/questions/65651040/what-is-the-rec... and so on
That is very much a self-inflicted wound, though. If you insist on not using the standard packaging solution for the language, you have to own the complications of that.
Python has no standard packaging. They even deprecated and removed distutils (another terrible idea that caused a lot of busywork). The only way that python supports packages is via 3rd party external solutions.
I mean, I understand the desire to remove distutils. It sucked. It was the least Pythonic package in the whole Python standard library. But removing it was even worse, because it means you can't use old versions of most Python libraries with recent versions of Python.
Personally… I don't use pip. Why? apt is there.
Also I'm part of the python team in debian so I can package what's missing or update out of date things if I need.
Lua barely has any stdlib to speak of, most notably in terms of OS interfaces. I'm not even talking about chmod or sockets; there's no setenv or readdir.
You have to install C modules for any of that, which kinda kills it for having a simple language for CGI or scripting.
Don't get me wrong, I love Lua, but you won't get far without scaffolding.
But my concern is mostly not about needing to bring my own batteries; it's about instability of interfaces resulting from evaporating batteries.
LuaJIT, release-wise, has been stuck in a very weird spot for a long time, before officially announcing it's now a "rolling release" - which was making a lot of package maintainers anxious about shipping newer versions.
It also seems like it's going to be forever stuck on the 5.1 revision of the language, while continuing to pick a few cherries from 5.2 and 5.3. It's nice to have a "boring" language, but most distros (certainly Alpine, Debian, NixOS) just ship each release branch between 5.1 and 5.4 anyway. No "whatever was master 3 years ago" lottery.
That said these days I'd rather use Go.
Admittedly Python is not great at this either (reload has interacted buggily with isinstance since the beginning), but it does attempt it.
I agree it's not a rapid prototyping kind of language. AI assistance can help, though.
Your app could add almost no latency beyond storage if you try.
Since I learnt Python starting in version 1.6, it has mostly been for OS scripting stuff.
Too many hard learnt lessons with using Tcl in Apache and IIS modules, continuously rewriting modules in C, back in 1999 - 2003.
edit: Looks like yes for Node JS. I can't tell for PHP as I keep getting results for opcache, which is different and in-memory.
I still miss PHP's simple deployment, execution and parallelization model, in these over-engineered asyncy JavaScripty days.
Also, let's see the impact of Microsoft's Python team layoffs on it, given that CPython developers only started caring about performance due to Facebook and Microsoft; so far the JITs in Python have been largely ignored by the community.
At the time Perl was the thing I used in the way I use Python now. I spent a couple of years after that working on a mod_perl codebase using an in-house ORM. I still occasionally reach for Perl for shell one-liners. So, it's not that I haven't considered it.
Lua is in a sense absolutely stable unless your C compiler changes under it, because projects just bundle whatever version of Lua they use. That's because new versions of Lua don't attempt backwards compatibility at all. But there isn't the kind of public shaming problem that the Python community has where people criticize you for using an old version.
JS is mostly very good at backwards compatibility, retaining compatibility with even very bad ideas like dynamically-typed `with` statements. I don't know if that will continue; browser vendors also seem to think that backwards compatibility with boring technology like FTP is harmful.
- `perl -de 0` provides a REPL. With a readline wrapper, it gives you history and command editing. (I use comint-mode for this, but there are other alternatives.)
- syscalls can automatically raise exceptions if you `use autodie`.
Why is this not the default? Because Perl maintainers value backward compatibility. Improvements will always sit behind a line of config, preventing your scripts from breaking if you accidentally rely on functionality that later turns out to be a mistake.
Perl feels clumsy and bug-prone to me these days. I do miss things like autovivification from time to time, but it's definitely bug-prone, and there are a lot of DWIM features in Perl that usually do the wrong thing, and then I waste time debugging a bug that would have been automatically detected in Python. If the default Python traceback doesn't make the problem obvious, I use cgitb.enable(format='text') to get a verbose stack dump, which does. cgitb is being removed from the Python standard library, though, because the maintainers don't know it can do that.
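For anyone who hasn't seen it, a sketch of the pattern in question; it gives a plain-text traceback that includes the local variables of every frame:

    # Sketch: verbose plain-text tracebacks for any uncaught exception.
    import cgitb
    cgitb.enable(format='text')   # installs a sys.excepthook

    def divide(numerator, denominator):
        return numerator / denominator

    divide(1, 0)   # the dump shows numerator=1, denominator=0 in the frame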
Three years ago, a friend told me that a Perl CGI script I wrote last millennium was broken: http://canonical.org/~kragen/sw/rfc-index.cgi. I hadn't looked at the code in, I think, 20 years. I forget what the problem was, but in half an hour I fixed it and updated its parser to be able to use the updated format IETF uses for its source file. I was surprised that it was so easy, because I was worse at writing maintainable code then.
Maybe we could do a better job of designing a prototyping language today than Larry did in 01994, though? We have an additional 31 years of experience with Perl, Python, JS, Lua, Java, C#, R, Excel, Haskell, OCaml, TensorFlow, Tcl, Groovy, and HTML to draw lessons from.
One benefit Perl had that I think not many of the other languages do was being designed by a linguist. That makes it different -- hard to understand at first glance -- but also unusually suitable for prototyping.
Basically something like "use strict" in JS.
For now, the relevant committees think some more experimentation and deprecation needs to happen before locking in the set of features to be considered modern.
Come to think of it, that's a nice extensibility scheme, too - whenever you want to update it, just add another "very". ~
Only one benchmark on one system, but over in day before yesterday's HN thread on this (https://news.ycombinator.com/item?id=44464272), I report a rather significant slowdown in Perl start up overhead: https://news.ycombinator.com/item?id=44467268 . Of course, at least for me, Python3 is worse than Python2 by an even larger factor and Python2 worse than Perl today by an even larger factor.
FWIW, in Nim, you can get a CGI that probably runs faster than the Go of this article with simply:
import std/cgi                    # By default both ..
for (key, val) in decodeData():   # .. $QUERY_STRING & POST
  if key == "something":
    do_something(val)
I don't know of a `cgitb` equivalent even in the Nimbleverse. Some of the many web frameworks in Nim like jester seem to have that kind of thing built into them, though I realize a framework is not the same as CGI (and that's one of the charms of CGI).

Python has Werkzeug, Flask, or at the heavier end Django. With Werkzeug, you can translate your CGI business logic one small step at a time - it's pretty close to speaking raw HTTP, but has optional components like a router or debugger.
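A sketch of what that first small step might look like (assuming Werkzeug is installed); the request wrapper replaces hand-rolled QUERY_STRING parsing, and the result still runs as an ordinary one-shot CGI script:

    #!/usr/bin/env python3
    # Sketch: CGI-style logic wrapped in Werkzeug's request/response objects.
    import wsgiref.handlers
    from werkzeug.wrappers import Request, Response

    @Request.application
    def app(request):
        name = request.args.get("name", "world")   # parsed query parameters
        return Response(f"hello, {name}\n", mimetype="text/plain")

    if __name__ == "__main__":
        wsgiref.handlers.CGIHandler().run(app)   # still deployable as CGI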
then that endpoint will have at least 400ms response times, not great
If I remember correctly that is about half of what StackExchange served on daily average over 8 servers. I am sure using Go or Crystal would have scaled this at least 10x if not 20x.
The problem I see is that memory cost isn't dropping, which means somewhere along the graph the combined per-process memory cost will outweigh whatever advantage this has.
Still, sounds like a fun thing to do. At least for those of us who lived through CGI-Bin and Perl era.
I agree that tens of milliseconds of latency is significant to the user experience, but it's not always the single most important consideration. My ping time to news.ycombinator.com is 162–164ms because I'm in Argentina, and I do unfortunately regularly have the experience of web page loads taking 10 seconds or more because of client-side JS.
You can measure this easily, get a copy of Windows busybox and write a shell script that forks off a process a few thousand times. The performance difference is stark.
Are you really saying that it takes 70ms on Microsoft Windows? I don't have an installed copy here to test.
Even if it does, that would still be about 15% of the time required for `python3 -m cgi`, so it seems unlikely to be an overriding concern for CGI programs written in Python, at least on manycore servers serving less than tens of millions of hits per day. Or does it also fail to scale across cores?
$ cat winforktest.sh
#!/home/busybox-1.35 sh
i=1; while [ $i -lt 10000 ]; do (echo $i); i=$((i+1)); done
On Linux, this takes:
$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.10 (Ootpa)
$ time ./winforktest.sh > /dev/null
real 0m0.792s
user 0m0.490s
sys 0m0.361s
On Windows,
c:\users\x>ver
Microsoft Windows [Version 10.0.22631.5472]
c:\users\x>busybox | busybox head -2
BusyBox v1.37.0-FRP-5236-g7dff7f376 (2023-12-06 10:31:32 GMT)
(mingw64-gcc 13.2.1-5.fc39; mingw64-crt 11.0.0-2.fc39; glob)
c:\users\x>busybox sh
~ $ time ./winforktest > /dev/null
real 3m 44.44s
user 0m 0.32s
sys 0m 4.90s
Windows is quite a bit slower, at least with Busybox.

I understand that WSLv1 presents a more efficient fork(), which is not really available to shims like Cygwin or mingw.
So, that does 19999 fork()s in 224 seconds, which works out to about 11 milliseconds per fork(). (On my Linux box, it's actually doing clone(), and also does a bunch of wait4(), rt_sigaction(), exit_group(), etc., but let's assume the fork() is the bottleneck.)
This is pretty slow, but it's still about 6× faster than the 70 milliseconds I had inferred from your "100× slower than any other POSIX".
Also note that your Red Hat machine is evidently forking in 39μs, which is several times faster than I've ever seen Linux fork.
I agree that the number I gave is erroneous, but not in the direction you imply. 400ms is extremely conservative, and it's not just starting an executable. I took my 100-millisecond estimate for the time to start up Python and `import cgi`, which loads 126 modules in the version I have here on my cellphone, and multiplied it by a safety factor of 4. Even on my cellphone it doesn't take that long, although `python3 -m cgi` does, because it loads all those modules and then runs a debug dump script. The time I measured for `python3 -m cgi` on my laptop is 90–150ms.
If all you are doing is starting a per-request executable, that will typically take more like 1 millisecond than 100 milliseconds.
Perhaps you meant to suggest that the actual logic triggered by the request would increase this 400ms number significantly.
Consider first the case where it would, for example because you do 200ms of computation in response to each request. In this case the best you could do by switching from CGI to a more efficient mechanism like FastCGI is a 3× increase in req/s. If that allows you to increase from 10 million requests per day on one server to 30 million, that could be worthwhile if you actually have tens of millions of requests per day. But it's kind of a marginal improvement; it saves you from needing two or three servers, probably.
Now consider the case—far more common, I think—where your computation per request is small compared to our hypothetical 400 ms Python CGI overhead. Maybe you do 20ms of computation: 80 million machine instructions or so, enough to tie an IBM PC up for a few minutes. In such a case, the total one-core request time has gone from 400ms to 420ms, so 400ms is a very good estimate, and your criticism is invalid.
Even on HN#1, you get like two requests per second for maybe 18 hours. That's three orders of magnitude below 200M/day. Shifting the argument to say that no hobby project needs this is easy. PHP being in common use already proves that starting up and tearing down all context is fine for >99% of websites. But the context we're in is an article saying one can do 200M requests per day with CGI-bin, not whether 99% of sites need that. CGI-bin is simply wasteful if the process takes 400ms before it starts doing useful work, not to mention a noticeable amount of lag for users. (Thankfully TFA specifies they're not doing that but are speaking of compiled languages)
> your computation per request is [commonly] small compared to our hypothetical 400 ms Python CGI overhead. Maybe [that makes it go] from 400ms to 420ms
At "only" 420ms total request time, I'd still want to take that prototype code out of this situation. Might be as easy as wrapping the code so that the former process entrypoint (__main__ in python) gets, instead, called by some webserver within the same process (Flask as the first python example that comes to mind). Going from 420ms request handling time to slightly over 20ms seems rather worth an hour of development time to me if you have this sort of traffic volume
...and then you're wasting a 64-core server at 100% CPU load on just starting up and tearing down script instances, not yet doing any useful work. This is doing 160 startups per second, not requests per second
Would be a pretty big program for it to require 400ms on a fast CPU though, but the python interpreter is big and if you have one slow import it's probably already not far off
why lua?
Shortly after that, we had a lengthy thread about "how Lua compares to Python": https://news.ycombinator.com/item?id=42655158
Two years ago I commented https://news.ycombinator.com/item?id=38862372 listing Lua's worst flaws from my point of view.
I’m amazed at how great the tools are these days that are free and yet we pay so much to cloud providers. I know it’s not an apples to apples comparison but it was so great to develop all that and fine tune it on a box in my basement.
It's crazy to me that we keep paying all these overheads for no reason other than "it's what Google does". I really need to write my article about my modular monolith architecture up, it's worked really well for us.
That said, to be honest, I don’t find Kubernetes complex — but that might be because I’ve been using it for quite some time.
I fought really hard at one of the startups I previously worked at. For the CV/resume polishing, engineers pushed for K8s. It was outrageous to me since there were a total of 3 applications running: 1 Python monolith and 2 other smaller Go apps.
Just before that, I was able to reduce AWS costs from 5k+/eur/month to sub 2k/eur/month. Having 3 parallel environments running at the time. (Dev/Staging/Prod).
Not if you are using Next.JS. It’s bizarre that even static files can only do like 10r/s on Next.JS
Cloud providers are used because there is a lot of vested interest - e.g. VC and investors also having shares in cloud companies, then there is fear their investment might not survive an imaginary surge of traffic that will never happen in reality. Cloud sales people are masters of playing on investors' insecurities.
In my experience, cloud at scale has ALWAYS required someone with a pager willing/paid to get up at 3am on Christmas Eve. So someone’s time is being used no matter.
Power draw is low so it's on a basic consumer UPS which has worked fine for the short power outages we've had (minutes).
RAID means that a drive dying doesn't impact service, although I've yet to have a drive outright die, I've just had them start developing bad sectors and preemptively replaced them (btrfs checksumming keeps the data safe)
I have backups going to a cloud storage service, so if the crap hits the bucket when I'm traveling, I can just spin up a VPS and restore to there (which is what I would have to do if a cloud server died anyway)
You realise you can always still restore that backup onto someone else's server? When you need to restore from backup either way. I don't really see why one would pre-emptively pay for it
> If I accidentally lock myself out, there's no serial terminal to fall back on.
Why not? That sounds like a choice you can make. The hardware I hosted on either had a KVM built in or can just attach a USB keyboard and VGA (or nowadays HDMI) display
Power outages aren't common in my area, and otherwise a UPS is not that expensive (compared to if you pay a third party to set up redundant power for your hobby system)
You can choose to pre-emptively pay the cloud premium and give them access to your server so you can also social engineer yourself back in via customer support (after all, if you aren't expecting to lose the password and thus don't need to convince a human to let you into your hosting account, then you could also hold onto your own server's password). It just all seems very opposed from the self-hosting spirit where you're self-reliant, which apparently you value since you were considering whether to self host?
Huh? If a drive goes out in my home server, the entire thing is offline until amazon delivers a new one (don't tell me I now need to keep a stockpile of spares). If a drive goes out in S3, I never know about it because AWS takes care of it. You don't understand why someone would want to "preemptively" pay for that?
> It just all seems very opposed from the self-hosting spirit where you're self-reliant, which apparently you value since you were considering whether to self host?
My self hosting is not ideological at all. I couldn't care less about "self reliance." The reason I considered hosting locally was to save money, and I concluded I was actually getting a lot of value for the money I was spending on AWS.
Can you imagine the size of the business / service you could run with 4 attached 20TB drives, and a modest CPU? Good luck getting such from a cloud provider.
Well ackshually ... the technology here that was important was mod_php; PHP itself was no different to Perl in how it was run, but the design choice of mod_php as compared to mod_perl was why PHP scripts could just be dumped on the server and run fast, where you needed a small amount of thinking and magic to mod_perl working.
What almost brought us to tears the day we learned about PHP was how everything we had been painstakingly programming ourselves from scratch reading RFCs or reverse engineering HTTP was just a simple function call in PHP. No more debugging our scuffed urlencode implementation or losing a day to a stray carriage return in an HTTP header...
mod_perl2[0] provides the ability to incorporate Perl logic within Apache httpd, if not other web servers. I believe this is functionally equivalent to the cited PHP Apache module documentation:
Running PHP/FI as an Apache module is the most efficient
way of using the package. Running it as a module means that
the PHP/FI functionality is combined with the Apache
server's functionality in a single program.
0 - https://perl.apache.org/docs/2.0/index.html

EDIT: I have managed to dig out slides from a talk I gave about this a million years ago with a good section that walks through the history of how all this worked, CGIs, mod_perl, PSGI etc, for anyone who wants a brief history lesson: https://www.slideshare.net/slideshow/psgi-and-plack-from-fir...
I got into web dev in the tail end of perl and cgi-bin. I remember my first couple scripts which were just copy/paste from tutorials and what not, everyone knows how it goes. It was very magical to me how this "cgi-bin" worked. There was a "script kiddy hacking tool" I think named subseven (or similar) written partially in perl that you would trick your friends into running or you'd upload on filesharing. The perl part gave you your web based C&C to mess with people or open chats or whatever. I really got into programming trying to figure out how this all worked. I soon switched over to PHP and in my inexperience never realized the deployment model was so similar.
I do think this model of running the script once per request and then exiting really messed with my internal mental model of how programs and scripts worked. Once I was exposed to long running programs that could maintain state, their own internal data structures, handling individual requests in a loop, etc, was a real shock and took me awhile to conceptualize.
It's strange thinking back to the days when persisting information as simple as a view counter required writing data to a flatfile* or something involving a database.
These days with node and our modern languages like go and rust it’s immediately obvious how it’s done.
I think it’s both a mix of me learning and growing and the industry evolving and growing, which I think all of us experience over time.
* for years using flat files was viewed as bad practice or amateurish. fun to learn years later that is how many databases work.
> These days with node and our modern languages like go and rust it’s immediately obvious how it’s done.
Okay I'll bite. How is it done now and why is the new way better than using a DB?
I do remember being exposed to the arcane Perl syntax for the first time so it must have been a different program.
At least that was the case when I did the "python is single threaded, let's run many of them" + "python is slow, let's run many of them" dance
At scale you end up using shared connection pools outside of python (like pgbouncer) and a lot of tuning to make it serve the load while not killing the database
Of course, then we reimplemented in a multithreaded somewhat performant language and it became dead simple again
And we were trading performance for what, exactly? Code certainly didn't become any simpler.
Which is only a small proportion of sites out there.
I think the worst drama ever was a partial disk failure. Things kinda hobbled along for awhile before things actually started failing, and at that point things were getting corrupted. That poofed a weekend out of my life. Now I have better monitoring and alerting.
So you really do not have to be bothered by installation or anything of these lines. You install once and you are fine. You should check out the Wiki pages of Arch Linux, for example. It is pretty straightforward. As for upgrades, Arch Linux NEVER broke. Not on my servers, and not on my desktop.
That said, to each their own.
I will give you the benefit of the doubt that you are not regurgitating what other people have been saying (IMO wrongfully), which is: "Arch Linux for servers? Eww. Bleeding edge. Not suitable for servers.". All that said, please, do share. It will not negate those decades of no issues, however.
As I said, I maintain quite a lot of Arch Linux servers with loads of services without any issues, for decades.
Or if you don't want to pay for an 8/16 for the sort of throughput you can get on a VPS with half a core.
Sure, there are regressions that will make start-up overhead worse, but I mean, there will be pathological regressions in any configuration.
The thing limiting you to that number also isn't just the startup cost. If it was, you could just run more things in parallel. The startup cost kills your minimum latency, but the rps limit comes from some other resource running out; cpu, memory, context switching, waiting for other services that are in themselves limited, etc. If it's cpu, which is very likely for python, any little performance regression can melt down the service.
Life is so much easier if you just get a somewhat performant base to build on. You can get away with being less clever, and you can see mistakes as a tolerable bump in resource usage or response times rather than a fail-whale
In practice I'm not convinced -- but I would love to be. Reverse proxying a library-specific server or fiddling with FastCGI and alternatives always feels unnecessarily difficult to me.
# Run an ordinary Flask (WSGI) app as a one-shot CGI script:
import wsgiref.handlers, flask

app = flask.Flask(__name__)
wsgiref.handlers.CGIHandler().run(app)
The way we run the scripts is with uwsgi and its cgi plugin[1]. I find it simpler and more flexible than running apache or lighttpd just for mod_cgi. Since uwsgi runs as a systemd unit, we also have all of systemd's hardening and sandboxing capabilities at our disposal. Something very convenient in uwsgi's cgi handling that's missing from mod_cgi is the ability to set the interpreter for a given file type:

cgi = /cgi-bin=/webapps/cgi-bin/src
cgi-allowed-ext = .py
cgi-helper = .py=/webapps/cgi-bin/venv/bin/python3 # all dependencies go here
Time to first byte is 250-350ms, which is acceptable for our use case.

This lets you drop .htaccess files anywhere and Apache will load them on each request for additional server config. https://httpd.apache.org/docs/2.4/howto/htaccess.html
One big reason to avoid them was performance; it required extra disk access on every request and it was always better to put the configuration in the main config file if possible.
But now? When most servers have an SSD and probably spare RAM that Linux will use to cache the file system?
Ok, performance is still slightly worse as Apache has to parse the config on every request as opposed to once, but again, now that most servers have more powerful CPUs? In many use cases you can live with that.
[ Side project is very early version but I'm already using it: https://github.com/StaticPatch/StaticPatch/tree/main ]
> I'm not a real programmer. I throw together things until it works then I move on. The real programmers will say "Yeah it works but you're leaking memory everywhere. Perhaps we should fix that." I’ll just restart Apache every 10 requests.
PHP got a very long way since then, but a huge part of that was correcting the early mistakes.
> PHP 8 is significantly better because it contains a lot less of my code.
I do have thoughts for later about modes which could take all the config from .htaccess files and build them into the main config so then you avoid any performance issues - however you have to do that carefully to make sure people don't include any bad config that crashes the whole server. One of the nice things about using .htaccess files as intended is Apache has the Nonfatal flag on AllowOverride so you can avoid that. https://httpd.apache.org/docs/2.4/mod/core.html#allowoverrid...
IMO you don't need to compensate for bad configs if you're using a proper staging environment and push-button deployments (which is good practice regardless of your development model). In prod, you can offset its main issue (atomic deployments) by swapping a symlink. In that scenario, having a separate .htaccess file actually helps - you don't want to restart Apache if you can avoid it, and again - hot reloading can hide state.
My main issue is that this is all a very different model from what most languages, frameworks, and runtimes have been doing for almost 20 years now. If you're a sysop dealing with heterogeneous environments, it's honestly just annoying to have separate tooling and processes.
Personally, ca 10 years ago, this was the tipping point at which I've demanded from our PHP devs that they start using Docker - I've been reluctant about it until that moment. And then, whether it was .htaccess or the main config, no longer mattered - Apache lived in a container. When I needed to make sure things performed well, I used Locust <https://locust.io/>. Just measure, then optimise.
So in practice, yes, spiritually I'm doing what PHP8 did to PHP3. Whether that's "approvingly" is up to your interpretation ;)
But 25 years ago it was a significant performance hit. So you might wonder why Apache didn't just watch the filesystem to avoid it. The answer, as best I can reconstruct it, is that only IRIX had a usable filesystem change notification API; Linux had a shitty broken one, and Solaris and FreeBSD didn't have one at all. Linux's inotify is still a pain in the ass, but at least now it's good enough that it can actually work. https://groups.google.com/g/mailing.freebsd.fs/c/T64SiVOfyUE has Mark Felder 10 years ago describing it as "a world of hurt", but not having a viable alternative for FreeBSD even then.
So 25 years ago there was just no way to do this, so people just got in the habit of turning off .htaccess to make Apache fast. That meant that there was no incentive to make .htaccess fast.
Another place this can be useful is for allowing customers to extend a local software with their own custom code. So instead of having to use say MCP to extend your AI tool they can just implement a certain request structure via CGI.
This makes me wonder if an MCP service couldn't also be implemented as CGI: an MCP framework might expose its features as a program that supports both execution modes. I have to dig into the specs.
Although the VPS is long gone, and I did not use any version control at the time. The laptop I wrote the stuff on is gone too :'(. But it was certainly quite fun. Deployment was as easy as Makefile + scp, and testing was another Bash script with a bunch of `netcat`s and greps.
What a time to be alive :)
() technically, two of them: one handled the front (customer visible) end, one handled the back-office side.
"A brief, incomplete and largely inaccurate history of dynamic webpages"
https://www.slideshare.net/slideshow/psgi-and-plack-from-fir...
In the mid-2000s I worked on a very-large-scale website using Apache2 w/mod_perl. Our high-traffic peaks were something like 25k RPS (for dynamic content; total RPS was >250k). Even at that time it was a bit old hat, but the design scaled very well. You'd have a fleet of mod_perl servers that would handle dynamic content requests, and a fleet of Apache2 servers that served static content and reverse-proxied back to the mod_perl fleet for dynamic requests. In front of the static servers were load balancers. They'd all keep connection pools open and the load balancers avoided the "maximum connection limit" of typical TCP/IP software, so there was no real connection limit, it was just network, memory, and cpu limits.
The big benefit of Apache2 w/mod_perl or mod_php was that you combined the pluggability and features of a scalable and feature-filled web server with the resident memory and cache of an interpreter that didn't need to keep exiting and starting. Yes you had to do more work to integrate with it, but you have to do that today with any framework.
The big downside was bugs. If you had a bug, you might have to debug both Apache and your application at the same time. There was not as much memory to be had, so memory leaks were a MUCH bigger problem than they are today. We worked around it with stupid fixes like stopping interpreters after taking 1000 requests or something. The high-level programmers (Perl, PHP) didn't really know C or systems programming so they didn't really know how to debug Apache or the larger OS problems, which it turns out has not changed in 20 years...
FastCGI and later systems had the benefit that you could run the same architecture without being tied to a webserver and dealing with its bugs on top of your own. But it also had the downside of (in some cases) not multiplexing connections, and you didn't get tight integration with the web server so that made some things more difficult.
Ultimately every backend web technology is just a rehashing of CGI, in a format incompatible with everything else. There were technical reasons why things like FastCGI, WSGI, etc exist, but today they are unnecessary now that we have HTTP/2 and HTTP/3. If you can multiplex HTTP connections and serve HTTP responses, you don't need anything else. I really hope future devs will stop reinventing the wheel and go back to actual standards that work outside your own single application/language/framework.
The fork[0] system call has been a relatively quick operation for the entirety of its existence. Where latency is introduced is in the canonical use of the execve[1] equivalent in newly created child process.
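A rough sketch of how to see the difference from Python, if you're curious (the numbers are machine-dependent, and a large parent process makes fork itself more expensive):

    # Sketch: bare fork() versus fork+execve, timed from Python (POSIX only).
    import os, subprocess, time

    N = 500

    t0 = time.perf_counter()
    for _ in range(N):
        pid = os.fork()
        if pid == 0:
            os._exit(0)              # child exits immediately, no exec
        os.waitpid(pid, 0)
    t1 = time.perf_counter()

    for _ in range(N):
        subprocess.run(["true"])     # fork (or posix_spawn) + execve
    t2 = time.perf_counter()

    print(f"fork only:   {(t1 - t0) / N * 1e6:.0f} us per child")
    print(f"fork + exec: {(t2 - t1) / N * 1e6:.0f} us per child")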
> ... cgi bin works really well if you don’t have to pay for ssl or tcp connections to databases or other services, but you can maybe run something like istio if you need that.
Istio[2] is specific to Kubernetes and thus unrelated to CGI.
0 - https://man.freebsd.org/cgi/man.cgi?query=fork&apropos=0&sek...
1 - https://man.freebsd.org/cgi/man.cgi?query=execve&sektion=2&a...
I can totally see how the cgi-bin process-per-request model is viable in a lot of places, but when it isn't, the difference can be vast. I don't think we'd have benefited from the easier concurrency either, but that's probably just because it was all golang to begin with.
It's possible that your experience with people switching was later, when performance was no longer such a pressing concern.
These were real issues on multi-user hosts, but as most of the time we don’t use shared hosting like that anymore it’s not an issue.
There were also some problems with libraries parsing the environment variables with the request data wrong, but that's no different from a badly implemented http stack these days. I vaguely recall some issues with excessively long requests overflowing environment variables, but I can't remember if that was a security problem or DoS.
Java remains the only programming language I've ever heard covered in a feature story for NPR.
Combined with the dot-com boom "general hype", I'm sure a lot of managers pushed heavyweight solutions where lightweight would have sufficed. Well, that may be an eternal problem, but maybe more succeeded in pushing them with a lot of hype. :-)
Not enough people I guess saw this as Sun trying to be the new Microsoft (which was the new IBM, which still has MVS & Cobol!), namely the company in control of The Platform, where here "The" just means the hip new thing kids learn in school and want to continue doing before they become expensive old timers.
That same go program can easily go over 10k reqs/sec without having to spawn a process for each incoming request.
CGI is insanely slow and insanely insecure.
EDIT: Looks like the way CGI works made it vulnerable to Shellshock in 2014: https://en.m.wikipedia.org/wiki/Shellshock_(software_bug)
I agree that there's probably not much of an argument to switch to it from the well established alternative mechanisms we are using already.
The one thing in its favor is that it makes it easier to have a polyglot web app, with different languages used for different paths. You can get the same thing using a proxy server though.
And a Go program reading from a network connection is immune from the same concerns how?
If only I could borrow such confidence in network data... :-D
From your linked article: If the handler is a Bash script, or if it executes Bash...
But we are talking about Python not Bash.
You could configure the server to be insecure by, eg, allowing cgi execution from a directory where uploaded files are stored.
I hesitate to suggest that you might be misremembering things that happened 30 years ago, but possibly you were using a very nonstandard setup?
For embedded devices (routers, security cameras, etc), it's very common to run CGI scripts as root.
So it is not even 30 years ago, it's still today, because of bad practices of the past.
It wasn't exactly for serving the response of the request per se, but a single customer click would launch an AWS ECS container with the whole Ruby and Rails VM just to send a single email message, rather than using a standard job queue.
It is extremely slow and super expensive. Amusingly, the UI had to be hardened so that double clicks don't cause two VMs to launch.
The rationale was that they already had batch jobs running in ECS, so "why not use it for all async operations".
And in a similar vein, Postgres (which is generally well liked!) uses a new backend process per connection. (Of course this has limitations, and sometimes necessitates pgbouncer, but not always.)
However, through the years I learned:
- yes, forks and in general processes are fast
- yes, it saves memory and CPU on low-load sites
- yes, it's a simple protocol and can be used even in shell
However,
- splitting functions (to mimic serverless) into different binaries/scripts creates a mess of cross-script communication
- deployment is not that simple
- security-wise, you need to run the manager as root and use unique users for each script, or use cgroups (or at least chroot). At that point the main question is why not use containers as-is
Also, compute-wise, even a huge Go app with hundreds of endpoints can fit in just a few megabytes of RAM - there is not much sense in saving so little memory.
At worst, just create a single binary and run it on demand for different endpoints.
Uber famously switched from pg to mysql because their SWEs couldn't properly manage connections
Having said that, for in-house use, embedded and/or trusted environments, they allow super quick hacks, e.g. a complete CGI 'program' to print the date (put it in cgi-bin and make it executable):
#!/bin/sh
printf 'Content-Type: text/plain\r\n\r\n'   # headers end with a blank line
echo "Your IP is $REMOTE_ADDR"
date
The process overhead is likely to hurt CGI though, which is why FastCGI was developed. As covered elsewhere in the thread, having a fast front end with effective comms to a well-written backend seems a reasonably sweet spot.
There are workarounds, but usually it's a better idea to ditch PHP for a better technology more suited to the modern web.
And for your information, you can have stateful whatnots in PHP. Hell, you can have it in CSS as I have demonstrated in my earlier comments.