Directly calling into system calls (“write”) is interesting.
See, I thought it was a nice separation of concerns and wondered why we abandoned the approach, until I read:
> How escaping wildcards works in the V5 and V6 shell is that all characters in commands and arguments are restricted to being seven-bit ASCII. The shell and /etc/glob both use the 8th bit to mark quoted characters, which means that such quoted characters don't match their unquoted versions and won't be seen as wildcards by either the shell
at which point I suddenly became a fan of ditching it. I do wonder if there's not some better way to factor that functionality out...
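Here is a minimal sketch of the trick the quote describes. It assumes 7-bit ASCII arguments, the function names are hypothetical, and it is not the V5/V6 source. A quoted `*` becomes a different byte, so any check for `'*'` never fires on it, and the mark is cleared before the argument reaches exec.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sketch only: bit 7 marks a character the user quoted. Arguments are
   assumed to be 7-bit ASCII, so that bit is otherwise unused. */
enum { QUOTE_BIT = 0x80 };

/* Mark every character of s in place, as a shell might for '...' or \x. */
static void mark_quoted(char *s) {
    for (; *s != '\0'; s++)
        *s = (char)((unsigned char)*s | QUOTE_BIT);
}

/* Only an unmarked '*', '?' or '[' counts as a wildcard; a marked byte
   compares unequal to the plain ASCII character. */
static bool is_wildcard(char c) {
    return c == '*' || c == '?' || c == '[';
}

/* True if s contains at least one unquoted wildcard. */
static bool has_unquoted_wildcard(const char *s) {
    for (; *s != '\0'; s++)
        if (is_wildcard(*s))
            return true;
    return false;
}

/* Clear the marks before the argument is passed to exec. */
static void strip_marks(char *s) {
    for (; *s != '\0'; s++)
        *s = (char)((unsigned char)*s & ~QUOTE_BIT);
}
```

The catch is the one the quote points at: the scheme only works while arguments stay 7-bit, which rules out UTF-8 or any other 8-bit text.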
[1] Even worse, the top 4kW/8kB was reserved for I/O.
And for I/D split you needed an appropriate CPU model.
The top 8 kB "I/O page" is reserved as part of kernel space, not userspace, so it has much less impact on the userspace part.
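The arithmetic behind the 4 kW / 8 kB figure, assuming 16-bit words and the classic 16-bit (64 kB) PDP-11 address space:

```latex
\begin{aligned}
4\,\text{K words} \times 2\,\text{bytes/word} &= 4096 \times 2 = 8192\ \text{bytes} = 8\ \text{kB} \\
2^{16} - 8192 &= 65536 - 8192 = 57344 = 160000_8
\end{aligned}
```

So the I/O page occupies the top of the address space, $160000_8$ to $177777_8$, leaving 56 kB below it.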
But if you really insist, you can write your own glob(1) that would invoke glob(3) for you, sure. There is also wordexp(3), although I believe its implementations had security problems for quite some time?
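A minimal sketch of such a standalone glob(1) core using POSIX `glob(3)`. `glob_print` is a hypothetical name, and error handling is kept to a minimum.

```c
#include <glob.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical core of a standalone glob(1): expand one pattern with
   glob(3) and print each match on its own line. Returns the number of
   paths printed (0 if nothing matched), or -1 on GLOB_NOSPACE/GLOB_ABORTED. */
static int glob_print(const char *pattern, FILE *out) {
    glob_t g;
    int rc = glob(pattern, 0, NULL, &g);
    if (rc != 0) {
        globfree(&g);
        return rc == GLOB_NOMATCH ? 0 : -1;
    }
    for (size_t i = 0; i < g.gl_pathc; i++)
        fprintf(out, "%s\n", g.gl_pathv[i]);
    int n = (int)g.gl_pathc;
    globfree(&g);
    return n;
}
```

A `main` that calls `glob_print(argv[i], stdout)` for each argument would let you preview an expansion before handing it to anything else.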
Globbing is also a separate built-in, which allows for other types of wildcard matches, like regex, too. E.g. https://murex.rocks/tour.html#filesystem-wildcards-globbing
So you have the best of both worlds: inline globbing for convenience, and wildcard matching as a function too.
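As a rough C analog (a swapped-in illustration, not murex's API), POSIX `fnmatch(3)` does wildcard matching as a plain function on a string, with no filesystem access. `wildcard_match` is a hypothetical wrapper.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Wildcard matching as a plain function, separate from filesystem
   expansion: fnmatch(3) tests one string against a glob pattern. */
static bool wildcard_match(const char *pattern, const char *name) {
    /* FNM_PATHNAME: '*' and '?' do not match '/', as in path globbing. */
    return fnmatch(pattern, name, FNM_PATHNAME) == 0;
}
```

With `FNM_PATHNAME`, `*.c` matches `main.c` but not `src/main.c`, mirroring how the shell expands one path component at a time.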
It also ditched another special case recently: the leading ~.
Just use backslash escaping like we do practically everywhere else in the Unix world?
What do you not like about escaping?
Of course, for program-to-program communication you can also use different techniques, instead of escaping. Escaping is just the most human-readable and human-producible.
(As a simple example, to be able to represent all characters in a string, you can either escape quotes like \" or you can prefix the string with its length.
Computers can work with either convention, but humans will hate you if they have to prefix every string literal with its length and keep that length in sync with the string.)
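A small sketch of the two conventions in C. The function names and the `len:` prefix format (netstring-like, minus the trailing comma) are illustrative assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Escaped form: wrap in quotes and backslash-escape '"' and '\'.
   out must have room for 2*strlen(s) + 3 bytes. */
static void encode_escaped(const char *s, char *out) {
    *out++ = '"';
    for (; *s != '\0'; s++) {
        if (*s == '"' || *s == '\\')
            *out++ = '\\';
        *out++ = *s;
    }
    *out++ = '"';
    *out = '\0';
}

/* Length-prefixed form: decimal byte count, ':', then the raw bytes.
   Returns snprintf's result (the length the full output needs). */
static int encode_prefixed(const char *s, char *out, size_t cap) {
    return snprintf(out, cap, "%zu:%s", strlen(s), s);
}
```

For the input `say "hi"`, the escaped form is `"say \"hi\""` and the prefixed form is `8:say "hi"`. The second needs no escaping, but a human editing it by hand has to recount the bytes after every change.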
But expansions and substitutions with escaping are a can of worms.
Find and xargs can delimit filenames with NUL, which is not allowed in filenames. Best practice in SQL was to abandon parameter escaping completely and pass parameters out of band. For internal representation, use array data structures with length information.
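A sketch of the consuming side of `find -print0 | xargs -0`-style input, assuming the whole list is already in memory. `next_nul_name` is a hypothetical helper. Because NUL cannot appear in a filename, names containing spaces, newlines, or `*` need no escaping at all.

```c
#include <stddef.h>

/* Return the next NUL-terminated name in buf[0..len) starting at *pos,
   and advance *pos past its terminator. Returns NULL when no complete
   name remains; a trailing fragment without a NUL is ignored. */
static const char *next_nul_name(const char *buf, size_t len, size_t *pos) {
    for (size_t i = *pos; i < len; i++) {
        if (buf[i] == '\0') {
            const char *name = buf + *pos;
            *pos = i + 1;
            return name;
        }
    }
    return NULL;
}
```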
Actually, would it be that bad to ban * and ? in filenames? If you accept them in the name of interop, something inevitably breaks later. Better to fail upfront. Many applications already sanitize filenames, and when they need to use binary data as a filename, they convert it to hex instead. It's a hassle otherwise.
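A sketch of the hex approach; `hex_name` is a hypothetical helper. The output uses only `0-9a-f`, so it can never contain `/`, `*`, `?`, whitespace, or NUL.

```c
#include <stddef.h>

/* Encode arbitrary bytes as lowercase hex so the result is safe to use
   as a filename. out must hold 2*len + 1 bytes. */
static void hex_name(const unsigned char *data, size_t len, char *out) {
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[data[i] >> 4];   /* high nibble */
        out[2 * i + 1] = digits[data[i] & 0x0f]; /* low nibble */
    }
    out[2 * len] = '\0';
}
```

The cost is doubling the name length and losing readability, which is exactly the hassle being traded away.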
That's possible, if you design your filesystem from scratch.
But if you take your filesystem as given for now (with its ability to represent all kinds of interesting characters) and just want to design globbing, you have to solve this problem. Otherwise you have a tool that can only handle some files. That's what GNU Make does, btw. Try handling any file or output with whitespace in its name in Make if you want some frustration.
Yes, null-termination works for the specific problem of termination. Though if you just use program-to-program communication, you can also prefix your strings with their length.
> If you accept them in the name of interop, something inevitably breaks later.
Why? That's only the case when you have legacy software written by less than careful people. There's no reason to expect breakage when you are designing new software, just like the people in the article were doing. (Of course, back then they didn't know what they were doing, so we have a lot of breakage historically.)
But for the very specific purpose of the shell talking to a helper program for globbing, you can control exactly what's happening, including all the encoding and decoding (or escaping and unescaping). So there's no unexpected breakage.
And btw, you also need to give the human a way to specify a literal * in a filename, not just a way for programs to communicate.
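A sketch of the escaping side, assuming POSIX `fnmatch(3)` semantics, where a backslash makes the next character literal unless `FNM_NOESCAPE` is passed. `glob_escape` and `matches` are hypothetical helpers.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Backslash-escape the glob metacharacters * ? [ and \ so that a literal
   filename can be used as a pattern. out must hold 2*strlen(name) + 1. */
static void glob_escape(const char *name, char *out) {
    for (; *name != '\0'; name++) {
        if (*name == '*' || *name == '?' || *name == '[' || *name == '\\')
            *out++ = '\\';
        *out++ = *name;
    }
    *out = '\0';
}

/* True if name matches pattern under default fnmatch(3) rules. */
static int matches(const char *pattern, const char *name) {
    return fnmatch(pattern, name, 0) == 0;
}
```

Escaping `a*b` produces `a\*b`, which matches only the file literally named `a*b`, while the unescaped `a*b` also matches `axxb`.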
> Best practice in SQL was to abandon parameters escaping completely and pass them out of band.
Yes, that's partially because SQL is such a complicated language, and because you are talking about program-to-program communication anyway, so you don't need to be human-friendly there. So passing parameters on a separate channel is the simplest thing that covers all cases.
I use xterm.js a lot and have a "shell backbone" that I use to make shell based access to APIs, S3 and other things "cloud." This is essentially how I implement globbing as well. The convenience is that you can run glob by itself to get an idea of exactly what kind of automated nightmare you are about to kick off.
Anyway, mine currently has V3 behavior. My shell command exec routine could actually benefit from that hack. What's old is new again?
The 'failglob' shopt option will cause an error to be generated if a glob matches nothing.
The 'nullglob' shopt option makes a glob that matches nothing expand to nothing, instead of the traditional default of leaving the glob characters untouched:
# echo foo*bar
foo*bar
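In C, glob(3) exposes roughly the same choice through the `GLOB_NOCHECK` flag. This sketch, with a hypothetical `expand` helper, is an analogy to the bash options, not how bash implements them.

```c
#include <glob.h>
#include <stddef.h>

/* Expand pattern and report how many words result (sketch):
   - keep_unmatched != 0 uses GLOB_NOCHECK, so no match yields the
     pattern itself, like the traditional `echo foo*bar` -> `foo*bar`;
   - keep_unmatched == 0 lets glob(3) return GLOB_NOMATCH with no words.
   Returns glob(3)'s result code; *count receives the number of words. */
static int expand(const char *pattern, int keep_unmatched, size_t *count) {
    glob_t g;
    int flags = keep_unmatched ? GLOB_NOCHECK : 0;
    int rc = glob(pattern, flags, NULL, &g);
    *count = (rc == 0) ? g.gl_pathc : 0;
    globfree(&g);
    return rc;
}
```

Treating `GLOB_NOMATCH` as an error gives failglob-like behavior, using the empty result gives nullglob-like behavior, and passing `GLOB_NOCHECK` reproduces the `echo foo*bar` output shown.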
https://github.com/torvalds/linux/blob/4a5df37964673effcd9f8...
Today there's much more software, so some things got moved into finer-grained locations like /libexec and /sbin. That wasn't the case in the /etc/glob era when the entire UNIX system was smaller than today's average web page.
You could argue that Lisp reader macros also somewhat violate this rule. As a longtime Lisp fan, I dislike reader macros, but I'm more conflicted about macros in general. A good macro system should aim to provide enough context for IDEs and LSPs to aid the developer, but Lisp macros are entirely about just transforming the AST. It's usually just better to evolve the language.
Thank you for your perspective, work, and contributions.
But the PDP-11 systems that many of these designs were built on had a minimum memory size of 4 KB, and even the maximum memory of the various models was smaller than a single JPEG photo today: 256 KB on the PDP-11/45, 4 MB on the PDP-11/70.
And that was the total memory for everything, the OS and the users alike, on a system that supported multiple users sharing the same machine at the same time.
With those resource constraints, the rules that separate good design from poor design are radically different from those for today's systems with multiple GB of RAM.
The shells weren’t originally intended to be Turing complete. They were just a job launcher. What you use today would have been unimaginable when these shells were first designed.
By contrast, other programming languages have evolved far less, yet still have a worse compatibility story.
It’s very easy to be critical of the Bourne shell (and compatible shells too) because they are archaic by modern standards. But they weren’t written to solve modern problems. So it’s like looking at a bicycle and complaining that the designers didn’t build a sports car, while ignoring the fact that the technology didn’t exist and that push bikes are still good enough for millions of people to use daily.
If you are trying to attack PHP, you are not doing a good job of it, especially because there were good reasons for using a separate program for glob.
I won't argue about PHP. I've dealt with it while there was money to be made from that, and moved on as soon as I had the chance. ¯\_(ツ)_/¯