Take the best-case scenario: copying a string where the precise length is unknown, but we know it will always fit in, say, 64 bytes.
In earlier days, I would always have used strcpy() for this task, avoiding the "wasteful" extra copies memcpy() would make. It felt efficient; after all, you only replace an `i < len` check with a `buf[i] != '\0'` check inside your loop, right?
But of course it doesn't actually work that way. Copying one byte at a time is inefficient, so instead we copy as many bytes as possible at once, which is easy to do with just a length check but not so easy if you need to find the NUL byte. And on top of that, you're asking the CPU to predict a branch that depends entirely on input data.
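Roughly, the difference looks like this (a simplified sketch; real libc implementations use vectorized, word-at-a-time code with careful alignment handling):

    #include <stddef.h>
    #include <string.h>

    /* Length known up front: the copy can move in wide chunks, and the
       loop bound is a plain counter the CPU predicts trivially. */
    void copy_known_len(char *dst, const char *src, size_t len) {
        memcpy(dst, src, len);   /* libc picks the widest safe chunk size */
    }

    /* Length unknown: every byte must be inspected for the terminator,
       and the exit branch depends entirely on the input data. */
    void copy_until_nul(char *dst, const char *src) {
        size_t i = 0;
        do {
            dst[i] = src[i];
        } while (src[i++] != '\0');
    }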
Null termination of course also causes issues when languages with more proper strings interact with C, but there you go.
Ideally, the standard would include a type that packages a string with its length, and would have functions that use that type and/or take the length as an argument. But even without that, it is possible to avoid null-terminated strings in a lot of places.
And maybe even have an (architecture-dependent) string buffer zone where the actual allocated length is a multiple of 4 or even 8.
After years I now think it's essential to have a library which records at least how much memory is allocated to a string along with the pointer.
Something like this: https://github.com/msteinert/bstring
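Under the hood that's just a struct of a pointer plus lengths; a minimal sketch (hypothetical names, and the real bstring API differs):

    #include <string.h>

    typedef struct {
        char  *data;
        size_t len;   /* bytes in use (excluding any terminator) */
        size_t cap;   /* bytes actually allocated */
    } str_buf;

    /* The copy is bounded by the recorded capacity, so it cannot overrun. */
    int str_buf_set(str_buf *s, const char *src, size_t n) {
        if (n + 1 > s->cap)
            return -1;          /* caller decides: grow, truncate, or fail */
        memcpy(s->data, src, n);
        s->data[n] = '\0';      /* terminate anyway, for cheap C interop */
        s->len = n;
        return 0;
    }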
This idiom:
    char hostname[20];
    ...
    strncpy(hostname, input, 20);
    hostname[19]=0;
exists because strncpy was invented for copying file names that got stored in 14-byte arrays, zero-terminated only if space permitted (https://stackoverflow.com/a/1454071).

I have long wondered how terrible it would have been to have some sort of "varint" at the beginning instead of a hard-coded number of bytes, but I don't have enough experience with that generation to have a good feel for it.
A library that records how much memory is allocated to a string along with the pointer isn't a necessity.
Most people who write C professionally are completely used to it, although that footgun (and all of the others) is always there, lurking.
You'd generally just see code like this:-
    char hostname[20];
    ...
    strncpy(hostname, input, 20);
    hostname[19]=0;
The problem obviously comes if you forget the line to NUL the last byte AND you have an input that is greater than 19 characters long. (It's also very easy to get this wrong; I almost wrote `hostname[20]=0;` the first time round.)
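One way to avoid both halves of that footgun is to let snprintf do the bounding and the termination in one call, since snprintf always NUL-terminates for any non-zero size (a hedged sketch, reusing the hostname/input names from above):

    #include <stdio.h>

    void set_hostname(const char *input) {
        char hostname[20];
        /* Bounded and always terminated: no separate terminator line
           to forget, and truncation shows up in the return value. */
        int n = snprintf(hostname, sizeof hostname, "%s", input);
        if (n < 0 || (size_t)n >= sizeof hostname) {
            /* input did not fit: handle the truncation explicitly */
        }
    }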
I remember debugging a problem 20+ years ago on a customer site, with some software that used Sybase Open/Server crashing on startup. The underlying TDS communications protocol (https://www.freetds.org/tds.html) had a fixed 30-byte field for the hostname, and the customer had a particularly long FQDN that was being copied in without any check on its length. An easy fix once identified.
Back then, though, the consequences of a buffer overrun were usually just a mild annoyance like a random crash, or something like the Morris worm. Nowadays such a buffer overrun is deadly serious, as it can easily lead to data exfiltration, RCE, and/or a complete compromise.
Heartbleed and Mongobleed had nothing to do with C string functions. They were both caused by trusting user supplied payload lengths. (C string functions are still a huge source of problems though.)
This doesn't seem very relevant. The same can be said of countless other bad APIs: see years of bad PHP, tons of memory safety bugs in C, and things that have surely led to significant sums of money lost.
> It's also very easy to get this wrong, I almost wrote `hostname[20]=0;` first time round.
Why would you do this separately every single time, then?
The problem with bad APIs is that even the best programmers will occasionally make a mistake, and you should use interfaces (or...languages!) that prevent it from happening in the first place.
The fact we've gotten as far as we have with C does not mean this is a defensible API.
Not many people starting a new project (commercial or otherwise) are likely to start with C, for very good reason. I'd have to have a very compelling reason to do so; as you say, there are plenty of more suitable alternatives. Years ago, many third-party libraries only had C-style ABIs, and calling these from other languages was clumsy and convoluted (and would often require implementing C-string-style strings in another language).
> Why would you do this separately every single time, then?
It was just an illustration of what people used to do. The "set the trailing NUL byte after a strncpy() call" just became a thing lots of people did and lots of people looked for in code reviews - I've even seen automated checks. It was in a similar bucket to "stuff is allocated, let me make sure it is freed in every code path so there aren't any memory leaks", etc.
Many others would have written their own function like `curlx_strcopy()` in the original article, it's not a novel concept to write your own function to implement a better version of an API.
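The shape such homegrown wrappers usually take is roughly this (a hypothetical sketch, not the actual `curlx_strcopy()` from the article):

    #include <string.h>

    /* Bounded copy that always terminates and reports truncation. */
    int my_strcopy(char *dst, const char *src, size_t dsize) {
        size_t slen = strlen(src);
        if (dsize == 0)
            return -1;
        if (slen >= dsize) {              /* does not fit */
            memcpy(dst, src, dsize - 1);  /* truncate, but stay terminated */
            dst[dsize - 1] = '\0';
            return -1;
        }
        memcpy(dst, src, slen + 1);       /* +1 copies the terminator too */
        return 0;
    }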
The effort was usually out of proportion with the achievement.
I crashed my own computer a lot before I got Linux. Do you remember far pointers? :-( In those days millions of dollars were made by operating systems without memory protection that couldn't address more than 640k of memory. One accepted that programs sometimes crashed the whole computer - about once a week on average.
Despite considering myself an acceptable programmer I still make mistakes in C quite easily and I use valgrind or the sanitizers quite heavily to save myself from them. I think the proliferation of other languages is the result of all this.
In spite of this I find C elegant, and I think 90% of my errors are in string handling, so if it had a decent string-handling library it would be enormously better. I don't really think pure ASCIIZ strings are so marvelous or so fast that we have to accept their bullshit.
Impossible to get wrong with a modern compiler that will warn you about that, or an LSP that will scream the moment you type the ; and hit enter/esc.
But also all of this book-keeping takes up extra time and space which is a trade-off easily made nowadays.
Viruses did exist, and these were considered users' fault too.
A couple years ago we got a new manual page courtesy of Alejandro Colomar just about this: https://man.archlinux.org/man/string_copying.7.en
As an aside, this is part of the reason why there are so many C successor languages: you can end up with undefined behavior if you don’t always carefully read the docs.
Also, I'm not sure what you mean by C successor languages not having undefined behaviour, as both Rust and Zig inherit it wholesale from LLVM. At least that was the case last I checked; correct me if I'm wrong. Go, Java and C# all have sane behaviour, but those are much higher level.
In general, I see two issues at play here:
1. C relies heavily on unsized pointers (vs. fat pointers), which is why strncpy_s had to "break" strncpy in order to improve bounds checks.
2. strncpy memory aliasing restrictions are not encoded in the API and can only be conveyed through docs. This is a footgun.
For (1), Rust APIs of this type operate on sized slices, or in the case of strings, string slices. Zig defines strings as sized byte slices.
For (2), Rust enforces this invariant via the borrow checker by disallowing (at compile-time) a shared slice reference that points to an overlapping mutable slice reference. In other words, an API like this is simply not possible to define in (safe) Rust, which means you (as the user) do not need to pore over the docs for each stdlib function you use looking for memory-related footguns.
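For concreteness in C terms: the standard prototype already asserts the no-overlap rule via restrict, but only as a promise the caller must keep; nothing checks it at the call site:

    #include <string.h>

    /* C99: char *strncpy(char * restrict s1, const char * restrict s2, size_t n);
       The restrict qualifiers make overlap undefined behavior, not an error. */
    void aliasing_footgun(void) {
        char buf[16] = "abcdefghijklmno";
        /* Overlapping source and destination: compiles cleanly and may
           appear to work, but is undefined behavior per the standard. */
        strncpy(buf + 1, buf, 8);
    }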
It would make little sense for strncpy to handle this case, since, as I pointed out above, it converts between different kinds of strings.
You would think char symbol[20] would be inefficient for such performance-sensitive software, but for the vast majority of exchanges, the technical competence just wasn't there to properly replace these readable symbols/IDs with a compact/opaque integer ID like a u32. Several exchanges tried, and they had numerous issues with IDs not being "properly" unique across symbol types or time (restarts intra-day or shortly before the open were a common nightmare), etc. A char symbol[20] and strncpy was a dream by comparison.
There are languages where you can be quite confident your string will never need null termination… but C is not one of them.
    char buf[12];
    sprintf(buf, "%s%s", this, that); // or
    strcat(buf, ...)                  // or
    strncpy(buf, ...)                 // and so on...

There are certainly ways in which the C library could've been better (e.g. making strncpy handle the case where the source string is longer than n), but ultimately it will always need to operate under the assumption that the people using it are both competent and acting in good faith.
strlcpy tries to improve the situation but still has problems. As the article points out, it is almost never desirable to truncate a string passed into strXcpy, yet that is what all of those functions do. Even worse, they run to the end of the source string regardless of the size parameter, so they don't even necessarily save you from the unterminated-string case. They also do loads of unnecessary work, especially if your source string is very long (like an mmapped text file).
strncpy got this behavior because it was trying to implement the dubious truncation feature and needed to tell the programmer where their data was truncated. strlcpy adopted the same behavior because it was trying to be a drop-in replacement. But it was a dumb idea from the start and it causes a lot of pain unnecessarily.
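To make that concrete, here is a minimal sketch with strlcpy-style semantics (simplified from the documented OpenBSD behavior): note the loop keeps walking the source to its terminator even after the destination is full, purely to compute the return value:

    #include <stddef.h>

    size_t my_strlcpy(char *dst, const char *src, size_t dsize) {
        size_t i = 0;
        for (; src[i] != '\0'; i++) {      /* scans ALL of src, always */
            if (i + 1 < dsize)
                dst[i] = src[i];           /* copy only while room remains */
        }
        if (dsize > 0)
            dst[i < dsize ? i : dsize - 1] = '\0';  /* always terminate */
        return i;  /* full source length; i >= dsize signals truncation */
    }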
The crazy thing is that strcpy has the best interface, but of course it's only useful in cases where you have externally verified that the copy is safe before you call it, and as the article points out if you know this then you can just use memcpy instead.
As you ponder the situation, you inevitably come to the conclusion that it would have been better if strings carried their own length instead of relying on a terminator. But then you realize that in order to support editing the string as well as passing substrings, you need some struct holding the base pointer and length, and possibly a substring offset and length, and you've just re-invented slices. It's also clear why a system like this was not invented for the original C, which was developed on PDP machines with just a few hundred KB of RAM.
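A hedged sketch of where that line of thought ends up (hypothetical names; this is essentially a slice):

    #include <stddef.h>

    /* Non-owning view: base pointer plus length. */
    typedef struct {
        const char *ptr;
        size_t      len;
    } str_view;

    /* Substrings are O(1): no copying, no terminator to patch in. */
    str_view substr(str_view s, size_t off, size_t n) {
        if (off > s.len)     off = s.len;
        if (n > s.len - off) n   = s.len - off;
        return (str_view){ s.ptr + off, n };
    }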
Is it really too late for the C committee to develop a modern string library that ships with base C26 or C27? I get that they really hate adding features, but C strings have been a problem for over 50 years now, and I'm not advocating for the old strings to be removed or even deprecated at this time. Just make a modern replacement available and encourage people to use it for new code.
Well, if you bother to look up the fact that it was originally created for non-null-terminated strings, then it kinda makes sense.
The real problem began when static analyzers started to recommend using it instead of strcpy (the real alternative used to be snprintf, now strlcpy).
https://pubs.opengroup.org/onlinepubs/9799919799/functions/s...
(I agree with the author of the piece that strlcpy doesn't actually solve the real problem.)
I looked up the actual reason for its inception:
---
"Rationale for the ANSI C Programming Language", Silicon Press 1990.
4.11.2.4 The strncpy function
strncpy was initially introduced into the C library to deal with fixed-length name fields in structures such as directory entries. Such fields are not used in the same way as strings: the trailing null is unnecessary for a maximum-length field, and setting trailing bytes for shorter names to null assures efficient field-wise comparisons. strncpy is not by origin a "bounded strcpy," and the Committee has preferred to recognize existing practice rather than alter the function to better suit it to such use.

Padded and terminated strings are completely different beasts. And the text you quote tells you in black and white that strncpy deals in padded strings.
“the trailing null is unnecessary for a maximum-length field”
That is a non-null-terminated string.
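A small illustration of those padded-field semantics (a hypothetical 14-byte, directory-entry-style field):

    #include <string.h>

    void demo(void) {
        char name[14];   /* fixed-width, NUL-padded field */

        /* Short name: "ls" is copied and the remaining 12 bytes are
           NUL-padded, which makes field-wise memcmp comparisons work. */
        strncpy(name, "ls", sizeof name);

        /* Maximum-length name: all 14 bytes are used and NO terminator
           is written - a padded field, not a C string. */
        strncpy(name, "fourteen_chars", sizeof name);
    }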
So now it silently fails and sets dest to an empty string without even partially copying anything!?
    DEBUGASSERT(slen < dsize);

it means it succeeded, although some compilers will remove the assertions in release builds.

I would have preferred an explicit error code though.
But regardless of whether the assert is compiled in or not, its presence strongly signals that "in a C program strcpy should only be used when we have full control of both" is true for this new function as well.
This makes a lot of sense, but one place I find it gets messy is when I need to do checks earlier in a dataset's lifetime. I don't want to pay for the check multiple times, but I don't want to push the check up where it gets lost in a future refactor.
I'm imagining compile-time metadata that basically says, "to act on this data it must have been checked first. I don't care when, so long as it has been by now." Which I imagine is what Rust is doing with a Result type? At that point it stops mattering how close the check is to the code, as long as the types distinguish between checked and unchecked?
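You can approximate that even in C by making the check the only constructor of a distinct type - a hedged sketch of the "parse, don't validate" idea (hypothetical names; Rust's Result gives the same shape with compiler enforcement):

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* A value of this type can only exist if the check has run. */
    typedef struct {
        const char *s;
    } checked_hostname;

    /* The sole constructor: no checked_hostname without a passed check. */
    bool check_hostname(const char *raw, checked_hostname *out) {
        if (raw == NULL || strlen(raw) >= 20)
            return false;
        out->s = raw;
        return true;
    }

    /* Downstream code demands the checked type, so an unchecked string
       cannot reach it, no matter how far the check moves in a refactor. */
    void connect_to(checked_hostname h);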
> It has been proven numerous times already that strcpy in source code is like a honey pot for generating hallucinated vulnerability claims
This closing thought in the article really stood out to me. Why even bother to run AI checking on C code if the AI flags strcpy() as a problem without caveat?
people overestimate AI
So, like, why doesn't the person iterate with the AI until they understand the bug (and then ultimately discover it doesn't exist)? Have any of these bug reports actually paid out? It seems like people should quickly just give up from the lack of rewards.
This sounds a bit like expecting the people who followed a "make your own drop-shipping company" tutorial to try using the products they're shipping to understand that they suck.
Not helped, I imagine, that once you realise it doesn't work, an easy pivot is to start convincing new people that it'll work if they pay you money for a course on it.
I don't really think this adds anything over forcing callers to use memcpy directly, instead of strcpy.
IMHO the timeline figure could benefit from larger fonts on mobile. Most plotting libraries have horrible default font sizes. I wonder why no library has picked the other extreme: I have never seen an axis label that was too large.
(Depends on what you replace it with obviously...)
... And if the copy can't be made, apparently the destination is truncated as long as there's space (i.e., a null terminator is written at element 0). And it returns void.
I'm really not sold on that being the best way to handle the case where copying is impossible. I'd think that's an error case that should be signaled with a non-zero return, leaving the destination buffer alone. Sure, that's not supposed to happen (hence the DEBUGASSERT macro), but still. It might even be easier to design around that possibility rather than making it the caller's responsibility to check first.
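Something in this shape, say (hypothetical; not a proposal from the article):

    #include <string.h>

    /* Report failure instead of asserting, and leave the destination
       untouched so the caller's state stays intact on error. */
    int try_strcopy(char *dst, size_t dsize, const char *src) {
        size_t slen = strlen(src);
        if (slen >= dsize)
            return -1;               /* caller must handle this */
        memcpy(dst, src, slen + 1);  /* includes the terminator */
        return 0;
    }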
Why is this even a thing and isn't opt-in?
I dread the idea of starting to get notifications from them in my own projects.
    #include <stdlib.h>
    #include <string.h>

    void nobody_calls_me(const char *stuff) {
        char *a, *b;
        const size_t c = 1024;
        a = calloc(c, 1);          /* zero-filled, so 'a' is always terminated */
        if (!a) return;
        b = malloc(c);
        if (!b) {
            free(a);
            return;
        }
        strncpy(a, stuff, c - 1);  /* at most c-1 bytes; last byte is already NUL */
        strcpy(b, a);              /* safe: 'a' is terminated and fits in c bytes */
        strcpy(a, b);              /* safe for the same reason */
        free(a);
        free(b);
    }
Some clever obfuscation would make this even more effective.

Flashback of writing exploits for these back in high school.
> A new breed of AI-powered high quality code analyzers, primarily ZeroPath and Aisle Research, started pouring in bug reports to us with potential defects. We have fixed several hundred bugs as a direct result of those reports – so far.
[1] https://daniel.haxx.se/blog/2025/12/23/a-curl-2025-review/
https://daniel.haxx.se/blog/2025/10/10/a-new-breed-of-analyz...
and its HN discussion:
https://gist.github.com/bagder/07f7581f6e3d78ef37dfbfc81fd1d...