The problem is that it's currently legal to pass a string literal to a function expecting a (non-const) pointer-to-char argument. As long as the function doesn't try to write through the pointer, there's no undefined behavior. (If the function does try to write through the pointer, the behavior is undefined, but no compile-time diagnostic is required.) If a future version of C made string literals const, such a program would become invalid (a constraint violation requiring a diagnostic). Such code was common in pre-ANSI C, before const was introduced to the language.
The following is currently valid C. The corresponding C++ code would be invalid. The proposal would make it invalid in C, with the cost of breaking some existing code, and the advantage of catching certain errors at compile time.
#include <stdio.h>

void print_message(char *message) {
    puts(message);
    // *message = '\0'; // would have undefined behavior
}

int main(void) {
    print_message("hello");
}
Of course it is. Modifying string literals doesn't work on anything modern, so it's impossible for portable code that actually runs in the real world and has to work to have relied on it for a long time.
Your example is not code any competent C programmer would ever write, IMHO. Every proficient C programmer I've ever worked with used "const char *" for string literals, and called out anybody who didn't in review.
Old code already needs special flags to build with modern compilers: I think the benefit of doing this outweighs the cost of editing some makefiles.
Apart from that, it's not about actually modifying string literals. It's about currently valid (but admittedly sloppy) code that uses a non-const pointer to point to a string literal. It's easy to write such code in a way that a modern conforming C compiler will not warn about.
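For instance, something along these lines (a sketch, with made-up names) is accepted by a conforming compiler today with no diagnostic unless you opt into something like -Wwrite-strings:

    /* Valid C today, no diagnostic required.  If string literals became
       const, the first declaration would be a constraint violation. */
    char *greeting = "hello";   /* non-const pointer into a string literal */

    void shout(char *s);        /* prototype without const */

    void caller(void) {
        shout(greeting);        /* nothing stops shout() from writing through s */
    }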
That kind of code is the reason that this proposed change is not just an obvious no-brainer, and the author is doing research to find out how much of an issue it really is.
As it happens, I think that the next C standard should make string literals const. Any code that depends on the current behavior can still be compiled with C23 or earlier compilers, or with a non-conforming option, or by ignoring non-fatal warnings. And of course any such code can be fixed, but that's not necessarily trivial; making the source code changes can be a very small part of the process.
Any change that can break existing valid code should be approached with caution to determine whether it's worth the cost. And if the answer is yes, that's great.
I don't understand your point here: I disagree this is "obvious", and I don't think I've said anything to imply that?
> And of course any such code can be fixed, but that's not necessarily trivial; making the source code changes can be a very small part of the process
In many cases, it's so trivial that you can write code to patch the code. Often, the resulting stripped binary will be identical, so you can prove it's not even necessary to test the result! If decision makers can be made to understand that, you can route around most of the corporate process that makes this sort of thing hard.
I've spent a lot of time fixing horrible old proprietary code to use const because I think it's important: most of the time, it's very easy. I don't deny there are rat's nests that require a lot of refactoring to unwind, but in my personal experience that's the exception rather than the rule.
It will be vanishingly rare that code will need to be modified in a way that actually changes its runtime behavior to tolerate the proposed change.
My point is also that that's a valid reason to proceed carefully before making the change.
Even if the required source code changes are trivial or automatable, there will still be some variable amount of work required to deploy the changes. For a small program or library, maybe you can just rebuild and deploy. But for some projects, any change requires going through a full round of review, testing, recertification, and so on. For an update to code that controls a medical device or a nuclear reactor, for example, changing the code is the easy part.
I support the proposed change. I also support performing all due diligence before imposing it on all future implementations and C software.
If the new binary is literally identical to the last one that passed validation, absolutely zero additional testing is required. It is a waste of resources to retest an identical binary (assuming everything else can be held constant, of course, which obviously can't always be the case).
Actually sending our hypothetical refactoring to production would itself be a waste of resources anyway, since the binary is identical... you just skip it, wait for the next real change, and then proceed as usual.
All processes have exceptions, the "binary identical output" is an easy one if your leadership chain is capable of understanding it.
And to be clear, "binary" here could absolutely mean "entire firmware image". The era of reproducible builds is upon us, and it is glorious.
But ...
"The era of reproducible builds is upon us"
What about old code built with old toolchains? And what about organizational policies that require a full round of testing for any update? How hard do you think it would be to change such policies?
No doubt there's some software that could easily be modified, recompiled, and released. My point is that there is also some software that can't.
And yes, in those cases the likely solution is to leave the code alone and continue to build it with the old toolchain.
The point is that the proposed change will break existing valid code, and that has a non-zero cost. I support Jens Gustedt's effort to find out just what that cost is before imposing the change. (And again, I hope the change does go into the next edition of the standard.)
But maybe 70 warnings in 250k LoC is OK for your standards of proficiency.
70 warnings really doesn't sound that bad to fix. Most are probably trivial. I'm sure a few aren't.
If nobody is around to fix it, that's what legacy flags are for.
Moreover, that's an old-style cast. GNU C++ has an opt-in warning for those, -Wold-style-cast; you then need const_cast to get around it.
Then we can grep the program for that new-style cast (unless it token-pasted the const_cast together in the preprocessor, haha).
In C we can write such a program with no devices that defeat the type system at all, and it still requires no diagnostic.
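Something like this sketch: no cast anywhere, no constraint violated, and a conforming compiler is not required to say a word.

    #include <stdio.h>

    void clobber(char *s) {
        s[0] = 'H';       /* undefined behavior when s points at a literal */
        puts(s);
    }

    int main(void) {
        clobber("hello"); /* passes a literal through a plain char *, no cast */
        return 0;
    }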
For example, clang started simply omitting writes to data it knows to be read-only (which is allowed because these writes are undefined behavior, so anything goes). See this example[1]: `writable()` will return "*ello", but `readonly()` will just return "hello" and not crash (note its assembly doesn't include a write).
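Roughly, the shape of it is something like this (a sketch of the effect, not the actual linked code):

    const char *writable(void) {
        static char s[] = "hello";   /* writable array copy of the literal */
        s[0] = '*';                  /* store is kept */
        return s;                    /* "*ello" */
    }

    const char *readonly(void) {
        char *s = "hello";           /* points into read-only .rodata */
        s[0] = '*';                  /* UB: clang may simply drop this store */
        return s;                    /* still "hello", and no crash */
    }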
Although, I am curious if that optimization could happen across compilation units via LTO...
If you change `writable()` to receive a `const char *` (and then cast it to `char *` to write), then clang will be forced to compile it with a store (even though it sees you storing to a `const char *`) because it doesn't know if the function will be called with a pointer to actual read-only data or just a pointer to writable data that was gratuitously converted to `const`.
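In other words, something like this (again just a sketch, names made up): with an external function taking const char *, clang has to keep the store, because the pointee might be perfectly writable memory.

    /* Sketch: clang cannot assume a const char * points at read-only data. */
    void scribble(const char *s) {
        char *p = (char *)s;   /* casting away const */
        p[0] = '*';            /* store stays: fine if the caller passed a
                                  writable buffer, UB only if it was a literal */
    }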
That's exactly my point, yeah: the optimization you described is only possible because you gave the compiler extra knowledge about the argument to that function (because it was static in the same compilation unit). It's artificial; typically that won't be the case.
I remember there was a lot of confusion when llvm started removing stores to read-only memory[1]; some people got angry because it broke some kernel code (which only worked because, being in the kernel, the memory page wasn't actually marked read-only) and thought it would break any code that casts away a `const`, which is very common and valid as long as it was gratuitously `const`, as you say.
[1] https://releases.llvm.org/9.0.0/docs/ReleaseNotes.html#notew...
I'm not denying that there are codebases where trying this would result in an Armageddon of refactoring, but I would venture that's the exception rather than the rule.
Most C programmers use "const char*" for string literals, and have for a long time.
C++ went through this over 20 years ago. I can't remember if it was already in C++03 or whether it was a post-'03 draft feature.
BLAS, gemv, GEMM, SGEMM libraries are from 1979, 1984, 1989. You may have seen these words scroll by when compiling modern 2025 CUDA :)
C has no backwards compatibility guarantee, and it never has. Try compiling K&R C with gcc's defaults, and see what happens.
You can build your legacy code with legacy compiler flags. Why do you care about the ability to build under the modern standards?
On AVR or other MPU-less architectures you can literally modify string literal memory without triggering a crash.
Why? Because there is no memory protection at all: nothing enforces a read-only ".rodata" section.
And such microcontrollers are still in use today, so it's a bit too far-fetched to say "really old code."
It's UB, sure, but how many embedded programmers actually care? The OP's proposal is trying to change the type system so that this UB becomes much less likely to trigger in practice.
Quote from the gcc manual, explaining why you need to compile old code with the -fwritable-strings option: "you cannot call mktemp with a string constant argument. The function mktemp always alters the string its argument points to.
Another consequence is that sscanf does not work on some systems when passed a string constant as its format control string or input. This is because sscanf incorrectly tries to write into the string constant. Likewise fscanf and scanf."
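In other words, the pre-ANSI idiom the manual is warning about looked something like this (a sketch):

    #include <stdlib.h>

    char *make_temp_name(void) {
        /* mktemp() writes into its argument, so it needs a writable array: */
        static char name[] = "/tmp/exampleXXXXXX";
        return mktemp(name);
        /* The broken old idiom: return mktemp("/tmp/exampleXXXXXX"); */
    }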
The standard C library uses const char * almost everywhere where a string is accepted that will not be modified.
So you might have a function that doesn't have proper "const" qualifications in its prototype, like:

    void my_log(char *message);

and then call sites like:

    my_log("Hello, World!");

...and that needed to stay compiling.
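When my_log really never writes to its argument, the usual fix is a one-liner, and the literal call sites don't change (a sketch):

    void my_log(const char *message);   /* call sites like my_log("Hello, World!")
                                           keep compiling, const literals or not */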