On the foolishness of "natural language programming". https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...
And its just so obviously correct.
(And also, like DW points, that software is way more complex. But on this case, it's the requirements discovery.)
This hits so hard. cough dynamic typing enthusiasts and vibe coders cough
https://www.cs.utexas.edu/~EWD/transcriptions/EWD12xx/EWD124...
Most examples I can think of would be things like “this method M expects type X” but I can throw in type Y that happens to implement the same properties/fields/methods/whatever that M will use. And this is a really convenient thing for dynamic languages. A static language proponent would call this an obvious bug waiting to happen in production when the version of M gets updated and breaks the unspecified contract Y was trying to implement, though.
Dijkstra already wrote this in the 80s and today many teachers still complain about this fact. I also know that, at least in the Netherlands, the curriculum is judged based on the percentage of students that pass. If too few students pass, then the material is made easier (never harder!), so you can imagine what happens if this process continued for half a century by now.
I think a compelling argument can be made that 0-based is better for offsets and 1-based is better for indexes, and that we should not think of both as the same thing. https://hisham.hm/2021/01/18/again-on-0-based-vs-1-based-ind...
Zero-based indexing is naturally coupled with using only half-open ranges.
When using zero-based indexing and half-open ranges, accessing an array forwards, backwards or circularly is equally easy.
In this case you can also do like in the language Icon, where non-negative indices access the array forwards, while negative indices access the array backwards (i.e. -1 is the index of the last element of the array, while 0 is the index of the first element).
In languages lacking the Icon feature, you just have to explicitly add the length of the array to the negative index.
There is absolutely no reason to distinguish offsets and indices. The less distinct kinds of things you need to keep in mind, the less chances for errors from using the wrong thing. Therefore one should not add extra kinds of things in a programming language, unless there is a real need for them.
There are things that are missing from most languages and which are needed, e.g. separate types for signed integers, unsigned integers, modular numbers, bit strings and binary polynomials (instead of using ambiguous unsigned integers for all the 4 latter types, which prevents the detection of dangerous errors, e.g. unsigned overflow), but distinguishing offsets from indices is not a useful thing.
Distinguishing offsets and indices would be useful only if the set of operations applicable to them would be different. However this is not true, because the main reason for using indices is to be able to apply arithmetic operations to them. Otherwise, you would not use numbers for indexing, but names, i.e. you would not use arrays, but structures (or hash tables, when the names used for access are not known at compile time).
The whole point here is to use a single kind of range, without exceptions, in order to avoid the errors caused by using the wrong type of range for the context.
For backwards iteration the right range is [-1,-1-n), i.e. none of those listed by you.
Like for any such range, the number of accessed elements is the difference of the range limits, i.e. n, which is how you check that you have written the correct limits. When the end of the range is less than the start, that means that the index must be decremented. In some programming languages specifying a range selects automatically incrementing or decrementing based on the relationship between the limits. In less clever languages, like C/C++, you have to select yourself between incrementing and decrementing (i.e. between "i=0;i<n;i++" and "i=-1;i>-1-n;i--").
It is easy to remember the backwards range, as it is obtained by the conversion rules: 0 => -1 (i.e. first element => last element) and n => -n (i.e. forwards => backwards).
To a negative index, the length of the array must be added, unless you use a programming language where that is done implicitly.
In the C language, instead of adding the length of the array, one can use a negative index into an array together with a pointer pointing to one element past the array, e.g. obtained as the address of the element indexed by the length of the array. Such a pointer is valid in C, even if accessing memory directly through it would generate an out-of-range error, like also taking the address of any element having an index greater than the length of the array. The validity of such a pointer is specified in the standard exactly for allowing the access of an array backwards, using negative indices.
The so-called unsigned integers of C/C++ are in fact modular integers, i.e. where the arithmetic operations wrap around and where you can interpret any 64-bit number greater than 2^63-1 as you please, as either a positive integer or as a negative integer. For instance you can interpret 2^63 as either -2^63 or as +2^63.
So using 64-bit "unsigned" integers for indices does not create any problems if you are careful how you interpret them.
However, as another poster has already said, in all popular ISAs the addressable space is actually smaller than 2^64 and in x86-64 the addresses are interpreted as signed integers, not as unsigned integers, so your problem can never appear.
Some operating systems use this interpretation of the addresses as signed integers in order to reserve the negative addresses for themselves and use only positive addresses for the non-privileged programs.
The reason why the addressable space is smaller than afforded by 64 bits is that covering the complete space with page translation tables would require too many table levels. x86-64 has increased the number of page table levels to 4, in order to enable a 48-bit address space, while some recent Intel/AMD CPUs have increased the number of page table levels to 5, in order to increase the virtual address size to 57 bits.
Some decades ago, I have disassembled and studied Microsoft's Fortran compiler.
The fact that Fortran uses 1-based indexing caused a lot of unnecessary complications in that compiler. After seeing firsthand the problems caused by 1-based indexing I have no doubt that Dijkstra was right. Yes, the compiler could handle perfectly fine 1-based indexing, but there really was no reason for all that effort, which should have been better spent on features providing a serious advantage for the programmer.
The use of 1-based indexing and/or closed intervals, instead of consistently using only 0-based indexing and half-open intervals, are likely to be the cause of most off-by-one errors.
Granted, that's an argument for hardware, not for languages, and even the hardware angle is probably long obsolete.
The IBM 704's index registers were decrementing, subtracting their contents from an instruction's address field to form an effective address.
Type A instructions had a three bit tag, indicating which index registers to use.
With those three indexes you get Fortran’s efficient 7D column major arrays. This made data aggregation much more efficient and Cray/Cuda/modern column oriented DBs do similar.
But for Algol-like languages on PDP inspired hardware I do like his convention, if like the example you have a total ordering on the indexes.
But Fortran also changed the way it was taught, shifting from an offset from memory that was indexed down from an address, which makes sense when the number of bits in an address space changes based on machine configuration, to index values.
Technically Fortran, at least in word machines meets his convention.
pointer + offset <= word < pointer + offset +1
It is just the offset is always negative.The use of 1-based indexing was just inherited from the index notation used for vectors and matrices in most mathematical journals at that time.
When IBM 704 was designed (as the successor of IBM 701 and taking into account the experience from IBM NORC), it was designed to allow an efficient implementation of FORTRAN, not vice-versa.
The column-major ordering of FORTRAN allows more efficient implementations of the matrix-vector and matrix-matrix products (by reading the elements sequentially), when these operations are done correctly, i.e. not using the naive dot product definitions that are presented in most linear algebra manuals instead of the useful definitions of these operations (i.e. based on AXPY and on the tensor product of vectors).
Then what do you call “here”?
The name for where you start from in this scenario is usually not required because it’s obvious what you mean and everyone understands the first block means you have to first walk a block, not that where you start is the first block.
So in that sense yes we have a zeroth chapter. That’s when you’re at the beginning of the first one but haven’t read all the way.
What is the name of the block from which you left to enter the first block? Before you started walking I mean.
And mustn’t that block be before that other first? When we move from where we start we count up, so then mustn’t an earlier block be counting down? Counting down would mean a number smaller than one.
And are blocks not counted in units, as whole numbers?
So would it not be the case that one block less than 1 must be by necessity the zeroth block?
In other words if you agree that “as soon as you start walking, you are in the first block”, then you must also agree that before you left you began in the zeroth block.
How else could it be interpreted?
Think of jogging on a road. When you are at the beginning of the road, you are at the start of the first mile, not in the zeroth mile. It doesn't have one more mile prior to first mile.
And everybody knows a pandemic starts with patient 1!
When numbering discrete elements you usually start with 1, so first is 1, second is 2 etc.
Indexes in C are not ordinal numbers though, they should be thought of as offsets or distances from the first element. So [0] is 0 steps away from the first element, hence the first element. The confusion arise when you think these indices are actually ordinal numbers.
that's why we start from 0, not because of voltages, at least in compsci.
I guess the practice was influenced by computer science - I don't know of an example that precedes it, but one fairly early one I've found is Bishop and Goldberg's Tensor Analysis on Manifolds from 1968, with a chapter 0 on set theory and topology. Back then the authors felt the need to justify their numbering in the preface:
"The initial chapter has been numbered 0 because it logically precedes the main topics"
Quite straightforward.
There's also the "zeroth law of thermodynamics", which was explicitly identified long after the first, second, and third laws, but was felt more primary or basic, hence the need for an "ordinal before the first"
I think you’re just biased to think that starting must “naturally” begin with 1.
Zero is just a good a place to start and some people do start counting from zero.
Just the convenience of having an ordinal number to say? Rather than saying "chapter 0, chapter 1, chapter 2" one can say "the fourth chapter"? Or is it the fact that the chapter with number 4 has 3 chapters preceding it?
On first glance I find this all rather meaningless pedantry.
EDIT: Yeah, I don't know why book chapter labels shouldn't start with "0". It seems fine to me. They could use letters instead of numbers for all I care.
Definitions are fine, and I agree that "A" is the first letter. But that's no use to people who need to think clearly about the offset between "A" and "C" right now. Should I tell them they're wrong, they have to count to three and then subtract one? Because the dictionary says so?
Book chapters and page numbers are not offsets.
You're also wrong about there being no 0th mile. https://www.atlasobscura.com/places/u-s-route-1-mile-0-sign
A while back someone posed EWD765 for an alternate solution, I don't recall if any other solution was found. That was my introduction to these.
Besides a mathematical inclination, an exceptionally good mastery of one's native tongue is the most vital asset of a competent programmer.
...
The use of anthropomorphic terminology when dealing with computing systems is a symptom of professional immaturity.
...
Projects promoting programming in "natural language" are intrinsically doomed to fail.
I'd also recommend EWD1305 https://www.cs.utexas.edu/~EWD/transcriptions/EWD13xx/EWD130... Answers to questions from students of Software Engineering
[The approximate reconstruction of the questions is left as an exercise to the reader.]
...
No, I'm afraid that computer science has suffered from the popularity of the Internet. It has attracted an increasing —not to say: overwhelming!— number of students with very little scientific inclination and in research it has only strengthened the prevailing (and somewhat vulgar) obsession with speed and capacity.
Yes, I share your concern: how to program well —though a teachable topic— is hardly taught. The situation is similar to that in mathematics, where the explicit curriculum is confined to mathematical results; how to do mathematics is something the student must absorb by osmosis, so to speak. One reason for preferring symbol-manipulating, calculating arguments is that their design is much better teachable than the design of verbal/pictorial arguments. Large-scale introduction of courses on such calculational methodology, however, would encounter unsurmountable political problems.Also the burn in the beginning of EWD899 (not transcribed) is noteworthy:
A review of a paper in AI. I read "Default Reasoning as Likelihood Reasoning" by Elaine Rich. (My copy did not reveal where it had been published; the format suggests some conference proceedings. If that impression is correct, I am glad I did not attend the conference in question.
I keep meaning to sit down with this site and make my way through it all. Might make more progress if I grab them into an eReader-friendly format and then peruse them more easily when travelling.
Less likely to make mistakes if you can’t erase
Some famous Dijkstra quotes: “It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.”
“Object-oriented programming is an exceptionally bad idea which could only have originated in California.”
As a UC Santa Cruz masters alum, my favorite Dijkstra quotes come from notes from his visit to UCSC in the 1970s (https://www.cs.utexas.edu/~EWD/transcriptions/EWD07xx/EWD714...):
“I found the UCSC campus not an inspiring place, and the longer I stayed there, the more depressing it became. The place seemed most successful in hiding all the usual symptoms of a seat of learning. In the four-person apartment we occupied, only one of the four desks had a reading lamp, and the chairs in front of the desks were so low that writing at the desks was not comfortable. Probably it doesn't matter. Can UCSC students write? Do they need to? The notice boards showed ads from typing services "Grammar and spelling corrected.". (One of these ads itself contained a spelling error!)”
https://news.ycombinator.com/item?id=11796926
alankay on May 30, 2016 | next [–]
This quote keeps on showing up out of context. Edsger and I got along quite well. He loved to be the way he was and pushed it. ...
(and yes, I left that out of context so that people would go read the whole thing)Some meditations on Advanced Programming
https://www.cs.utexas.edu/~EWD/transcriptions/EWD00xx/EWD32....
As I'm currently in a Functional Programming course in Haskell... This resonated.
I know that we'll always need to write programs which directly interface with memory.
However, when we don't need to do that... Maybe we shouldn't write programs in this style (i.e. imperative). Maybe we shouldn't even use an imperative language (I know, that's a stretch, many languages have incorporated functional aspects and we can utilize them instead of trying to avoid the language entirely).
---
Dijkstra ends EWB 32 with:
"Smoothly we have arrived at the third component of our tool, viz. the language: also the language should be a reliable one. In other words it should assist the programmer as much as possible in the most difficult aspect of his task, viz. to convince himself —and those others who are really interested— that the program he has written down defines indeed the process he wanted to define."
"As my very last remark I should like to stress that the tool as a whole should have still another quality. It is a much more subtle one; whether we appreciate it or not depends much more on our personal taste and education and I shall not even try to define it. The tool should be charming, it should be elegant, it should be worthy of our love. This is no joke, I am terribly serious about this. In this respect the programmer does not differ from any other craftsman: unless he loves his tools it is highly improbable that he will ever create something of superior quality."
At the same time these considerations tell us the greatest virtues a program can show: Elegance and Beauty."
---
Functional languages... help us achieve these aims.
https://medium.com/@acidflask/this-guys-arrogance-takes-your...
Generally I think this does answer the question about why functional languages don't dominate more than they do - although you could make an argument that JavaScript is a functional language, and it certainly is enjoying a lot of dominance these days. JS environments aren't known for being particularly efficient, though. To me, efficient use of resources is elegant, and a language needs to be able to do that.
The challenge is reconciling these three views of programming: the holy grail is a programming language that is ergonomic and expressive, yet is also amenable to mathematical reasoning and can be implemented efficiently. I wonder if there is a programming language theory version of the CAP theory in distributed systems, where one compares performance, ease of mathematical reasoning about code, and human factors?
but as far as meaningful technical contributions, i struggle to find anything. his path search algorithm, no offence, is self-evident.
for all the disdain he appears to have had for (what we now call) the 'move fast and break things' style of engineering/science, they were the ones that gave us everything today. you innovate by running experiments, not philosophising and writing proofs.
in retrospect he probably should have stayed on his initial discipline, theoretical physics.
who had a fashion of calling every thing "odd" that was beyond his comprehension, and thus lived amid an absolute legion of "oddities." -- from "The Purloined Letter" by Edgar Allan Poe.
To dismiss you/your comment;
“Mediocrity knows nothing higher than Itself; but Talent instantly recognizes Genius.” -- from "The Valley of Fear" by Arthur Conan Doyle.
To deflate your claim of Dijkstra's Algorithm being "self-evident";
Students Struggle with Concepts in Dijkstra's Algorithm -- https://dl.acm.org/doi/fullHtml/10.1145/3632620.3671096
Edsger Dijkstra contributed to;
1) Algol60 language/compiler
2) THE Operating System
3) Graph Algorithms (shortest-path etc.)
4) Concurrent Algorithms (semaphores, CSP etc.)
5) Distributed Algorithms (dining philosophers etc.)
6) Fault-Tolerant Computing (which he called "Self-Stabilizing Systems")
7) Programming Language Design (GCL etc.)
8) Structured Programming Techniques.
9) Program Correctness Methodologies based on Predicate Calculus to derive Programs (weakest-precondition etc.)
10) Essays giving insights into "How to Think and Reason Systematically" using commonsense and mathematical Tools.No idea how this slipped by for so long.
I don’t think it was ever a conscious decision. It’s similar to how I always pictured Jane Austen as a sarcastic woman in her forties while reading her books, but she wrote her most famous works being borderline a teenager. Your mind just fills things up I guess.