"Syntax" just means what strings are valid programs. For example, x=3 should be valid in C, while =+x shouldn't. Note that this doesn't say anything about what x=3 actually means in practice. The fact that there is a variable called x, and that it has a value at some point in time, and that after the execution of x=3 that value becomes 3 are all semantics.
"whether the construct is allowed/part of the language (syntax)"
is the same as yours
""Syntax" just means what strings are valid programs"
Strings are not objects in the syntactic domain, only terms are. And parsers operate on tokens, recognizing valid compositions of tokens (at least syntactically). Syntax concerns relations between tokens and thus their form and composition.
Semantics concerns the meanings of terms, where “meaning” is considered from various perspectives, like the denotational or operational (what the OP has in mind w.r.t. evaluation effectively concerns the denotational semantics). While computation is purely syntactic, denotational semantics is not circular, because the correspondence we assign (in our minds) to terms is with models that already possess a semantic content of their own.
Consider a simple calculator language that lets you add positive integers.
The syntax might be given by grammar rules like:
expr ::= digits (EOF | ' + ' expr)
digits ::= ('0' | '1' | ... | '9')+
This grammar admits expressions like 33 and 2 + 88 + 344. That's syntax. But how the language interprets those expressions is semantics. Continuing our calculator example:
1. Every expr evaluates to an integer value.
2. An expr of the form `<digits> EOF` evaluates to the integer given by interpreting the digits in <digits> as an integer in base 10.
3. If <expr> evaluates to the value x, then an expr of the form `<digits> + <expr>` evaluates to the value y + x, where y is the integer given by interpreting the digits in <digits> as an integer in base 10.
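To make rules 1-3 concrete, here is a minimal sketch of an evaluator in Python; the function name and the split-on-plus parsing strategy are mine, not part of any spec:

# Evaluate a calculator expression per the rules above (sketch).
def evaluate(expr: str) -> int:
    head, sep, tail = expr.partition(' + ')
    value = int(head, 10)      # rule 2: interpret <digits> in base 10
    if sep:                    # rule 3: <digits> ' + ' <expr>
        return value + evaluate(tail)
    return value               # rule 2: <digits> EOF

assert evaluate('33') == 33
assert evaluate('2 + 88 + 344') == 434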
Of course, specifying semantics in human language is tedious and hard to make precise. For this reason, most programming languages give their semantics in a more formalized language. See, for a classic example, the R5RS specs for Scheme:
https://conservatory.scheme.org/schemers/Documents/Standards...
Arguably this is also semantics. Type checking and reporting type errors decides whether a construct is allowed or not, yet belongs squarely in the semantic analysis phase of a language (as opposed to the syntactic analysis phase).
> how it differs from syntax
Consider a language like C, which allows code like this:
if (condition) {
    doWhenTrue();
}
And consider a language like Python, which allows code like this:

if condition:
    doWhenTrue()
The syntax and the semantics are both different here. The syntax is different: C requires parens around the condition, allows curly braces around the body, and requires `;` at the end of statements. Python allows but does not require parens around the condition, requires a `:`, requires indenting the body, and does not require a `;` at the end of statements.
Also, the semantics are different: in C, `doWhenTrue()` only executes if `condition` evaluates to a scalar value (an integer, floating-point number, or pointer) that compares unequal to zero.
In Python, `doWhenTrue()` executes if `condition` is "truthy": roughly, if `bool(condition)` returns `True`, which consults `condition.__bool__()` (falling back to `__len__()` for containers). Values like `True`, non-zero numbers, non-empty containers, etc. are all truthy, which covers far more values than in C.
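If it helps, here is a small, made-up illustration of that truthiness hook in Python (the `Inventory` class is hypothetical):

class Inventory:
    def __init__(self, items):
        self.items = items
    def __bool__(self):                  # consulted by `if`
        return len(self.items) > 0

if Inventory([]):                        # falsy: __bool__ returns False
    print('this never prints')
if Inventory(['sword']):                 # truthy
    print('has items')                   # prints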
But you could imagine a dialect of Python that used the exact same syntax as C, but the semantics of Python, e.g., a language where
if (condition) {
    doWhenTrue();
}
has the exact same meaning as the Python snippet above: `doWhenTrue()` executes when `condition` is truthy, according to some internal `__bool__()` method.

In pure functional languages, saying what value an expression will evaluate to (equivalently, explaining the program as a function of its inputs) is a sufficient explanation of the meaning of a program, and semantics for these languages is roughly considered 'solved'. Open areas of study in semantics tend to be about doing the same thing for languages that have more complicated effects when run, like imperative state update or non-local control (exceptions, async, concurrency).
There's some overlap in study: typically syntax tries to reflect semantics in some way, by proving that programs accepted by syntactic analysis will behave (or not behave) a certain way when run. E.g., Rust's borrow checker is a syntactic check that the program under scrutiny will not dereference an invalid pointer, even though that is possible under Rust's runtime semantics. Compare to Java, which has no syntactic check for this because dereferencing invalid pointers is simply impossible according to the semantics of the JVM.
Syntax is the smallest scale (words, punctuation, grammar), semantics is at the sentence or small-function level, and pragmatics is at the paragraph/essay and whole-program level.
For example, when training early, smaller-scale LLMs, researchers noticed that syntax was the first property the models reproduced reliably. They got proper grammar and punctuation, but the sentences made no sense. When they scaled up, they got semantics but not pragmatics: the sentences made sense but the paragraphs didn't. Eventually the systems could output whole essays that made sense.
Even though you're asking about programming specifically, these concepts are universal to language, and are maybe a bit more intuitively applied to English (or your native language).
I suspect that a computer scientist could give a different mathematical explanation about how these concepts compile into binary or machine code in different ways, and I can't explain that. Generally I think of syntax as being very language specific but semantics and pragmatics can be translated across languages with similar capabilities.
The semantics is the definition of what is supposed to happen, computationally, when you execute a valid expression. It would state that the code block under the "then" is executed if the condition attached to the "if" evaluates to true, and skipped otherwise.
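A hypothetical sketch of that rule as one case of a tiny interpreter, in Python (the AST encoding here is made up):

# ('if', cond, then_block): run then_block iff cond evaluates to true.
def execute(stmt, env):
    if stmt[0] == 'if':
        _, cond, then_block = stmt
        if evaluate(cond, env):          # condition attached to the 'if'
            execute(then_block, env)     # otherwise the block is skipped
    elif stmt[0] == 'print':
        print(evaluate(stmt[1], env))

def evaluate(expr, env):
    return env[expr] if isinstance(expr, str) else expr

execute(('if', 'flag', ('print', 42)), {'flag': True})   # prints 42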
int c = 042;       /* 042 is an octal literal: 4*8 + 2 == 34 */
printf("%d", c);   /* prints 34, not 42 */
Semantics is what makes it print 34.

Your intuition is right. It falls under what is called "Operational Semantics" (https://en.wikipedia.org/wiki/Operational_semantics). There are other ways of looking at it, e.g. "Denotational Semantics", "Axiomatic Semantics", "Algebraic Semantics", etc., which are more mathematical. The submitted book talks about all of them.
For more background, you might want to look at Alan Parkes's book Introduction to Languages, Machines, and Logic.
The basic idea is that Symbolic Logic allows you to express Strings (sentences containing words) constructed from an Alphabet (the set of symbols for that language) as "Programs" (valid, i.e. meaningful, strings in that language) which can then be interpreted by an Abstract Machine we call a Computer.
1. There's a lexer which breaks source text up into a stream of tokens.
2. A parser which converts a stream of tokens into an abstract syntax tree (AST).
3. An interpreter or compiler that traverses the AST in order to either execute it (interpreter) or transform it into some other form (compiler).
Points 1 & 2 are syntax - the mapping between the textual form of a program and its structure.
Point 3 is semantics - how a program actually behaves, or as you say, what its terms evaluate to.
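As an entirely made-up miniature of those three stages, for a toy addition language in Python:

import re

# 1. Lexer: source text -> token stream
def lex(src):
    return re.findall(r'\d+|\+', src)

# 2. Parser: token stream -> AST, e.g. ('+', 2, ('+', 88, 344))
def parse(tokens):
    head = int(tokens[0])
    if len(tokens) > 1:                  # tokens[1] must be '+'
        return ('+', head, parse(tokens[2:]))
    return head

# 3. Interpreter: traverse the AST to execute it
def evaluate(ast):
    if isinstance(ast, tuple):
        _, left, right = ast
        return evaluate(left) + evaluate(right)
    return ast

print(evaluate(parse(lex('2 + 88 + 344'))))   # 434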
Looking at it like this can give a sharp line between syntax and semantics. But when you get deeper into it, it turns out that with some languages at least, you can get from source syntax to something that actually executes - has behavior, i.e. semantics - with nothing but a series of syntactic transformations. From that perspective, you can say that semantics can be defined as a sequence of syntactic transformations.
This doesn't erase the distinction between syntax and semantics, though. The syntax of the source language is the first stage in a chain of transformations, with each stage involving a different (albeit closely related) language with its own syntactic structure.
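For instance, continuing the toy example above (still hypothetical), the AST for 2 + 88 + 344 can be rewritten into a flat instruction list - just another syntactic form - which a trivial stack machine then executes:

# Syntactic transformation: tree -> instruction list.
def compile_ast(ast, code=None):
    code = [] if code is None else code
    if isinstance(ast, tuple):
        _, left, right = ast
        compile_ast(left, code)
        compile_ast(right, code)
        code.append(('ADD',))
    else:
        code.append(('PUSH', ast))
    return code

# The 'machine' that gives the instruction list its behavior.
def run(code):
    stack = []
    for op, *args in code:
        if op == 'PUSH':
            stack.append(args[0])
        else:                            # ADD
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack[0]

print(run(compile_ast(('+', 2, ('+', 88, 344)))))   # 434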
> Explanations that involve 'giving a precise mathematical meaning' just seem almost circular to me.
Formal semantics covers this, but the syntax/semantics distinction isn't necessarily just formal - it's a useful distinction even in an informal sense.
As for circularity, it's absolutely the case that formal semantics is nothing more than defining one language in terms of another. But the idea is that the target language is supposed to be one with well-defined semantics of its own, which is why "mathematical" comes up so often in this context. Mathematical abstractions such as set theory, lattice theory, lambda calculus and so on can provide a rigorous foundation that's more tractable when it comes to proofs.
That kind of circularity pervades our knowledge in general. Words in the dictionary are given meaning in terms of other words. You can't explain something without explaining it in terms of something else.
This is similar to what Chomsky did in "Aspects of the Theory of Syntax" when he tried to show how we can build more thorough systems to evaluate semantics of (human) language, like what kinds of nouns can go with what kinds of verbs. He pushes the line of syntax further into the semantic territory and tries to create more comprehensive sets of rules that better guarantee the generation of syntactically and semantically correct sentences. I think this is perfectly analogous to the type-checking enterprise.
This certainly leads to a blurring of the distinction, but that's a result of the choice of this as a premise.
Parsing will give you an AST that tells you that a term has a type, say, `Int -> Bool`, which might be represented e.g. as a 3-node tree with the arrow at the root and the input and output types as leaves. But falling back to the conceptual definition of syntax, this tells you nothing about what that tree means.
To add type checking into the picture, we need another stage between 2 and 3, which is where meaning is assigned to types in the AST and the semantics of the types are handled.
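A made-up sketch of what that extra stage might do, in Python: the type `Int -> Bool` as a 3-node tree, plus a checker that gives it meaning by enforcing function application:

INT, BOOL = 'Int', 'Bool'
is_even_type = ('->', INT, BOOL)         # arrow at the root, two leaves

def check_apply(fn_type, arg_type):
    """Result type of applying fn_type to arg_type, or a type error."""
    if fn_type[0] != '->':
        raise TypeError('not a function type')
    _, param, result = fn_type
    if param != arg_type:
        raise TypeError('expected ' + param + ', got ' + arg_type)
    return result

print(check_apply(is_even_type, INT))    # Bool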
You'll often see people saying things about how types are syntactic, but this is a slightly different use of the word: basically, it refers to how types categorize syntactic terms. So types apply to syntax, but their behavior when it comes to actual checking is still semantics - it involves applying logic, inference etc. that go well beyond syntactic analysis in the parsing sense.
Really, it boils down to what you choose as your definitions. If you define syntax to be, essentially, the kind of thing that's modeled by an AST, then there's not really any ambiguity, which makes it a good definition, I think. Semantics is then assigning meaning to the nodes and relationships in such a tree.
Re Chomsky, I think the discovery that PL semantics can be entirely implemented as a series of syntactic transformations is quite relevant to what Chomsky was getting at. But that doesn't negate semantics; it just gives us a more precise model for semantics. In fact, I have a sneaking suspicion that this formally-supported view might help clarify Chomsky's work, but a bit more work would be needed to demonstrate that. :)
> Points 1 & 2 are syntax - the mapping between the textual form of a program and its structure.
https://www.amazon.com/-/es/Formal-Syntax-Semantics-Programm...
Linking the Spanish-locale page, as the Amazon app wants to switch country and closes if I don't. (Party pooper.)
from the Preface:
Laboratory Activities
Chapter 2: Scanning and parsing Wren
Chapter 3: Context checking Wren using an attribute grammar
Chapter 4: Context checking Hollerith literals using a two-level grammar
Chapter 5: Evaluating the lambda calculus using its reduction rules
Chapter 6: Self-definition of Scheme (Lisp); self-definition of Prolog
Chapter 7: Translating (compiling) Wren programs following an attribute grammar
Chapter 8: Interpreting the lambda calculus using the SECD machine; interpreting Wren according to a definition using structural operational semantics
Chapter 9: Interpreting Wren following a denotational specification
Chapter 10: Evaluating a lambda calculus that includes recursive definitions
Chapter 12: Interpreting Wren according to an algebraic specification of the language
Chapter 13: Translating Pelican programs into action notation following a specification in action semantics.
I had recommended this book earlier on HN and elsewhere. It uses Prolog as the metalanguage for language design. With Prolog finding new domains of use alongside LLMs, this makes it a good approach for learning both Prolog and language design.
Since it is out of print, snap up any and all used copies available ;-)
I don't know what it is, but it seems, from my slowly blooming collection, like the standard of writing in PLT books from that era was just really, really high.
I'm designing the syntax first, but have no idea/energy to actually implement the language/runtime or have it be usable with Godot etc.
If I make the syntax and someone likes it, how to get someone to help me make it? :)
Stating my sources of inspiration will win some people and turn off some people: In terms of syntax, Godot's GDScript comes closest to my ideal. In terms of features, Swift.
One thing I really want to try: instead of "functions" I want it to be "event handlers", so instead of
func doSomething()
you'd say

on doSomething()
A class instance could "contain" or "own" another class instance at runtime, similar to Godot's node-based hierarchy, and sending the `update()` event/signal to an instance could propagate it down the tree, making it easy to implement an entity+components architecture.

And you could overload handlers based on which type raised the event or emitted the signal:
on doSomething() by PlayerClass
`caller` would be a keyword/object, just like `self` is, in every handler scope, so you could write `if caller is Player` (or Monster, for example).

Most relevant in terms of gameplay, I want `Stat` to be a core type: a number that has a maximum and minimum, conditional buffs/debuffs affecting the final value, and so on.
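Here's how that `Stat` idea might look, sketched in plain Python rather than the hypothetical language (all names are placeholders):

class Stat:
    def __init__(self, base, minimum=0, maximum=100):
        self.base, self.minimum, self.maximum = base, minimum, maximum
        self.modifiers = []              # (condition, delta) pairs

    def add_modifier(self, condition, delta):
        self.modifiers.append((condition, delta))

    def value(self, context):
        total = self.base + sum(delta for cond, delta in self.modifiers
                                if cond(context))
        return max(self.minimum, min(self.maximum, total))

damage = Stat(10, maximum=999)
damage.add_modifier(lambda ctx: ctx['weekday'] == 'Tuesday'
                    and ctx['full_moon'], 42)
print(damage.value({'weekday': 'Tuesday', 'full_moon': True}))   # 52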
For example, let's call it "environments" or "event scopes": If a game entity (class) is in a certain environment/scope, that environment can trap/intercept all events/signals sent to/from that entity.
I'm not yet sure how exactly it would work/look, but I feel like it could be useful for easily implementing complex RPG-like conditions like: "Hitting {purple} {orcs} with this sword on a {Tuesday} night with a {full moon} deals +42 damage"
So none of the base objects need to know about that condition, but the "environment/scope" could impose or "overlay" extra conditions on every event passing through it.
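In plain Python, a sketch of that interception idea might look like this (everything here is hypothetical):

class Orc:
    color = 'purple'
    def handle(self, event):
        print('took', event['damage'], 'damage')

class Environment:
    """Wraps event delivery; may overlay extra effects on the way."""
    def __init__(self, overlay):
        self.overlay = overlay

    def send(self, event, target):
        target.handle(self.overlay(event, target))

def tuesday_full_moon(event, target):
    if event['kind'] == 'hit' and getattr(target, 'color', '') == 'purple':
        return dict(event, damage=event['damage'] + 42)   # +42 damage
    return event

Environment(tuesday_full_moon).send({'kind': 'hit', 'damage': 10}, Orc())
# -> took 52 damage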