"Syntax" just means what strings are valid programs. For example, x=3 should be valid in C, while =+x shouldn't. Note that this doesn't say anything about what x=3 actually means in practice. The fact that there is a variable called x, and that it has a value at some point in time, and that after the execution of x=3 that value becomes 3 are all semantics.
"whether the construct is allowed/part of the language (syntax)"
is the same as yours
""Syntax" just means what strings are valid programs"
Strings are not objects in the syntactic domain, only terms are. And parsers operate on tokens, recognizing valid compositions of tokens (at least syntactically). Syntax concerns relations between tokens and thus their form and composition.
Semantics concerns the meanings of terms, where “meaning” is considered from various perspectives, like the denotational or operational (what the OP has in mind w.r.t. evaluation effectively concerns the denotational semantics). While computation is purely syntactic, denotational semantics is not circular, because the correspondence we assign (in our minds) to terms is with models that already possess a semantic content of their own.
Consider a simple calculator language that lets you add positive integers.
The syntax might be given by grammar rules like:
expr ::= digits (EOF | ' + ' expr)
digits ::= ('0' | '1' | ... | '9')+
This grammar admits expressions like 33 and 2 + 88 + 344. That's syntax. But how the language interprets those expressions is semantics. Continuing our calculator example:
1. Every expr evaluates to an integer value.
2. An expr of the form `<digits> EOF` evaluates to the integer given by interpreting the digits in <digits> as an integer in base 10.
3. If <expr> evaluates to the value x, then an expr of the form `<digits> + <expr>` evaluates to the value y + x, where y is the integer given by interpreting the digits in <digits> as an integer in base 10.
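To make rules 1-3 concrete, here is a minimal sketch of an evaluator in Python; the function name and the split-on-plus parsing strategy are mine, not part of any spec:

# Evaluate a calculator expression per the rules above (sketch).
def evaluate(expr: str) -> int:
    head, sep, tail = expr.partition(' + ')
    value = int(head, 10)      # rule 2: interpret <digits> in base 10
    if sep:                    # rule 3: <digits> ' + ' <expr>
        return value + evaluate(tail)
    return value               # rule 2: <digits> EOF

assert evaluate('33') == 33
assert evaluate('2 + 88 + 344') == 434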
Of course, specifying semantics in human language is tedious and hard to make precise. For this reason, most programming languages give their semantics in a more formalized language. See, for a classic example, the R5RS specs for Scheme:
https://conservatory.scheme.org/schemers/Documents/Standards...
Arguably this is also semantics. Type checking and reporting type errors decides whether a construct is allowed or not, yet belongs squarely in the semantic analysis phase of a language (as opposed to the syntactic analysis phase).
> how it differs from syntax
Consider a language like C, which allows code like this:
if (condition) {
    doWhenTrue();
}
And consider a language like Python, which allows code like this:

if condition:
    doWhenTrue()
The syntax and the semantics are both different here. The syntax is different: C requires parens around the condition, allows curly braces around the body, and requires `;` at the end of statements. Python allows but does not require parens around the condition, requires a `:`, requires indenting the body, and does not require a `;` at the end of statements.
Also, the semantics are different: in C, `doWhenTrue()` only executes if `condition` evaluates to a scalar value (an integer, floating-point number, or pointer) that compares unequal to zero.
In Python, `doWhenTrue()` executes if `condition` is "truthy": roughly, if `bool(condition)` returns `True`, which consults `condition.__bool__()` (falling back to `__len__()` for containers). Values like `True`, non-zero numbers, non-empty containers, etc. are all truthy, which covers far more values than in C.
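If it helps, here is a small, made-up illustration of that truthiness hook in Python (the `Inventory` class is hypothetical):

class Inventory:
    def __init__(self, items):
        self.items = items
    def __bool__(self):                  # consulted by `if`
        return len(self.items) > 0

if Inventory([]):                        # falsy: __bool__ returns False
    print('this never prints')
if Inventory(['sword']):                 # truthy
    print('has items')                   # prints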
But you could imagine a dialect of Python that used the exact same syntax as C, but the semantics of Python, e.g., a language where
if (condition) {
    doWhenTrue();
}
has the exact same meaning as the Python snippet above: `doWhenTrue()` executes when `condition` is truthy, according to some internal `__bool__()` method.

In pure functional languages, saying what value an expression will evaluate to (equivalently, explaining the program as a function of its inputs) is a sufficient explanation of the meaning of a program, and semantics for these languages is roughly considered 'solved'. Open areas of study in semantics tend to be about doing the same thing for languages that have more complicated effects when run, like imperative state update or non-local control (exceptions, async, concurrency).
There's some overlap in study: typically syntax tries to reflect semantics in some way, by proving that programs accepted by syntactic analysis will behave (or not behave) a certain way when run. E.g., Rust's borrow checker is a syntactic check that the program under scrutiny will not dereference an invalid pointer, even though that is possible under Rust's runtime semantics. Compare to Java, which has no syntactic check for this because dereferencing invalid pointers is simply impossible according to the semantics of the JVM.
Syntax is the smallest scale (words, punctuation, grammar), semantics is at the sentence or small-function level, and pragmatics is at the paragraph/essay and whole-program level.
For example, when training early, smaller-scale LLMs, researchers noticed that syntax was the first property the models reproduced reliably. They got proper grammar and punctuation, but the sentences made no sense. When they scaled up, they got semantics but not pragmatics: the sentences made sense but the paragraphs didn't. Eventually the systems could output whole essays that made sense.
Even though you're asking about programming specifically, these concepts are universal to language, and are maybe a bit more intuitively applied to English (or your native language).
I suspect that a computer scientist could give a different mathematical explanation about how these concepts compile into binary or machine code in different ways, and I can't explain that. Generally I think of syntax as being very language specific but semantics and pragmatics can be translated across languages with similar capabilities.
The semantics is the definition of what is supposed to happen, computationally, when you execute a valid expression. It would state that the code block under the "then" is executed if the condition attached to the "if" evaluates to true, and skipped otherwise.
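A hypothetical sketch of that rule as one case of a tiny interpreter, in Python (the AST encoding here is made up):

# ('if', cond, then_block): run then_block iff cond evaluates to true.
def execute(stmt, env):
    if stmt[0] == 'if':
        _, cond, then_block = stmt
        if evaluate(cond, env):          # condition attached to the 'if'
            execute(then_block, env)     # otherwise the block is skipped
    elif stmt[0] == 'print':
        print(evaluate(stmt[1], env))

def evaluate(expr, env):
    return env[expr] if isinstance(expr, str) else expr

execute(('if', 'flag', ('print', 42)), {'flag': True})   # prints 42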
int c = 042;       /* 042 is an octal literal: 4*8 + 2 == 34 */
printf("%d", c);   /* prints 34, not 42 */
Semantics is what makes it print 34.

Your intuition is right. It falls under what is called "Operational Semantics" (https://en.wikipedia.org/wiki/Operational_semantics). There are other ways of looking at it, e.g. "Denotational Semantics", "Axiomatic Semantics", "Algebraic Semantics", etc., which are more mathematical. The submitted book talks about all of them.
For more background, you might want to look at Alan Parkes's book Introduction to Languages, Machines, and Logic.
The basic idea is that Symbolic Logic allows you to express Strings (sentences containing words) constructed from an Alphabet (the set of symbols for that language) as "Programs" (valid, i.e. meaningful, strings in that language) which can then be interpreted by an Abstract Machine we call a Computer.
1. There's a lexer which breaks source text up into a stream of tokens.
2. A parser which converts a stream of tokens into an abstract syntax tree (AST).
3. An interpreter or compiler that traverses the AST in order to either execute it (interpreter) or transform it into some other form (compiler).
Points 1 & 2 are syntax - the mapping between the textual form of a program and its structure.
Point 3 is semantics - how a program actually behaves, or as you say, what its terms evaluate to.
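As an entirely made-up miniature of those three stages, for a toy addition language in Python:

import re

# 1. Lexer: source text -> token stream
def lex(src):
    return re.findall(r'\d+|\+', src)

# 2. Parser: token stream -> AST, e.g. ('+', 2, ('+', 88, 344))
def parse(tokens):
    head = int(tokens[0])
    if len(tokens) > 1:                  # tokens[1] must be '+'
        return ('+', head, parse(tokens[2:]))
    return head

# 3. Interpreter: traverse the AST to execute it
def evaluate(ast):
    if isinstance(ast, tuple):
        _, left, right = ast
        return evaluate(left) + evaluate(right)
    return ast

print(evaluate(parse(lex('2 + 88 + 344'))))   # 434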
Looking at it like this can give a sharp line between syntax and semantics. But when you get deeper into it, it turns out that with some languages at least, you can get from source syntax to something that actually executes - has behavior, i.e. semantics - with nothing but a series of syntactic transformations. From that perspective, you can say that semantics can be defined as a sequence of syntactic transformations.
This doesn't erase the distinction between syntax and semantics, though. The syntax of the source language is the first stage in a chain of transformations, with each stage involving a different (albeit closely related) language with its own syntactic structure.
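For instance, continuing the toy example above (still hypothetical), the AST for 2 + 88 + 344 can be rewritten into a flat instruction list - just another syntactic form - which a trivial stack machine then executes:

# Syntactic transformation: tree -> instruction list.
def compile_ast(ast, code=None):
    code = [] if code is None else code
    if isinstance(ast, tuple):
        _, left, right = ast
        compile_ast(left, code)
        compile_ast(right, code)
        code.append(('ADD',))
    else:
        code.append(('PUSH', ast))
    return code

# The 'machine' that gives the instruction list its behavior.
def run(code):
    stack = []
    for op, *args in code:
        if op == 'PUSH':
            stack.append(args[0])
        else:                            # ADD
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack[0]

print(run(compile_ast(('+', 2, ('+', 88, 344)))))   # 434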
> Explanations that involve 'giving a precise mathematical meaning' just seem almost circular to me.
Formal semantics covers this, but the syntax/semantics distinction isn't necessarily just formal - it's a useful distinction even in an informal sense.
As for circularity, it's absolutely the case that formal semantics is nothing more than defining one language in terms of another. But the idea is that the target language is supposed to be one with well-defined semantics of its own, which is why "mathematical" comes up so often in this context. Mathematical abstractions such as set theory, lattice theory, lambda calculus and so on can provide a rigorous foundation that's more tractable when it comes to proofs.
That kind of circularity pervades our knowledge in general. Words in the dictionary are given meaning in terms of other words. You can't explain something without explaining it in terms of something else.
This is similar to what Chomsky did in "Aspects of the Theory of Syntax" when he tried to show how we can build more thorough systems to evaluate semantics of (human) language, like what kinds of nouns can go with what kinds of verbs. He pushes the line of syntax further into the semantic territory and tries to create more comprehensive sets of rules that better guarantee the generation of syntactically and semantically correct sentences. I think this is perfectly analogous to the type-checking enterprise.
This certainly leads to a blurring of the distinction, but that's a result of the choice of this as a premise.
Parsing will give you an AST that tells you that a term has a type, say, `Int -> Bool`, which might be represented e.g. as a 3-node tree with the arrow at the root and the input and output types as leaves. But falling back to the conceptual definition of syntax, this tells you nothing about what that tree means.
To add type checking into the picture, we need another stage between 2 and 3, which is where meaning is assigned to types in the AST and the semantics of the types are handled.
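A made-up sketch of what that extra stage might do, in Python: the type `Int -> Bool` as a 3-node tree, plus a checker that gives it meaning by enforcing function application:

INT, BOOL = 'Int', 'Bool'
is_even_type = ('->', INT, BOOL)         # arrow at the root, two leaves

def check_apply(fn_type, arg_type):
    """Result type of applying fn_type to arg_type, or a type error."""
    if fn_type[0] != '->':
        raise TypeError('not a function type')
    _, param, result = fn_type
    if param != arg_type:
        raise TypeError('expected ' + param + ', got ' + arg_type)
    return result

print(check_apply(is_even_type, INT))    # Bool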
You'll often see people saying things about how types are syntactic, but this is a slightly different use of the word: basically, it refers to how types categorize syntactic terms. So types apply to syntax, but their behavior when it comes to actual checking is still semantics - it involves applying logic, inference etc. that go well beyond syntactic analysis in the parsing sense.
Really, it boils down to what you choose as your definitions. If you define syntax to be, essentially, the kind of thing that's modeled by an AST, then there's not really any ambiguity, which makes it a good definition, I think. Semantics is then assigning meaning to the nodes and relationships in such a tree.
Re Chomsky, I think the discovery that PL semantics can be entirely implemented as a series of syntactic transformations is quite relevant to what Chomsky was getting at. But that doesn't negate semantics; it just gives us a more precise model for semantics. In fact, I have a sneaking suspicion that this formally-supported view might help clarify Chomsky's work, but a bit more work would be needed to demonstrate that. :)
> Points 1 & 2 are syntax - the mapping between the textual form of a program and its structure.
https://www.amazon.com/-/es/Formal-Syntax-Semantics-Programm...
Linking the Spanish-locale page, as the Amazon app wants to switch country and closes if I don't. (Party pooper.)
from the Preface:
Laboratory Activities
Chapter 2: Scanning and parsing Wren
Chapter 3: Context checking Wren using an attribute grammar
Chapter 4: Context checking Hollerith literals using a two-level grammar
Chapter 5: Evaluating the lambda calculus using its reduction rules
Chapter 6: Self-definition of Scheme (Lisp); self-definition of Prolog
Chapter 7: Translating (compiling) Wren programs following an attribute grammar
Chapter 8: Interpreting the lambda calculus using the SECD machine; interpreting Wren according to a definition using structural operational semantics
Chapter 9: Interpreting Wren following a denotational specification
Chapter 10: Evaluating a lambda calculus that includes recursive definitions
Chapter 12: Interpreting Wren according to an algebraic specification of the language
Chapter 13: Translating Pelican programs into action notation following a specification in action semantics.
I had recommended this book earlier on HN and elsewhere. It uses Prolog as the metalanguage for language design. With Prolog finding new domains of use alongside LLMs, this makes it a good approach for learning both Prolog and language design.
Since it is out of print, snap up any and all used copies available ;-)
I don't know what it is, but it seems, from my slowly blooming collection, like the standard of writing in PLT books from that era was just really, really high.
I'm designing the syntax first, but have no idea/energy to actually implement the language/runtime or have it be usable with Godot etc.
If I make the syntax and someone likes it, how to get someone to help me make it? :)
Stating my sources of inspiration will win some people and turn off some people: In terms of syntax, Godot's GDScript comes closest to my ideal. In terms of features, Swift.
One thing I really want to try: instead of "functions" I want it to be "event handlers", so instead of
func doSomething()
you'd say

on doSomething()
A class instance could "contain" or "own" another class instance at runtime, similar to Godot's node-based hierarchy, and sending the `update()` event/signal to an instance could propagate it down the tree, making it easy to implement an entity+components architecture.

And you could overload handlers based on which type raised the event or emitted the signal:
on doSomething() by PlayerClass
`caller` would be a keyword/object, just like `self` is, in every handler scope, so you could write `if caller is Player` (or Monster, for example).

Most relevant in terms of gameplay, I want `Stat` to be a core type: a number that has a maximum and minimum, conditional buffs/debuffs affecting the final value, and so on.
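Here's how that `Stat` idea might look, sketched in plain Python rather than the hypothetical language (all names are placeholders):

class Stat:
    def __init__(self, base, minimum=0, maximum=100):
        self.base, self.minimum, self.maximum = base, minimum, maximum
        self.modifiers = []              # (condition, delta) pairs

    def add_modifier(self, condition, delta):
        self.modifiers.append((condition, delta))

    def value(self, context):
        total = self.base + sum(delta for cond, delta in self.modifiers
                                if cond(context))
        return max(self.minimum, min(self.maximum, total))

damage = Stat(10, maximum=999)
damage.add_modifier(lambda ctx: ctx['weekday'] == 'Tuesday'
                    and ctx['full_moon'], 42)
print(damage.value({'weekday': 'Tuesday', 'full_moon': True}))   # 52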
For example, let's call it "environments" or "event scopes": If a game entity (class) is in a certain environment/scope, that environment can trap/intercept all events/signals sent to/from that entity.
I'm not yet sure how exactly it would work/look, but I feel like it could be useful for easily implementing complex RPG-like conditions like: "Hitting {purple} {orcs} with this sword on a {Tuesday} night with a {full moon} deals +42 damage"
So none of the base objects need to know about that condition, but the "environment/scope" could impose or "overlay" extra conditions on every event passing through it.
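In plain Python, a sketch of that interception idea might look like this (everything here is hypothetical):

class Orc:
    color = 'purple'
    def handle(self, event):
        print('took', event['damage'], 'damage')

class Environment:
    """Wraps event delivery; may overlay extra effects on the way."""
    def __init__(self, overlay):
        self.overlay = overlay

    def send(self, event, target):
        target.handle(self.overlay(event, target))

def tuesday_full_moon(event, target):
    if event['kind'] == 'hit' and getattr(target, 'color', '') == 'purple':
        return dict(event, damage=event['damage'] + 42)   # +42 damage
    return event

Environment(tuesday_full_moon).send({'kind': 'hit', 'damage': 10}, Orc())
# -> took 52 damage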