What is the difference between syntax and semantics in programming languages?

TL;DR

In summary: syntax concerns itself only with whether or not a sentence is valid according to the grammar of the language; semantics concerns itself with whether or not the sentence has a valid meaning.

Long answer:

Syntax is about the structure, or the grammar, of the language. It answers the question: how do I construct a valid sentence? All languages, even English and other human (a.k.a. “natural”) languages, have grammars, that is, rules that define whether or not a sentence is properly constructed.

Here are some C language syntax rules, followed by a small snippet that obeys them:

  • separate statements with a semicolon
  • enclose the conditional expression of an if statement in parentheses
  • group multiple statements into a single compound statement by enclosing them in curly braces
  • declare data types and variables before the first executable statement (this restriction was lifted in C99; C99 and later allow declarations to be mixed with statements)
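
A short sketch that follows all four rules, written in C89 style so the declaration-first rule applies. The function print_sign and its variables are made up purely for illustration:

#include <stdio.h>

void print_sign(int x)            /* hypothetical function, for illustration   */
{
    int sign;                     /* declared before any executable statement  */

    if (x < 0) {                  /* conditional expression inside parentheses */
        sign = -1;                /* statements separated by semicolons        */
        printf("negative\n");     /* two statements grouped by curly braces    */
    } else {
        sign = (x > 0) ? 1 : 0;
    }
    printf("sign of %d is %d\n", x, sign);
}

int main(void)
{
    print_sign(-5);               /* example call */
    return 0;
}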

Semantics is about the meaning of the sentence. It answers the questions: is this sentence valid? If so, what does the sentence mean? For example:

x++;                  // increment
foo(xyz, --b, &qrs);  // call foo

are syntactically valid C statements. But what do they mean? Is it even valid to attempt to transform these statements into an executable sequence of instructions? These questions are at the heart of semantics.
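
To see what the compiler needs before the second statement can mean anything, here is a sketch of declarations that make foo(xyz, --b, &qrs); both syntactically and semantically valid. The types chosen below are assumptions for illustration only; the original statement does not specify them:

struct widget { int id; };                        /* assumed type, purely for illustration   */

void foo(int count, int step, struct widget *w)   /* foo must be declared (and defined)      */
{                                                 /* somewhere; this body is a placeholder   */
    w->id = count + step;
}

int xyz = 3;                                      /* xyz and b need arithmetic types         */
int b   = 10;
struct widget qrs;                                /* &qrs must match foo's third parameter   */

int main(void)
{
    foo(xyz, --b, &qrs);                          /* now the call has a well-defined meaning */
    return 0;
}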

Consider the ++ operator in the first statement. First of all, is it even valid to attempt this?

  • If x is a struct type (or any other type that is neither arithmetic nor a pointer), the statement has no meaning according to the C language rules, and thus it is an error even though it is syntactically correct.
  • If x is a pointer to some data type, the meaning of the statement is “add sizeof(some data type) to the value stored in x and store the result back into x”, i.e., advance the pointer to the next element.
  • If x is an arithmetic type such as int, the meaning of the statement is “add one to the value stored in x and store the result back into x”.
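
The three cases above, sketched in code. The names i, p, and s are mine; the struct case is left commented out because the compiler would reject it:

#include <stdio.h>

struct point { int x, y; };

int main(void)
{
    int          i = 41;
    int         *p = &i;
    struct point s = { 1, 2 };

    i++;              /* arithmetic type: adds 1 to the value stored in i      */
    p++;              /* pointer type: advances p by sizeof(int)               */
 /* s++; */           /* struct type: a constraint violation -- the statement  */
                      /* has no meaning, so the compiler rejects it            */

    printf("i = %d\n", i);        /* prints 42 */
    (void)p;                      /* suppress unused-value warnings in the sketch */
    (void)s;
    return 0;
}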

Finally, note that some semantics cannot be determined at compile time and must therefore be evaluated at run time. In the ++ operator example, if x is already at the maximum value for its data type, what happens when you try to add 1 to it? Another example: what happens if your program attempts to dereference a pointer whose value is NULL?
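
A minimal sketch of both run-time questions. The checks are mine, and the point is that the problem statements compile cleanly: for a signed int, overflow on increment and dereferencing a null pointer are both undefined behavior in C, so their meaning depends on values only the running program can see:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    int  x = INT_MAX;
    int *p = NULL;

    /* x++ compiles, but if x is already INT_MAX the increment is signed
       overflow, which the C standard leaves undefined; the only portable
       answer is to test the value at run time.                            */
    if (x < INT_MAX) {
        x++;
    } else {
        printf("x is already INT_MAX (%d); incrementing it would overflow\n", x);
    }

    /* *p also compiles, but dereferencing a null pointer is undefined
       behavior, so the program must check the value of p at run time.     */
    if (p != NULL) {
        printf("*p = %d\n", *p);
    } else {
        printf("p is NULL; not dereferencing it\n");
    }

    return 0;
}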
