Chapter 3: Describing Syntax and Semantics

Introduction

We usually break down the problem of defining a programming language (PL) into two parts:
- Defining the PL's syntax
- Defining the PL's semantics

Syntax: the form or structure of the expressions, statements, and program units.
Semantics: the meaning of the expressions, statements, and program units.
Note: There is not always a clear boundary between the two.

Why and How

Why? We want specifications for several communities:
- Other language designers
- Implementors
- Programmers (the users of the language)

How? One way is via natural language descriptions (e.g., users' manuals, textbooks), but there are a number of techniques for specifying the syntax and semantics that are more formal.

Syntax Overview

- Language preliminaries
- Context-free grammars and BNF
- Syntax diagrams

Language preliminaries:
- A sentence is a string of characters over some alphabet.
- A language is a set of sentences.
- A lexeme is the lowest-level syntactic unit of a language (e.g., *, sum, begin).
- A token is a category of lexemes (e.g., identifier).

Formal approaches to describing syntax:
1. Recognizers: used in compilers
2. Generators: what we'll study

Lexical Structure of Programming Languages
- The lexical structure of a language is the structure of its lexemes (words or tokens); a token is a category of lexemes.
- The scanning phase (lexical analyser) collects characters into tokens.
- The parsing phase (syntactic analyser) determines syntactic structure.

[Figure: stream of characters -> lexical analyser -> tokens and values -> syntactic analyser -> result of parsing]
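To make the lexeme/token distinction concrete, here is a minimal sketch of a scanning phase in C. The token categories, the `Token` struct, and the function name `next_token` are assumptions made for this illustration; they are not taken from the slides, and a real lexical analyser would recognise many more categories.

```c
#include <ctype.h>
#include <stddef.h>

/* Token categories; the names are illustrative only. */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_OPERATOR, TOK_END } TokenKind;

typedef struct {
    TokenKind kind;     /* the token: a category of lexemes */
    const char *start;  /* the lexeme: its first character in the input */
    size_t length;      /* the lexeme: how many characters it spans */
} Token;

/* Scan one token starting at *cursor and advance *cursor past its lexeme. */
Token next_token(const char **cursor) {
    const char *p = *cursor;
    while (isspace((unsigned char)*p)) p++;      /* skip blanks between lexemes */

    Token tok = { TOK_END, p, 0 };
    if (*p == '\0') {
        /* end of input: keep TOK_END with a zero-length lexeme */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        tok.kind = TOK_IDENT;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else if (isdigit((unsigned char)*p)) {
        tok.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p)) p++;
    } else {
        tok.kind = TOK_OPERATOR;                 /* any other single character */
        p++;
    }
    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}
```

Calling `next_token` repeatedly on the string "sum * 42" would yield an identifier (lexeme sum), an operator (lexeme *), a number (lexeme 42), and then the end marker. The parser only needs the categories; the lexemes are the "values" that travel with them.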
Grammars

Context-Free Grammars
- Developed by Noam Chomsky in the mid-1950s.
- Language generators, meant to describe the syntax of natural languages.
- Define a class of languages called context-free languages.

Backus Normal/Naur Form (1959)
- Invented by John Backus to describe Algol 58 and refined by Peter Naur for Algol 60.
- BNF is equivalent to context-free grammars.

BNF

- A metalanguage is a language used to describe another language.
- In BNF, abstractions are used to represent classes of syntactic structures; they act like syntactic variables (also called nonterminal symbols), e.g.:

    <while_stmt> := while <logic_expr> do <stmt>

  This is a rule; it describes the structure of a while statement.

BNF (continued)

- A rule has a left-hand side (LHS), which is a single nonterminal symbol, and a right-hand side (RHS), which is one or more terminal or nonterminal symbols.
- A grammar is a finite nonempty set of rules.
- A nonterminal symbol is “defined” by one or more rules.
- Multiple rules can be combined with the | symbol, so that these two rules

    <stmts> := <stmt>
    <stmts> := <stmt> ; <stmts>

  and this rule are equivalent:

    <stmts> := <stmt> | <stmt> ; <stmts>

- Syntactic lists are described in BNF using recursion:

    <ident_list> -> ident | ident, <ident_list>

- A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols).

BNF Example

Here is an example of a simple grammar for a subset of English. A sentence is a noun phrase and a verb phrase followed by a period.

    <sentence>    := <noun_phrase> <verb_phrase> .
    <noun_phrase> := <article> <noun>
    <article>     := a | the
    <noun>        := man | apple | worm | penguin
    <verb_phrase> := <verb> | <verb> <noun_phrase>
    <verb>        := eats | throws | sees | is

Derivation using BNF

    <sentence> -> <noun_phrase> <verb_phrase> .
               -> <article> <noun> <verb_phrase> .
               -> the <noun> <verb_phrase> .
               -> the man <verb_phrase> .
               -> the man <verb> <noun_phrase> .
               -> the man eats <noun_phrase> .
               -> the man eats <article> <noun> .
               -> the man eats the <noun> .
               -> the man eats the apple .

Another BNF Example

    <program> -> <stmts>
    <stmts>   -> <stmt> | <stmt> ; <stmts>
    <stmt>    -> <var> = <expr>
    <var>     -> a | b | c | d
    <expr>    -> <term> + <term> | <term> - <term>
    <term>    -> <var> | const

Here is a derivation:

    <program> => <stmts>
              => <stmt>
              => <var> = <expr>
              => a = <expr>
              => a = <term> + <term>
              => a = <var> + <term>
              => a = b + <term>
              => a = b + const

Note: There is some variation in notation for BNF grammars. Here we are using -> in the rules instead of := .

Derivation

- Every string of symbols in the derivation is a sentential form.
- A sentence is a sentential form that has only terminal symbols.
- A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded.
- A derivation may be neither leftmost nor rightmost.

Parse Tree

A parse tree is a hierarchical representation of a derivation.

[Figure: parse tree for a = b + const]

Grammar

A grammar is ambiguous iff it generates a sentential form that has two or more distinct parse trees. Ambiguous grammars are, in general, very undesirable in formal languages. We can eliminate ambiguity by revising the grammar.

Here is a simple grammar for expressions that is ambiguous:

    <e>  -> <e> <op> <e>
    <e>  -> int
    <op> -> + | - | * | /

The sentence 1+2*3 can lead to two different parse trees, corresponding to 1+(2*3) and (1+2)*3.

Grammar (continued)

If we use the parse tree to indicate precedence levels of the operators, we cannot have ambiguity. An unambiguous expression grammar:

    <expr> -> <expr> - <term> | <term>
    <term> -> <term> / const | const

[Figure: parse tree for const - const / const]

    <expr> => <expr> - <term>
           => <term> - <term>
           => const - <term>
           => const - <term> / const
           => const - const / const

Operator associativity can also be indicated by a grammar:

    <expr> -> <expr> + <expr> | const    (ambiguous)
    <expr> -> <expr> + const | const     (unambiguous)

[Figure: parse tree for const + const + const]

An Expression Grammar

Here's a grammar to define simple arithmetic expressions over variables and numbers.

    Exp   := num
    Exp   := id
    Exp   := UnOp Exp
    Exp   := Exp BinOp Exp
    Exp   := '(' Exp ')'
    UnOp  := '+'
    UnOp  := '-'
    BinOp := '+' | '-' | '*' | '/'

Here's another common notation variant, where single quotes are used to indicate terminal symbols and unquoted symbols are taken as nonterminals.

A parse tree

A parse tree for a+b*2:

               Exp
            /   |   \
         Exp  BinOp   Exp
          |     |    /  |  \
    identifier  +  Exp BinOp Exp
                    |    |    |
             identifier  *  number

A derivation

Here's a derivation of a+b*2 using the expression grammar:

    Sentential form            Rule applied
    Exp
    => Exp BinOp Exp           Exp := Exp BinOp Exp
    => id BinOp Exp            Exp := id
    => id + Exp                BinOp := '+'
    => id + Exp BinOp Exp      Exp := Exp BinOp Exp
    => id + Exp BinOp num      Exp := num
    => id + id BinOp num       Exp := id
    => id + id * num           BinOp := '*'
       a  + b  * 2
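Why does ambiguity matter in practice? A rule such as Exp := Exp BinOp Exp allows two parse trees for a sentence like 1+2*3, and the two trees give different answers when evaluated. The sketch below builds both trees by hand and evaluates them bottom-up. The `Node` struct and the function names are assumptions made for this illustration.

```c
#include <stddef.h>

/* A parse-tree node: a leaf holds an int, an interior node holds an
   operator and two subtrees. */
typedef struct Node {
    char op;                    /* '+', '-', '*', '/' or 0 for an int leaf */
    int value;                  /* used only when op == 0 */
    const struct Node *left;
    const struct Node *right;
} Node;

/* Evaluate a tree bottom-up; the shape of the tree decides the grouping. */
int eval(const Node *n) {
    if (n->op == 0) return n->value;
    int l = eval(n->left);
    int r = eval(n->right);
    switch (n->op) {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        default:  return l / r;     /* '/' */
    }
}

/* Tree for 1 + (2 * 3): the '*' node sits lower in the tree. */
int eval_times_lower(void) {
    Node one = {0, 1, NULL, NULL};
    Node two = {0, 2, NULL, NULL};
    Node three = {0, 3, NULL, NULL};
    Node mul = {'*', 0, &two, &three};
    Node add = {'+', 0, &one, &mul};
    return eval(&add);
}

/* Tree for (1 + 2) * 3: the '+' node sits lower in the tree. */
int eval_plus_lower(void) {
    Node one = {0, 1, NULL, NULL};
    Node two = {0, 2, NULL, NULL};
    Node three = {0, 3, NULL, NULL};
    Node add = {'+', 0, &one, &two};
    Node mul = {'*', 0, &add, &three};
    return eval(&mul);
}
```

The first tree evaluates to 7 and the second to 9, so a grammar that permits both leaves the meaning of the sentence undetermined. An operator that sits lower in the tree is applied first, which is how a grammar can encode precedence.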
Precedence

Precedence refers to the order in which operations are evaluated. The convention is: exponents first, then multiplication and division, then addition and subtraction. We deal with operations in categories: exponents, mulops, addops. Here's a revised grammar that follows these conventions:

    Exp    := Exp AddOp Exp
    Exp    := Term
    Term   := Term MulOp Term
    Term   := Factor
    Factor := ( Exp )
    Factor := num | id
    AddOp  := + | -
    MulOp  := * | /

Associativity

Associativity refers to the order in which two of the same operation should be computed:
- 3+4+5 = (3+4)+5, left associative (all BinOps)
- 3^4^5 = 3^(4^5), right associative
- if x then if x then y else y = if x then (if x then y else y); else associates with the closest unmatched if (a matched if has an else)

Adding associativity to the BinOp expression grammar:

    Exp    := Exp AddOp Term
    Exp    := Term
    Term   := Term MulOp Factor
    Term   := Factor
    Factor := ( Exp )
    Factor := num | id
    AddOp  := + | -
    MulOp  := * | /

Another example: conditionals

Goal: to create a correct grammar for conditionals. It needs to be unambiguous, and the precedence rule is that an else goes with the nearest unmatched if.

    Statement   := Conditional | whatever
    Conditional := if test then Statement else Statement
    Conditional := if test then Statement

This grammar is ambiguous. The first Conditional rule allows unmatched ifs to be Conditionals:
- if test then (if test then whatever else whatever) = correct
- if test then (if test then whatever) else whatever = incorrect

The final unambiguous grammar:

    Statement := Matched | Unmatched
    Matched   := if test then Matched else Matched | whatever
    Unmatched := if test then Statement | if test then Matched else Unmatched

Extended BNF

Syntactic sugar: it doesn't extend the expressive power of the formalism, but does make it easier to use.
- Optional parts are placed in brackets ([ ]):

    <proc_call> -> ident [ ( <expr_list> ) ]

- Put alternative parts of RHSs in parentheses and separate them with vertical bars:

    <term> -> <term> (+ | -) const

- Put repetitions (0 or more) in braces ({ }):

    <ident> -> letter {letter | digit}

BNF:

    <expr> -> <expr> + <term> | <expr> - <term> | <term>
    <term> -> <term> * <factor> | <term> / <factor> | <factor>

EBNF:

    <expr> -> <term> {(+ | -) <term>}
    <term> -> <factor> {(* | /) <factor>}

Syntax Graphs

Put the terminals in circles or ellipses and put the nonterminals in rectangles; connect them with lines with arrowheads, e.g., Pascal type declarations.

Parsing

- A grammar describes the strings of tokens that are syntactically legal in a PL.
- A recogniser simply accepts or rejects strings.
- A parser constructs a derivation or parse tree.
- Two common types of parsers:
  - bottom-up, or data driven
  - top-down, or hypothesis driven
- A recursive descent parser is a particularly simple way to implement a top-down parser.

Recursive Descent Parsing

- Each nonterminal in the grammar has a subprogram associated with it; the subprogram parses all sentential forms that the nonterminal can generate.
- The recursive descent parsing subprograms are built directly from the grammar rules.
- Recursive descent parsers, like other top-down parsers, cannot be built from left-recursive grammars (why not?).

Recursive Descent Parsing Example

Example: for the grammar

    <term> -> <factor> {(* | /) <factor>}

we could use the following recursive descent parsing subprogram (this one is written in C):

    void term() {
        factor();  /* parse first factor */
        while (next_token == ast_code || next_token == slash_code) {
            lexical();  /* get next token */
            factor();   /* parse next factor */
        }
    }
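The term() subprogram can be grown into a complete, if tiny, recursive descent parser. The sketch below is one possible way to do it, not the slides' own code: it assumes the EBNF rules <expr> -> <term> {(+ | -) <term>} and <term> -> <factor> {(* | /) <factor>}, adds an assumed rule <factor> -> num | ( <expr> ), reads characters directly instead of calling a separate lexical(), and evaluates as it parses so the effect of the grammar is visible. The names `cursor`, `parse_error`, `peek`, and `evaluate` are illustrative.

```c
#include <ctype.h>

/* Parser state: a cursor into the input and an error flag. */
static const char *cursor;
static int parse_error;

int expr(void);

/* Skip blanks and return the next character without consuming it. */
char peek(void) {
    while (isspace((unsigned char)*cursor)) cursor++;
    return *cursor;
}

/* <factor> -> num | ( <expr> ) */
int factor(void) {
    if (peek() == '(') {
        cursor++;                        /* consume '(' */
        int value = expr();
        if (peek() == ')') cursor++;     /* consume ')' */
        else parse_error = 1;
        return value;
    }
    if (!isdigit((unsigned char)peek())) {
        parse_error = 1;                 /* expected a number */
        return 0;
    }
    int value = 0;
    while (isdigit((unsigned char)*cursor))
        value = value * 10 + (*cursor++ - '0');
    return value;
}

/* <term> -> <factor> { (* | /) <factor> } */
int term(void) {
    int value = factor();                /* parse first factor */
    while (peek() == '*' || peek() == '/') {
        char op = *cursor++;             /* get next token */
        int rhs = factor();              /* parse next factor */
        if (op == '*') value *= rhs;
        else if (rhs != 0) value /= rhs;
        else parse_error = 1;            /* division by zero */
    }
    return value;
}

/* <expr> -> <term> { (+ | -) <term> } */
int expr(void) {
    int value = term();
    while (peek() == '+' || peek() == '-') {
        char op = *cursor++;
        int rhs = term();
        value = (op == '+') ? value + rhs : value - rhs;
    }
    return value;
}

/* Parse and evaluate a whole string; returns 1 on success, 0 on error. */
int evaluate(const char *text, int *result) {
    cursor = text;
    parse_error = 0;
    *result = expr();
    if (peek() != '\0') parse_error = 1; /* unparsed trailing input */
    return !parse_error;
}
```

With these rules, evaluate("1 + 2 * 3", &r) sets r to 7 because term() handles the multiplication before expr() sees the addition, and evaluate("8 - 3 - 2", &r) sets r to 3 because the while loop folds operands from left to right. Note that the EBNF repetition is implemented as a loop; this is how the parser avoids the left recursion of the equivalent BNF rules.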
Semantics

Semantics Overview

- Syntax is about “form” and semantics about “meaning”.
- The boundary between syntax and semantics is not always clear.
- First we'll look at issues close to the syntax end, what Sebesta calls “static semantics”, and the technique of attribute grammars.
- Then we'll sketch three approaches to defining “deeper” semantics:
  - Operational semantics
  - Axiomatic semantics
  - Denotational semantics

Static Semantics

Static semantics covers some language features that are difficult or impossible to handle in a BNF/CFG. It is also a mechanism for building a parser which produces an “abstract syntax tree” of its input.

Categories attribute grammars can handle:
- Context-free but cumbersome (e.g., type checking)
- Non-context-free (e.g., variables must be declared before they are used)

Attribute Grammars

Attribute Grammars (AGs) (Knuth, 1968):
- CFGs cannot describe all of the syntax of programming languages.
- AGs are additions to CFGs that carry some “semantic” info along through parse trees.

Primary value of AGs:
- Static semantics specification
- Compiler design (static semantics checking)

Attribute Grammar Example

In Ada we have the following rule to describe procedure definitions:

    <proc> -> procedure <proc_name> <proc_body> end <proc_name> ;

But, of course, the name after “procedure” has to be the same as the name after “end”. This is not possible to capture in a CFG (in practice) because there are too many names. Solution: associate simple attributes with nodes in the parse tree and add “semantic” rules or constraints to the syntactic rule in the grammar.

    <proc> -> procedure <proc_name>[1] <proc_body> end <proc_name>[2] ;
    <proc_name>[1].string = <proc_name>[2].string
Attribute Grammars

Def: An attribute grammar is a CFG G = (S, N, T, P) with the following additions:
- For each grammar symbol x there is a set A(x) of attribute values.
- Each rule has a set of functions that define certain attributes of the nonterminals in the rule.
- Each rule has a (possibly empty) set of predicates to check for attribute consistency.

Let X0 -> X1 ... Xn be a rule.
- Functions of the form S(X0) = f(A(X1), ..., A(Xn)) define synthesized attributes.
- Functions of the form I(Xj) = f(A(X0), ..., A(Xn)), for 1 <= j <= n, define inherited attributes.
- Initially, there are intrinsic attributes on the leaves.
Example: expressions of the form id + id
- ids can be either int_type or real_type
- types of the two ids must be the same
- type of the expression must match its expected type

BNF:

    <expr> -> <var> + <var>
    <var>  -> id

Attributes:
- actual_type: synthesized for <var> and <expr>
- expected_type: inherited for <expr>

Attribute grammar:

1. Syntax rule:    <expr> -> <var>[1] + <var>[2]
   Semantic rule:  <expr>.actual_type <- <var>[1].actual_type
   Predicates:     <var>[1].actual_type = <var>[2].actual_type
                   <expr>.expected_type = <expr>.actual_type
2. Syntax rule:    <var> -> id
   Semantic rule:  <var>.actual_type <- lookup(id, <var>)

How are attribute values computed?
- If all attributes were inherited, the tree could be decorated in top-down order.
- If all attributes were synthesized, the tree could be decorated in bottom-up order.
- In many cases, both kinds of attributes are used, and some combination of top-down and bottom-up must be used.
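The attribute grammar for id + id can be sketched as a small checker in C. This is an illustration under stated assumptions rather than a real compiler pass: the symbol table contents, the `Type` enum, and the function `check_expr` are invented for the example, and `lookup` takes only the identifier name. The expected type arrives from the caller (inherited, top-down), while the actual types are computed from the identifiers upward (synthesized, bottom-up).

```c
#include <string.h>

typedef enum { INT_TYPE, REAL_TYPE, UNKNOWN_TYPE } Type;

/* A toy symbol table standing in for lookup(); its contents are
   assumptions made for this example. */
typedef struct { const char *name; Type type; } Symbol;

static const Symbol symbol_table[] = {
    { "A", INT_TYPE }, { "B", INT_TYPE }, { "X", REAL_TYPE }
};

/* Semantic rule for <var> -> id: <var>.actual_type <- lookup(id) */
Type lookup(const char *id) {
    size_t count = sizeof symbol_table / sizeof symbol_table[0];
    for (size_t i = 0; i < count; i++)
        if (strcmp(symbol_table[i].name, id) == 0)
            return symbol_table[i].type;
    return UNKNOWN_TYPE;
}

/* Decorate <expr> -> <var>[1] + <var>[2] and check both predicates.
   Returns 1 when every predicate holds, 0 otherwise. */
int check_expr(const char *id1, const char *id2, Type expected_type) {
    Type var1_actual = lookup(id1);              /* intrinsic, from the leaf */
    Type var2_actual = lookup(id2);
    if (var1_actual == UNKNOWN_TYPE || var2_actual == UNKNOWN_TYPE)
        return 0;                                /* undeclared identifier */
    if (var1_actual != var2_actual)
        return 0;                                /* <var>[1] and <var>[2] differ */
    Type expr_actual = var1_actual;              /* synthesized for <expr> */
    return expr_actual == expected_type;         /* actual must match expected */
}
```

For instance, check_expr("A", "B", INT_TYPE) succeeds, while check_expr("A", "X", INT_TYPE) fails the first predicate and check_expr("A", "B", REAL_TYPE) fails the second.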
Attribute Grammars (continued)

Decorating the parse tree for A + B:

    <expr>.expected_type <- inherited from parent
    <var>[1].actual_type <- lookup(A, <var>[1])
    <var>[2].actual_type <- lookup(B, <var>[2])
    <var>[1].actual_type =? <var>[2].actual_type
    <expr>.actual_type   <- <var>[1].actual_type
    <expr>.actual_type   =? <expr>.expected_type

Dynamic Semantics

There is no single widely accepted notation or formalism for describing semantics. The general approach to defining the semantics of any language L is to specify a general mechanism to translate any sentence in L into a set of sentences in another language or system that we take to be well defined. Here are three approaches we'll briefly look at:
- Operational semantics
- Axiomatic semantics
- Denotational semantics
Operational Semantics

Idea: describe the meaning of a program in language L by specifying how statements affect the state of a machine (simulated or actual) when executed. The change in the state of the machine (memory, registers, stack, heap, etc.) defines the meaning of the statement.

This is similar in spirit to the notion of a Turing machine, and is also used informally to explain higher-level constructs in terms of simpler ones, as in:

    C statement              Operational semantics
    for (e1; e2; e3)               e1;
        <body>               loop: if e2 = 0 goto exit
                                   <body>
                                   e3;
                                   goto loop
                             exit:

To use operational semantics for a high-level language, a virtual machine is needed.
- A hardware pure interpreter would be too expensive.
- A software pure interpreter also has problems:
  - The detailed characteristics of the particular computer would make actions difficult to understand.
  - Such a semantic definition would be machine-dependent.

A better alternative: a complete computer simulation.
- Build a translator (translates source code to the machine code of an idealized computer).
- Build a simulator for the idealized computer.

Evaluation of operational semantics:
- Good if used informally
- Extremely complex if used formally (e.g., VDL)
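As an informal illustration of the for-loop translation, the sketch below writes the same computation twice in C: once with a for statement and once with the lower-level if/goto form. The function names and the choice of loop body (summing the integers below n) are assumptions made for the example.

```c
/* Sum 0..n-1 with an ordinary C for statement. */
int sum_with_for(int n) {
    int total = 0;
    int i;
    for (i = 0; i < n; i++)
        total += i;
    return total;
}

/* The same loop after translation into the simpler constructs:
   e1; loop: if e2 = 0 goto exit; <body>; e3; goto loop; exit: */
int sum_with_goto(int n) {
    int total = 0;
    int i;
    i = 0;                               /* e1 */
loop:
    if ((i < n) == 0) goto exit_loop;    /* if e2 = 0 goto exit */
    total += i;                          /* <body> */
    i++;                                 /* e3 */
    goto loop;
exit_loop:
    return total;
}
```

Both functions return the same value for every n, which is the sense in which the goto version "defines" the meaning of the for statement: the high-level construct is explained by the state changes of a simpler machine.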
Vienna Definition Language

VDL was a language developed at IBM Vienna Labs for formal, algebraic definition via operational semantics. It was used to specify the semantics of PL/I.

See: The Vienna Definition Language, P. Wegner, ACM Computing Surveys 4(1):5-63 (Mar 1972).

The VDL specification of PL/I was very large, very complicated, a remarkable technical accomplishment, and of little practical use.

Axiomatic Semantics

- Based on formal logic (first-order predicate calculus)
- Original purpose: formal program verification
- Approach: define axioms and inference rules in logic for each statement type in the language (to allow transformations of expressions to other expressions)
- The expressions are called assertions and are either
  - Preconditions: an assertion before a statement states the relationships and constraints among variables that are true at that point in execution
  - Postconditions: an assertion following a statement

Logic 101

Propositional logic:
- Logical constants: true, false
- Propositional symbols: P, Q, S, ..., each of which is either true or false
- Logical connectives: ∧ (and), ∨ (or), ⇒ (implies), ⇔ (is equivalent), ¬ (not), which are defined by the truth tables below.
- Sentences are formed by combining propositional symbols, connectives, and parentheses, and are either true or false, e.g.: P∧Q ⇔ ¬(¬P ∨ ¬Q)

First-order logic adds:
- Variables, which can range over objects in the domain of discourse
- Quantifiers, including ∀ (forall) and ∃ (there exists)
- Example sentences:

    (∀p)(∀q) p∧q ⇔ ¬(¬p ∨ ¬q)
    ∀x prime(x) ⇒ ∃y prime(y) ∧ y>x
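The connectives can be treated as functions from truth values to truth values, which makes it possible to check a propositional sentence by brute force over every assignment. The sketch below does this in C for one De Morgan-style equivalence, P and Q is equivalent to not (not P or not Q). That particular sentence, and the helper names `implies`, `equivalent`, and `de_morgan_holds`, are assumptions chosen for illustration.

```c
#include <stdbool.h>

/* P implies Q is false only when P is true and Q is false. */
bool implies(bool p, bool q) { return !p || q; }

/* P is equivalent to Q when both sides have the same truth value. */
bool equivalent(bool p, bool q) { return p == q; }

/* Check (P and Q) <=> not(not P or not Q) under all four assignments. */
bool de_morgan_holds(void) {
    for (int p = 0; p <= 1; p++) {
        for (int q = 0; q <= 1; q++) {
            bool lhs = p && q;
            bool rhs = !(!p || !q);
            if (!equivalent(lhs, rhs)) return false;
        }
    }
    return true;
}
```

This exhaustive check works because a propositional sentence over n symbols has only 2^n assignments. It does not carry over to first-order sentences, whose variables may range over an infinite domain; those need axioms and inference rules instead.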