The crs Compiler

crs is a compiler implemented in Rust for a C++-like language, built for the COMP 422 - Compiler Design course at Concordia University. Here's a sample program.

Getting Started

The nightly build of the Rust programming language is required to compile this project.

First, install rustup and then run rustup default nightly.

You can compile and run the entire project with the following command:

cargo run -- tests/lexer/source_example.crs

Documentation

The compiler documentation can be read as Markdown parsed by GitHub or as HTML rendered by mdBook.

All documentation is written in markdown in /docs/src. The HTML version can be generated with mdbook by running mdbook build in the root directory.

Testing

Unit Tests

Unit tests for the compiler application can be run with:

cargo test

Example output of unit tests

Integration Tests

Integration tests can be run using a simple bash script to query the output of commands:

./tests/lexer/lexer_integration_test.sh

These provide appropriate test cases that test for a wide variety of valid and invalid cases in the language.

Example output of integration tests

Component Implementation and Architecture Overview

The lexer is designed in a modular fashion in order to a clean separation of concerns.

Main Function

The main function's current responsibilities are to parse the command-line application's required arguments (the path to the source code file), ensure that the provided path exists, instantiate a single lexer for the entire duration of the lexical analysis, and call the nextToken function until the entire source code have been tokenized.

Lexer

The lexer's responsibility is to manage the progress of the lexical analysis of the source code. It buffers and streams the source code to be analyzed as tokens are extracted. It maintains state about the current index of the source code being accessed, and is effectively a data structure that is passed to the state transition functions. Every call to nextToken results in the instantiation of a new StateTransition object which handles all transitions between FSM states on character input until a valid token is identified.

State Transitions

The StateTransition module contains all information relating to transition internal state of the FSM on character input. It manages it's own internal state (the current and next states) as well as the lexer's state (which characters are being read at which index of the source code).

The state transition table is a static two-dimensional array that encodes all possible transitions between all valid states. This is referred to when new characters are being read, which ultimately updates the current state and the position in the source code. The table also encodes information as to whether the current state is an error, final, or backtrack state, and the module is responsible for taking the appropriate action in these cases, which could include generating a token, modifying the buffer for the extracted lexeme, and reporting encountered error states.

Token

The token is a simple data structure that stores the token's class and extracted lexeme as strings in a struct. Further improvements would be to restrict the token's class to a set of defined TokenClass enum structures which encode peculiarities about certain classes (for example, comment and newline tokens), as opposed to arbitrary text input.

Error Output

An ErrorType enumeration is defined in the output::error module to differentiate between recoverable and unrecoverable errors. It is also responsible for defining all possible sorts of errors, as well as outputting basic error information to standard output.

Source Line Output

The output::source_line module is able to analyze the information in the lexer data structure containing the source code and the current point of analysis, and print relevant lines and characters to standard output when errors occur.

Tools/Libraries/Techniques Chosen

  • The AtoCC RegExp and kfG Editor applications were used for converting the regular expressions identified as part of the lexical specification into DFAs, as well as ensuring the syntax CFG was converted to LL(1) for predictive parsing. Althought the tool was used in the constructed of the DFA shown above, the current application does not output the generated tokens as a file in the AtoCC format.
  • The FSM simulator website was used for a quick but less vigorous validation of the conversion of regular expressions to DFAs.
  • The lazy_static Rust library was used in order to statically store the large state transition table with values known at compile-time.
  • The Ropey library was used as a Rust library to buffer the source code using the rope data structure as it is being lexically analyzed, as well as keep track of lines and line numbers for when outputting errors. I was not able to find any other data structure that modeled the lexical analysis use case so appropriately.
  • The clap Rust library was used for simple command-line argument parsing. It was chosen given that it is mature and well-regarded in the Rust ecosystem.
  • The colored Rust library was used for prettier output during error reporting. It was chosen for it's ease of use and minimalism.

Example of using the FSM simulator website

crs Lexical Analysis

Lexical Specification

Deterministic Finite Automaton

The following list of regular expressions were used in order to define our language's mechanism for recognizing patterns when tokenizing. For brevity's sake, a variant on the POSIX standard for regular expressions is shown.

Pattern Label Simplified Expression Final Regular Expression
lettera..z¦A..Z[a-zA-Z]
digit 0..9[0-9]
nonzero 1..9[1-9]
alphanumeric letter ¦ digit ¦ _ [a-zA-Z]¦[0-9]¦_
identifier letter (alphanumeric)* [a-zA-Z]([a-zA-Z]¦[0-9]¦_)*
fraction .digit* nonzero ¦ .0 \.(([0-9]*[1-9])¦0)
integer nonzero digit* ¦ 0 (([1-9][0-9]*)¦0)
float integer fraction (e (+¦-)? integer)? (([1-9][0-9]*)¦0)\.(([0-9]*[1-9])¦0)(e(+¦−)?(([1-9][0-9]*)¦0))?

From the regular expressions for tokens defined above, as well as additional simpler language features, a deterministic finite automaton (DFA) was constructed. Tools such as AtoCC and FSM Simulator were used in order to confirm the validity of the regular expressions as finite state machines.

The FSM diagram was created using draw.io and can be modified by opening the lexer_dfa.xml file.

DFA of lexer

State Transition Table

The automaton implementation uses a static state transition table. Any change in the lexical specification above requires rebuilding this table, but should not impact the implementation of the compiler significantly.

The following legend defines some shortcuts for transition characters:

Transition LabelList of Valid CharactersRegular Expression
whitespace , \t, \r(\ +\t+\r)
newline \n \n
letter a..z and A..Z except for e (a+b+c+d+f+g+h+i+j+k+l+m+n+o+p+q+r+s+t+u+v+w+x+y+z)
+
(A+B+C+D+E+F+G+H+I+J+K+L+M+O+P+Q+R+S+T+U+V+W+X+Y+Z)
digit 0..9(0+1+2+3+4+5+6+7+8+9)
nonzero 1..9(1+2+3+4+5+6+7+8+9)

This state transition table was constructed based on the above DFA. The first column indicates the current state. The last few columns indicate any peculiarities of that state. The remaining columns represent all possible input characters which correspond to transition functions.

Current State whitespacenewlineletterenonzero 0 (){}+-*/%!&¦;><=_.:[] Final State Backtrack Required Error State Token Class
0000000000000000000000000000001
118224535121314152021942227232519283116051464950000
2332222333333333033333323333000
3111111111111111111111111111110<identifier,>
<keyword,>
4555555500005670000000000000000
5111111111111111111111111111110<math_operator,/>
6111111111111111111111111111100<open_multi_line_comment>
7111111111111111111111111111100<single_line_comment>
8111111111111111111111111111100<newline>
9111111111111110000110100000000000000000
10111111111111111111111111111100<close_multi_line_comment>
11111111111111111111111111111110<math_operator,*>
12111111111111111111111111111100<open_parens>
13111111111111111111111111111100<close_parens>
14111111111111111111111111111100<open_curly_brace>
15111111111111111111111111111100<close_curly_brace>
161818181818181800001801800000001700000000
17111111111111111111111111111100<relational_operator,==>
18111111111111111111111111111110<assignment_operator>
19111111111111111111111111111100<semicolon>
20111111111111111111111111111100<math_operator,+>
21111111111111111111111111111100<math_operator,->
22111111111111111111111111111100<math_operator,%>
230000000000000000240000000000000
24111111111111111111111111111100<logical_operator,and>
250000000000000000026000000000000
26111111111111111111111111111100<logical_operator,or>
27111111111111111111111111111100<logical_operator,not>
282929292929292900002902900000003000000000
29111111111111111111111111111110<relational_operator,>>
30111111111111111111111111111100<relational_operator,>=>
3134343434343434000034034000003203300000000
32111111111111111111111111111100<relational_operator,<>>
33111111111111111111111111111100<relational_operator,<=>
34111111111111111111111111111110<relational_operator,<>
35444400000444444444444444404444444444440360044000
3600003737000000000000000000000000
3743430393738043434343434343430434343434343000043000
3800003738000000000000000000000000
390000424100004040000000000000000000
4000004241000000000000000000000000
4143430000043434343434343430434343434343000043000
424343004242043434343434343430434343434343000043000
43111111111111111111111111111110<float,>
44111111111111111111111111111110<int,>
4544440045450444444444444444404444444444440360044000
4648484800000000004800000000004700000
47111111111111111111111111111100<scope_resolution_operator>
48111111111111111111111111111110<inheritance_operator>
49111111111111111111111111111100<open_square_bracket>
50111111111111111111111111111100<close_square_bracket>
51111111111111111111111111111100<accessor_operator,.>

Note that for rows which correspond to final states, the next transition function is never computed, given that arriving at a final state results in a return to the initial state (after backtracking if applicable). For simplicity's sake and in order to indicate a return to the start state, these rows are filled with 1.

Language Keywords

The current set of keywords available in the language are:

  • if
  • then
  • else
  • for
  • class
  • get
  • put
  • return
  • program
  • int
  • float
  • bool
  • and
  • or
  • not

If an <identifier,> token is found, it must first be compared to the above list where a match would actually result in a keyword token class, with the exception of and, or and not, which would result in a token class of logical_operator.

Lexical Errors

The three types of errors generated by the lexer are:

  • Unrecoverable Error (1): The input file path provided does not exist on the filesystem.
  • Recoverable Error (2): Input character is unrecognized.
  • Recoverable Error (3): Invalid syntax: error state reached. Resetting FSM to initial state.

Every undefined transition in the above DFA results in one of the two recoverable error types, as is demonstrated by the wide variety of integration tests.

All recoverable errors display the line number and line when tokenizing the source code.

Example of a recoverable error and the displayed line number

The panic mode technique is used for error recovery, which involves skipping invalid input characters until valid program segments are encountered. An error is reported to the user, but the lexical analysis and identification of tokens continues afterwards.

All error output is logged to both the console as well as the error.log file at the root of the repository whenever any lexical analysis occurs.

crs Syntatic Analysis

Context-Free Language Grammar Analysis

EBNF Grammar

The specification for the syntactic analysis of the language is shown with the productions below in EBNF syntax.

LHS RHS
prog {classDecl} {funcDef} program funcBody ;
classDecl class id [: id {, id}] { {varDecl} {funcDecl} } ;
funcDecl type id ( fParams ) ;
funcHead type [ id :: ] id ( fParams )
funcDef funcHead funcBody ;
funcBody { {varDecl} {statement} }
varDecl type id {arraySize} ;
statement assignStat |
if ( expr ) then statBlock else statBlock ; |
for ( type id assignOp expr ; relExpr ; assignStat ) statBlock ; |
get ( variable ) ; |
put ( expr ) ; |
return ( expr ) ;
assignStat variable assignOp expr
statBlock { {statement} } | statement | ε
expr arithExpr | relExpr
relExpr arithExpr relOp arithExpr
arithExpr arithExpr addOp term | term
sign + | -
term term multOp factor | factor
factor variable |
functionCall |
intNum |
floatNum |
( arithExpr ) |
(not | !) factor |
sign factor
variable {idnest} id {indice}
functionCall {idnest} id ( aParams )
idnest id {indice} . | id ( aParams ) .
indice [ arithExpr ]
arraySize [ intNum ]
type int | float | id
fParams type id {arraySize} {fParamsTail} | ε
aParams expr {aParamsTail} | ε
fParamsTail , type id {arraySize}
aParamsTail , expr
assignOp =
relOp == | <> | < | > | <= | >=
addOp + | - | or | ¦¦
multOp * | / | and | &&

BNF Grammar

An EBNF grammar can be converted to a BNF grammar by applying the following rules:

  • For every instance of {X}, extract it to a new nonterminal Y and add the production YY X | ε.
  • For every instance of [X], extract it to a new nonterminal Y and add the production YX | ε.
  • For every instance of (X), extract it to a new nonterminal Y and add the production YX.

The syntactic language specification in EBNF format was converted to a BNF grammar shown below.

LHS RHS
prog classDeclRecursion funcDefRecursion program funcBody ;
classDeclRecursion classDeclRecursion classDecl | ε
classDecl class id optionalInheritance { varDeclRecursion funcDeclRecursion } ;
optionalInheritance : id MultipleSuperClasses | ε
MultipleSuperClasses MultipleSuperClasses , id | ε
funcDeclRecursion funcDeclRecursion funcDecl | ε
funcDecl type id ( fParams ) ;
funcHead type optionalNamespace id ( fParams )
optionalNamespace id :: | ε
funcDefRecursion funcDefRecursion funcDef | ε
funcDef funcHead funcBody ;
funcBody { varDeclRecursion statementRecursion }
varDeclRecursion varDeclRecursion varDecl | ε
varDecl type id arraySizeRecursion ;
statementRecursion statementRecursion statement | ε
statement assignStat |
if ( expr ) then statBlock else statBlock ; |
for ( type id assignOp expr ; relExpr ; assignStat ) statBlock ; |
get ( variable ) ; |
put ( expr ) ; |
return ( expr ) ;
assignStat variable assignOp expr
statBlock { statementRecursion } | statement | ε
expr arithExpr | relExpr
relExpr arithExpr relOp arithExpr
arithExpr arithExpr addOp term | term
sign + | -
term term multOp factor | factor
factor variable |
functionCall |
intNum |
floatNum |
( arithExpr ) |
negationOperator factor |
sign factor
negationOperator not | !
variable idnestRecursion id indiceRecursion
functionCall idnestRecursion id ( aParams )
idnestRecursion idnestRecursion idnest | ε
idnest id indiceRecursion . | id ( aParams ) .
indiceRecursion indiceRecursion indice | ε
indice [ arithExpr ]
arraySizeRecursion arraySizeRecursion arraySize | ε
arraySize [ intNum ]
type int | float | id
fParams type id arraySizeRecursion fParamsTailRecursion | ε
aParams expr aParamsTailRecursion | ε
fParamsTailRecursion fParamsTailRecursion fParamsTail | ε
fParamsTail , type id arraySizeRecursion
aParamsTailRecursion aParamsTailRecursion aParamsTail | ε
aParamsTail , expr
assignOp =
relOp == | <> | < | > | <= | >=
addOp + | - | or | ¦¦
multOp * | / | and | &&

An AtoCC-compatible text format of the above grammar (with renamed variables) is shown below:

prog -> classDeclRecursion funcDefRecursion 'program' funcBody ';'
classDeclRecursion -> classDeclRecursion classDecl | EPSILON
classDecl -> 'class' 'id' optionalInheritance '{' varDeclRecursion funcDeclRecursion '}' ';'
optionalInheritance -> ':' 'id' multipleSuperClasses | EPSILON
multipleSuperClasses -> multipleSuperClasses ',' 'id' | EPSILON
funcDeclRecursion -> funcDeclRecursion funcDecl | EPSILON
funcDecl -> type 'id' '(' fParams ')' ';'
funcHead -> type optionalNamespace 'id' '(' fParams ')'
optionalNamespace ->  'id' '::' | EPSILON
funcDefRecursion -> funcDefRecursion funcDef | EPSILON
funcDef -> funcHead funcBody ';'
funcBody -> '{' varDeclRecursion statementRecursion '}'
varDeclRecursion -> varDeclRecursion varDecl | EPSILON
varDecl -> type 'id' arraySizeRecursion ';'
statementRecursion -> statementRecursion statement | EPSILON
statement -> assignStat | 'if' '(' expr ')' 'then' statBlock 'else' statBlock ';' | 'for' '(' type 'id' assignOp expr ';' relExpr ';' assignStat  ')' statBlock ';' | 'get' '(' variable ')' ';' | 'put' '(' expr ')' ';' | 'return' '(' expr ')' ';'
assignStat -> variable assignOp expr
statBlock -> '{' statementRecursion '}' | statement | EPSILON
expr -> arithExpr | relExpr
relExpr -> arithExpr relOp arithExpr
arithExpr -> arithExpr addOp term | term
sign -> '+' | '-'
term -> term multOp factor | factor
factor -> variable | functionCall | 'intNum' | 'floatNum' | '(' arithExpr ')' | negationOperator factor | sign factor
negationOperator -> 'not' | '!'
variable -> idnestRecursion 'id' indiceRecursion
functionCall -> idnestRecursion 'id' '(' aParams ')'
idnestRecursion -> idnestRecursion idnest | EPSILON
idnest -> 'id' indiceRecursion '.' | 'id' '(' aParams ')' '.'
indiceRecursion -> indiceRecursion indice | EPSILON
indice -> '[' arithExpr ']'
arraySizeRecursion -> arraySizeRecursion arraySize | EPSILON
arraySize -> '[' 'intNum' ']'
type -> 'int' | 'float' | 'id'
fParams -> type 'id' arraySizeRecursion fParamsTailRecursion | EPSILON
aParams -> expr aParamsTailRecursion | EPSILON
fParamsTailRecursion -> fParamsTailRecursion fParamsTail | EPSILON
fParamsTail -> ',' type 'id' arraySizeRecursion
aParamsTailRecursion -> aParamsTailRecursion aParamsTail | EPSILON
aParamsTail -> ',' expr
assignOp -> '='
relOp -> '==' | '<>' | '<' | '>' | '<=' | '>='
addOp -> '+' | '-' | 'or' | '¦¦'
multOp -> '*' | '/' | 'and' | '&&'

Left-Factored, Right-Recursive and LL(1) Grammar

Left factoring is a technique used in predictive top-down parsers avoid the need for backtracking or lookaheads during parsing, such as is done in recursive descent. It involves the removal of any common left factor (terminal or non-terminal) that appears in a production with an or clause (|), or effectively two productions of the same non-terminal. Performing left factoring means that at a given non-terminal, there is a clear deterministic choice of which production to proceed towards.

Left recursion is avoided in recursive descent and predictive parsing strategies due to the possibility of an infinite loop, resulting in the compiler never terminating and with no progress.

Left-recursive productions can be replaced with right-recursive productions by applying the following rules:

For any production with the general form...

SSα1 | ... | Sαn | β1 | ... | βm

... replace with two productions:

S → β1 T | ... | βm T
T → α1 T | ... | α1 T | ε

This technique needs to be applied and the condition must hold for all non-terminal substitutions (one derivation step of a production).

Ambiguity in context-free grammars means that it can result in multiple possible derivations or parse trees. This is undesirable as we want our compiler to reliably generate the same parse tree given the same input program.

Using an LL(1) grammar for a syntactic analysis is attractive given that LL(1) grammars are known to be unambigiuous. Converting a grammar to LL(1) is an effective technique for dealing with ambiguities in general, giving that determining if an arbitrary grammar is ambiguous is an undecidable problem.

An attempt to use the following left-factoring online tool, left-recusion elimination online tool and CFG-to-LL(k) online tool was done. However, these tools were error-prone, and superior results were obtained by manipulating the grammar by hand while verifying with the AtoCC kfGEdit tool along the way.

Ultimately, too many changes were made to the grammar to describe every ambiguity, left factoring or left-recursion elimination. The resulting grammar is shown below in an AtoCC-compatible format (with renamed non-terminal symbols to more accurately represent their role).

Program                                             -> ClassDeclarationRecursion FunctionDefinitionRecursion program FunctionBody ;
AdditiveOperator                                    -> +
AdditiveOperator                                    -> -
AdditiveOperator                                    -> or
AdditiveOperator                                    -> ¦¦
ArithmeticExpression                                -> Term ArithmeticExpressionTail
ArithmeticExpressionTail                            -> AdditiveOperator Term ArithmeticExpressionTail
ArithmeticExpressionTail                            -> EPSILON
ArithmeticOrRelationalExpression                    -> RelationalOperator ArithmeticExpression
ArithmeticOrRelationalExpression                    -> EPSILON
ArraySize                                           -> [ intNum ]
ArraySizeRecursion                                  -> ArraySize ArraySizeRecursion
ArraySizeRecursion                                  -> EPSILON
AssignmentStatement                                 -> Variable = Expression
ClassDeclaration                                    -> class id OptionalInheritanceList { VariableThenFunctionDeclarationRecursion } ;
ClassDeclarationRecursion                           -> ClassDeclaration ClassDeclarationRecursion
ClassDeclarationRecursion                           -> EPSILON
Expression                                          -> ArithmeticExpression ArithmeticOrRelationalExpression
Factor                                              -> ( ArithmeticExpression )
Factor                                              -> FunctionCallOrVariable
Factor                                              -> NegationOperator Factor
Factor                                              -> NumberSign Factor
Factor                                              -> floatNum
Factor                                              -> intNum
FunctionArguments                                   -> Expression FunctionArgumentsTailRecursion
FunctionArguments                                   -> EPSILON
FunctionArgumentsTail                               -> , Expression
FunctionArgumentsTailRecursion                      -> FunctionArgumentsTail FunctionArgumentsTailRecursion
FunctionArgumentsTailRecursion                      -> EPSILON
FunctionBody                                        -> { VariableDeclarationRecursionThenStatementRecursionA }
FunctionCallOrVariable                              -> id FunctionCallOrVariableTail
FunctionCallOrVariableTail                          -> FunctionCallParensOrIndexing FunctionCallOrVariableTailRecursion
FunctionCallOrVariableTailRecursion                 -> . id FunctionCallOrVariableTail
FunctionCallOrVariableTailRecursion                 -> EPSILON
FunctionCallParensOrIndexing                        -> ( FunctionArguments )
FunctionCallParensOrIndexing                        -> IndexingRecursion
FunctionDeclarationRecursionStart                   -> Type id FunctionDeclarationRecursionTail
FunctionDeclarationRecursionStart                   -> EPSILON
FunctionDeclarationRecursionTail                    -> ( FunctionParameters ) ; FunctionDeclarationRecursionStart
FunctionDefinition                                  -> FunctionHeader FunctionBody ;
FunctionDefinitionRecursion                         -> FunctionDefinition FunctionDefinitionRecursion
FunctionDefinitionRecursion                         -> EPSILON
FunctionHeader                                      -> Type OptionalNamespacing ( FunctionParameters )
FunctionParameters                                  -> Type id ArraySizeRecursion FunctionParametersTailRecursion
FunctionParameters                                  -> EPSILON
FunctionParametersTail                              -> , Type id ArraySizeRecursion
FunctionParametersTailRecursion                     -> FunctionParametersTail FunctionParametersTailRecursion
FunctionParametersTailRecursion                     -> EPSILON
IdListRecursion                                     -> , id IdListRecursion
IdListRecursion                                     -> EPSILON
Indexing                                            -> [ ArithmeticExpression ]
IndexingRecursion                                   -> Indexing IndexingRecursion
IndexingRecursion                                   -> EPSILON
MultiplicativeOperator                              -> &&
MultiplicativeOperator                              -> *
MultiplicativeOperator                              -> /
MultiplicativeOperator                              -> and
NegationOperator                                    -> !
NegationOperator                                    -> not
NumberSign                                          -> +
NumberSign                                          -> -
NumberType                                          -> float
NumberType                                          -> int
OptionalInheritanceList                             -> : id IdListRecursion
OptionalInheritanceList                             -> EPSILON
OptionalNamespacing                                 -> id OptionalNamespacingTail
OptionalNamespacingTail                             -> :: id
OptionalNamespacingTail                             -> EPSILON
RelationalExpression                                -> ArithmeticExpression RelationalOperator ArithmeticExpression
RelationalOperator                                  -> <>
RelationalOperator                                  -> <
RelationalOperator                                  -> <=
RelationalOperator                                  -> ==
RelationalOperator                                  -> >
RelationalOperator                                  -> >=
Statement                                           -> AssignmentStatement ;
Statement                                           -> StatementWithoutAssignment
StatementBlock                                      -> Statement
StatementBlock                                      -> EPSILON
StatementBlock                                      -> { StatementRecursion }
StatementRecursion                                  -> Statement StatementRecursion
StatementRecursion                                  -> EPSILON
StatementWithoutAssignment                          -> for ( Type id = Expression ; RelationalExpression ; AssignmentStatement ) StatementBlock ;
StatementWithoutAssignment                          -> get ( Variable ) ;
StatementWithoutAssignment                          -> if ( Expression ) then StatementBlock else StatementBlock ;
StatementWithoutAssignment                          -> put ( Expression ) ;
StatementWithoutAssignment                          -> return ( Expression ) ;
Term                                                -> Factor TermRecursion
TermRecursion                                       -> MultiplicativeOperator Factor TermRecursion
TermRecursion                                       -> EPSILON
Type                                                -> NumberType
Type                                                -> id
Variable                                            -> id VariableTail
VariableDeclarationRecursionThenStatementRecursionA -> NumberType id ArraySizeRecursion ; VariableDeclarationRecursionThenStatementRecursionA
VariableDeclarationRecursionThenStatementRecursionA -> EPSILON
VariableDeclarationRecursionThenStatementRecursionA -> StatementWithoutAssignment StatementRecursion
VariableDeclarationRecursionThenStatementRecursionA -> id VariableDeclarationRecursionThenStatementRecursionB
VariableDeclarationRecursionThenStatementRecursionB -> VariableTail = Expression ; StatementRecursion
VariableDeclarationRecursionThenStatementRecursionB -> id ArraySizeRecursion ; VariableDeclarationRecursionThenStatementRecursionA
VariableTail                                        -> ( FunctionArguments ) . id VariableTail
VariableTail                                        -> IndexingRecursion VariableTailTail
VariableTailTail                                    -> . id VariableTail
VariableTailTail                                    -> EPSILON
VariableThenFunctionDeclarationRecursion            -> Type id VariableThenFunctionDeclarationRecursionTail
VariableThenFunctionDeclarationRecursion            -> EPSILON
VariableThenFunctionDeclarationRecursionTail        -> ArraySizeRecursion ; VariableThenFunctionDeclarationRecursion
VariableThenFunctionDeclarationRecursionTail        -> FunctionDeclarationRecursionTail

The AtoCC kfG Edit tool confirms that the above grammar is LL(1).

AtoCC confirmation prompt indicating that the gramar satifies both LL(1) properties.

LL(1) Parse Table

A parsing table must be constructed from the above grammar in order to represent the grammar during predictive parsing. The information below was generated automatically using an online tool.

The fact that there was at most one production in each table cell further reinforces the fact that the grammar is LL(1).

FIRST Sets

Non-Terminal Symbol First Set
program program
; ;
+ +
- -
or or
¦¦ ¦¦
ε ε
[ [
intNum intNum
] ]
= =
class class
id id
{ }
} }
( )
) )
floatNum floatNum
, ,
. .
&& &&
* *
/ /
and and
! !
not not
float float
int int
: :
:: ::
<> <>
< <
<= <=
== ==
> >
>= >=
for for
get get
if if
then then
else else
put put
return return
Program program, ε, class, id, float, int
AdditiveOperator +, -, or, ¦¦
ArithmeticExpressionTail ε, +, -, or, ¦¦
ArithmeticOrRelationalExpression ε, <>, <, <=, ==, >, >=
ArraySize [
ArraySizeRecursion ε, [
ClassDeclaration class
ClassDeclarationRecursion ε, class
Factor (, floatNum, intNum, id, +, -, !, not
FunctionArguments ε, (, floatNum, intNum, id, +, -, !, not
FunctionArgumentsTail ,
FunctionArgumentsTailRecursion ε, ,
FunctionBody {
FunctionCallOrVariable id
FunctionCallOrVariableTailRecursion ., ε
FunctionCallParensOrIndexing (, ε, [
FunctionDeclarationRecursionStart ε, id, float, int
FunctionDeclarationRecursionTail (
FunctionDefinitionRecursion ε, id, float, int
FunctionParameters ε, id, float, int
FunctionParametersTail ,
FunctionParametersTailRecursion ε, ,
IdListRecursion ,, ε
Indexing [
IndexingRecursion ε, [
MultiplicativeOperator &&, *, /, and
NegationOperator !, not
NumberSign +, -
NumberType float, int
OptionalInheritanceList :, ε
OptionalNamespacing id
OptionalNamespacingTail ::, ε
RelationalOperator <>, <, <=, ==, >, >=
StatementBlock ε, {, id, for, get, if, put, return
StatementRecursion ε, id, for, get, if, put, return
StatementWithoutAssignment for, get, if, put, return
TermRecursion ε, &&, *, /, and
Type id, float, int
Variable id
VariableDeclarationRecursionThenStatementRecursionA ε, id, float, int, for, get, if, put, return
VariableDeclarationRecursionThenStatementRecursionB =, id, (, ε, [, .
VariableTail (, ε, [, .
VariableTailTail ., ε
VariableThenFunctionDeclarationRecursion ε, id, float, int
VariableThenFunctionDeclarationRecursionTail ;, ε, [, (
AssignmentStatement id
FunctionHeader id, float, int
Statement id, for, get, if, put, return
FunctionCallOrVariableTail (, ε, [, .
FunctionDefinition id, float, int
Term (, floatNum, intNum, id, +, -, !, not
ArithmeticExpression (, floatNum, intNum, id, +, -, !, not
Expression (, floatNum, intNum, id, +, -, !, not
RelationalExpression (, floatNum, intNum, id, +, -, !, not

FOLLOW Sets

Non-Terminal Symbol Follow Set
Program $
AdditiveOperator (, floatNum, intNum, id, +, -, !, not
ArithmeticExpression <>, <, <=, ==, >, >=, ], ), ;, ,
ArithmeticExpressionTail <>, <, <=, ==, >, >=, ], ), ;, ,
ArithmeticOrRelationalExpression ;, ), ,
ArraySize [, ;, ,, )
ArraySizeRecursion ;, ,, )
AssignmentStatement ), ;
ClassDeclaration class, program, id, float, int
ClassDeclarationRecursion program, id, float, int
Expression ;, ), ,
Factor &&, *, /, and, +, -, or, ¦¦, <>, <, <=, ==, >, >=, ], ), ;, ,
FunctionArguments )
FunctionArgumentsTail ,, )
FunctionArgumentsTailRecursion )
FunctionBody ;
FunctionCallOrVariable &&, *, /, and, +, -, or, ¦¦, <>, <, <=, ==, >, >=, ], ), ;, ,
FunctionCallOrVariableTail &&, *, /, and, +, -, or, ¦¦, <>, <, <=, ==, >, >=, ], ), ;, ,
FunctionCallOrVariableTailRecursion &&, *, /, and, +, -, or, ¦¦, <>, <, <=, ==, >, >=, ], ), ;, ,
FunctionCallParensOrIndexing ., &&, *, /, and, +, -, or, ¦¦, <>, <, <=, ==, >, >=, ], ), ;, ,
FunctionDeclarationRecursionStart }
FunctionDeclarationRecursionTail }
FunctionDefinition id, float, int, program
FunctionDefinitionRecursion program
FunctionHeader {
FunctionParameters )
FunctionParametersTail ,, )
FunctionParametersTailRecursion )
IdListRecursion {
Indexing [, ., =, &&, *, /, and, +, -, or, ¦¦, ), <>, <, <=, ==, >, >=, ], ;, ,
IndexingRecursion ., =, &&, *, /, and, +, -, or, ¦¦, ), <>, <, <=, ==, >, >=, ], ;, ,
MultiplicativeOperator (, floatNum, intNum, id, +, -, !, not
NegationOperator (, floatNum, intNum, id, +, -, !, not
NumberSign (, floatNum, intNum, id, +, -, !, not
NumberType id
OptionalInheritanceList {
OptionalNamespacing (
OptionalNamespacingTail (
RelationalExpression ;
RelationalOperator (, floatNum, intNum, id, +, -, !, not
Statement id, for, get, if, put, return, }, ;, else
StatementBlock ;, else
StatementRecursion }
StatementWithoutAssignment id, for, get, if, put, return, }, ;, else
Term +, -, or, ¦¦, <>, <, <=, ==, >, >=, ], ), ;, ,
TermRecursion +, -, or, ¦¦, <>, <, <=, ==, >, >=, ], ), ;, ,
Type id
Variable ), =
VariableDeclarationRecursionThenStatementRecursionA }
VariableDeclarationRecursionThenStatementRecursionB }
VariableTail =, )
VariableTailTail =, )
VariableThenFunctionDeclarationRecursion }
VariableThenFunctionDeclarationRecursionTail }

Prediction Table

Prediction Number Production LHS Non-Terminal Production RHS Expression Predict Set
1 Program ClassDeclarationRecursion FunctionDefinitionRecursion program FunctionBody ; class, id, float, int, program
2 AdditiveOperator + +
3 AdditiveOperator - -
4 AdditiveOperator or or
5 AdditiveOperator ¦¦ ¦¦
6 ArithmeticExpression Term ArithmeticExpressionTail (, floatNum, intNum, id, +, -, !, not
7 ArithmeticExpressionTail AdditiveOperator Term ArithmeticExpressionTail +, -, or, ¦¦
8 ArithmeticExpressionTail ε <>, <, <=, ==, >, >=, ], ), ;, ,
9 ArithmeticOrRelationalExpression RelationalOperator ArithmeticExpression <>, <, =, ==, >, >=
10 ArithmeticOrRelationalExpression ε ;, ), ,
11 ArraySize [ intNum ] [
12 ArraySizeRecursion ArraySize ArraySizeRecursion [
13 ArraySizeRecursion ε ;, ,, )
14 AssignmentStatement Variable = Expression id
15 ClassDeclaration class id OptionalInheritanceList { VariableThenFunctionDeclarationRecursion } ; class
16 ClassDeclarationRecursion ClassDeclaration ClassDeclarationRecursion class
17 ClassDeclarationRecursion ε program, id, float, int
18 Expression ArithmeticExpression ArithmeticOrRelationalExpression (, floatNum, intNum, id, +, -, !, not
19 Factor ( ArithmeticExpression ) (
20 Factor FunctionCallOrVariable id
21 Factor NegationOperator Factor !, not
22 Factor NumberSign Factor +, -
23 Factor floatNum floatNum
24 Factor intNum intNum
25 FunctionArguments Expression FunctionArgumentsTailRecursion (, floatNum, intNum, id, +, -, !, not
26 FunctionArguments ε )
27 FunctionArgumentsTail , Expression ,
28 FunctionArgumentsTailRecursion FunctionArgumentsTail FunctionArgumentsTailRecursion ,
29 FunctionArgumentsTailRecursion ε )
30 FunctionBody { VariableDeclarationRecursionThenStatementRecursionA } {
31 FunctionCallOrVariable id FunctionCallOrVariableTail id
32 FunctionCallOrVariableTail FunctionCallParensOrIndexing FunctionCallOrVariableTailRecursion (, [, ., =, &&, *, /, and, +, -, or, ¦¦, ), <>, <, <=, ==, >, >=, ], ;, ,
33 FunctionCallOrVariableTailRecursion . id FunctionCallOrVariableTail .
34 FunctionCallOrVariableTailRecursion ε &&, *, /, and, +, -, or, ¦¦, <>, <, <=, ==, >, >=, ], ), ;, ,
35 FunctionCallParensOrIndexing ( FunctionArguments ) (
36 FunctionCallParensOrIndexing IndexingRecursion [, ., =, &&, *, /, and, +, -, or, ¦¦, ), <>, <, <=, ==, >, >=, ], ;, ,
37 FunctionDeclarationRecursionStart Type id FunctionDeclarationRecursionTail id, float, int
38 FunctionDeclarationRecursionStart ε }
39 FunctionDeclarationRecursionTail ( FunctionParameters ) ; FunctionDeclarationRecursionStart (
40 FunctionDefinition FunctionHeader FunctionBody ; id, float, int
41 FunctionDefinitionRecursion FunctionDefinition FunctionDefinitionRecursion id, float, int
42 FunctionDefinitionRecursion ε program
43 FunctionHeader Type OptionalNamespacing ( FunctionParameters ) id, float, int
44 FunctionParameters Type id ArraySizeRecursion FunctionParametersTailRecursion id, float, int
45 FunctionParameters ε )
46 FunctionParametersTail , Type id ArraySizeRecursion ,
47 FunctionParametersTailRecursion FunctionParametersTail FunctionParametersTailRecursion ,
48 FunctionParametersTailRecursion ε )
49 IdListRecursion , id IdListRecursion ,
50 IdListRecursion ε {
51 Indexing [ ArithmeticExpression ] [
52 IndexingRecursion Indexing IndexingRecursion [
53 IndexingRecursion ε ., =, &&, *, /, and, +, -, or, ¦¦, ), <>, <, <=, ==, >, >=, ], ;, ,
54 MultiplicativeOperator && &&
55 MultiplicativeOperator * *
56 MultiplicativeOperator / /
57 MultiplicativeOperator and and
58 NegationOperator ! !
59 NegationOperator not not
60 NumberSign + +
61 NumberSign - -
62 NumberType float float
63 NumberType int int
64 OptionalInheritanceList : id IdListRecursion :
65 OptionalInheritanceList ε {
66 OptionalNamespacing id OptionalNamespacingTail id
67 OptionalNamespacingTail :: id ::
68 OptionalNamespacingTail ε (
69 RelationalExpression ArithmeticExpression RelationalOperator ArithmeticExpression (, floatNum, intNum, id, +, -, !, not
70 RelationalOperator <> <>
71 RelationalOperator < <
72 RelationalOperator <= <=
73 RelationalOperator == ==
74 RelationalOperator > >
75 RelationalOperator >= >=
76 Statement AssignmentStatement ; id
77 Statement StatementWithoutAssignment for, get, if, put, return
78 StatementBlock Statement id, for, get, if, put, return
79 StatementBlock ε ;, else
80 StatementBlock { StatementRecursion } {
81 StatementRecursion Statement StatementRecursion id, for, get, if, put, return
82 StatementRecursion ε }
83 StatementWithoutAssignment for ( Type id = Expression ; RelationalExpression ; AssignmentStatement ) StatementBlock ; for
84 StatementWithoutAssignment get ( Variable ) ; get
85 StatementWithoutAssignment if ( Expression ) then StatementBlock else StatementBlock ; if
86 StatementWithoutAssignment put ( Expression ) ; put
87 StatementWithoutAssignment return ( Expression ) ; return
88 Term Factor TermRecursion (, floatNum, intNum, id, +, -, !, not
89 TermRecursion MultiplicativeOperator Factor TermRecursion &&, *, /, and
90 TermRecursion ε +, -, or, ¦¦, <>, <, <=, ==, >, >=, ], ), ;, ,
91 Type NumberType float, int
92 Type id id
93 Variable id VariableTail id
94 VariableDeclarationRecursionThenStatementRecursionA NumberType id ArraySizeRecursion ; VariableDeclarationRecursionThenStatementRecursionA float, int
95 VariableDeclarationRecursionThenStatementRecursionA ε }
96 VariableDeclarationRecursionThenStatementRecursionA StatementWithoutAssignment StatementRecursion for, get, if, put, return
97 VariableDeclarationRecursionThenStatementRecursionA id VariableDeclarationRecursionThenStatementRecursionB id
98 VariableDeclarationRecursionThenStatementRecursionB VariableTail = Expression ; StatementRecursion (, [, ., =
99 VariableDeclarationRecursionThenStatementRecursionB id ArraySizeRecursion ; VariableDeclarationRecursionThenStatementRecursionA id
100 VariableTail ( FunctionArguments ) . id VariableTail (
101 VariableTail IndexingRecursion VariableTailTail [, ., =, )
102 VariableTailTail . id VariableTail .
103 VariableTailTail ε =, )
104 VariableThenFunctionDeclarationRecursion Type id VariableThenFunctionDeclarationRecursionTail id, float, int
105 VariableThenFunctionDeclarationRecursion ε }
106 VariableThenFunctionDeclarationRecursionTail ArraySizeRecursion ; VariableThenFunctionDeclarationRecursion [, ;
107 VariableThenFunctionDeclarationRecursionTail FunctionDeclarationRecursionTail (
108 POP ERROR POP ERROR POP ERROR POP ERROR
109 SCAN ERROR SCAN ERROR SCAN ERROR SCAN ERROR

LL(1) Parsing Table

The columns of the parsing table below represent the list of possible next input tokens.

The rows represent the current leftmost nonterminal in the parse tree.

Every table cell maps the indicates how the current non-terminal should be expanded based on the next input token. The numbers correspond to the rows of entries in the above prediction table, where the production right-hand side expression would be used to expand the current non-terminal.

program ; + - or ¦¦ [ intNum ] = class id { } ( ) floatNum , . && * / and ! not float int : :: <> < <= == > >= for get if then else put return $
Program 1 109 109 109 109 109 109 109 109 109 1 1 109 109 109 109 109 109 109 109 109 109 109 109 109 1 1 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 108
AdditiveOperator 109 109 2 3 4 5 109 108 109 109 109 108 109 109 108 109 108 109 109 109 109 109 109 108 108 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
ArithmeticExpression 109 108 6 6 109 109 109 6 108 109 109 6 109 109 6 108 6 108 109 109 109 109 109 6 6 109 109 109 109 108 108 108 108 108 108 109 109 109 109 109 109 109 109
ArithmeticExpressionTail 109 8 7 7 7 7 109 109 8 109 109 109 109 109 109 8 109 8 109 109 109 109 109 109 109 109 109 109 109 8 8 8 8 8 8 109 109 109 109 109 109 109 109
ArithmeticOrRelationalExpression 109 10 109 109 109 109 109 109 109 109 109 109 109 109 109 10 109 10 109 109 109 109 109 109 109 109 109 109 109 9 9 9 9 9 9 109 109 109 109 109 109 109 109
ArraySize 109 108 109 109 109 109 11 109 109 109 109 109 109 109 109 108 109 108 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
ArraySizeRecursion 109 13 109 109 109 109 12 109 109 109 109 109 109 109 109 13 109 13 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
AssignmentStatement 109 108 109 109 109 109 109 109 109 109 109 14 109 109 109 108 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
ClassDeclaration 108 109 109 109 109 109 109 109 109 109 15 108 109 109 109 109 109 109 109 109 109 109 109 109 109 108 108 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
ClassDeclarationRecursion 17 109 109 109 109 109 109 109 109 109 16 17 109 109 109 109 109 109 109 109 109 109 109 109 109 17 17 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
Expression 109 108 18 18 109 109 109 18 109 109 109 18 109 109 18 108 18 108 109 109 109 109 109 18 18 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
Factor 109 108 22 22 108 108 109 24 108 109 109 20 109 109 19 108 23 108 109 108 108 108 108 21 21 109 109 109 109 108 108 108 108 108 108 109 109 109 109 109 109 109 109
FunctionArguments 109 109 25 25 109 109 109 25 109 109 109 25 109 109 25 26 25 109 109 109 109 109 109 25 25 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
FunctionArgumentsTail 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 108 109 27 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
FunctionArgumentsTailRecursion 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 29 109 28 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
FunctionBody 109 108 109 109 109 109 109 109 109 109 109 109 30 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
FunctionCallOrVariable 109 108 108 108 108 108 109 109 108 109 109 31 109 109 109 108 109 108 109 108 108 108 108 109 109 109 109 109 109 108 108 108 108 108 108 109 109 109 109 109 109 109 109
FunctionCallOrVariableTail 109 108 108 108 108 108 32 109 108 109 109 109 109 109 32 108 109 108 32 108 108 108 108 109 109 109 109 109 109 108 108 108 108 108 108 109 109 109 109 109 109 109 109
FunctionCallOrVariableTailRecursion 109 34 34 34 34 34 109 109 34 109 109 109 109 109 109 34 109 34 33 34 34 34 34 109 109 109 109 109 109 34 34 34 34 34 34 109 109 109 109 109 109 109 109
FunctionCallParensOrIndexing 109 108 108 108 108 108 36 109 108 109 109 109 109 109 35 108 109 108 108 108 108 108 108 109 109 109 109 109 109 108 108 108 108 108 108 109 109 109 109 109 109 109 109
FunctionDeclarationRecursionStart 109 109 109 109 109 109 109 109 109 109 109 37 109 38 109 109 109 109 109 109 109 109 109 109 109 37 37 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
FunctionDeclarationRecursionTail 109 109 109 109 109 109 109 109 109 109 109 109 109 108 39 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
FunctionDefinition 108 109 109 109 109 109 109 109 109 109 109 40 109 109 109 109 109 109 109 109 109 109 109 109 109 40 40 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
FunctionDefinitionRecursion 42 109 109 109 109 109 109 109 109 109 109 41 109 109 109 109 109 109 109 109 109 109 109 109 109 41 41 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
FunctionHeader 109 109 109 109 109 109 109 109 109 109 109 43 108 109 109 109 109 109 109 109 109 109 109 109 109 43 43 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
FunctionParameters 109 109 109 109 109 109 109 109 109 109 109 44 109 109 109 45 109 109 109 109 109 109 109 109 109 44 44 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
FunctionParametersTail 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 108 109 46 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
FunctionParametersTailRecursion 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 48 109 47 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
IdListRecursion 109 109 109 109 109 109 109 109 109 109 109 109 50 109 109 109 109 49 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
Indexing 109 108 108 108 108 108 51 109 108 108 109 109 109 109 109 108 109 108 108 108 108 108 108 109 109 109 109 109 109 108 108 108 108 108 108 109 109 109 109 109 109 109 109
IndexingRecursion 109 53 53 53 53 53 52 109 53 53 109 109 109 109 109 53 109 53 53 53 53 53 53 109 109 109 109 109 109 53 53 53 53 53 53 109 109 109 109 109 109 109 109
MultiplicativeOperator 109 109 108 108 109 109 109 108 109 109 109 108 109 109 108 109 108 109 109 54 55 56 57 108 108 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
NegationOperator 109 109 108 108 109 109 109 108 109 109 109 108 109 109 108 109 108 109 109 109 109 109 109 58 59 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
NumberSign 109 109 60 61 109 109 109 108 109 109 109 108 109 109 108 109 108 109 109 109 109 109 109 108 108 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
NumberType 109 109 109 109 109 109 109 109 109 109 109 108 109 109 109 109 109 109 109 109 109 109 109 109 109 62 63 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
OptionalInheritanceList 109 109 109 109 109 109 109 109 109 109 109 109 65 109 109 109 109 109 109 109 109 109 109 109 109 109 109 64 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
OptionalNamespacing 109 109 109 109 109 109 109 109 109 109 109 66 109 109 108 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
OptionalNamespacingTail 109 109 109 109 109 109 109 109 109 109 109 109 109 109 68 109 109 109 109 109 109 109 109 109 109 109 109 109 67 109 109 109 109 109 109 109 109 109 109 109 109 109 109
RelationalExpression 109 108 69 69 109 109 109 69 109 109 109 69 109 109 69 109 69 109 109 109 109 109 109 69 69 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
RelationalOperator 109 109 108 108 109 109 109 108 109 109 109 108 109 109 108 109 108 109 109 109 109 109 109 108 108 109 109 109 109 70 71 72 73 74 75 109 109 109 109 109 109 109 109
Statement 109 108 109 109 109 109 109 109 109 109 109 76 109 108 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 77 77 77 109 108 77 77 109
StatementBlock 109 79 109 109 109 109 109 109 109 109 109 78 80 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 78 78 78 109 79 78 78 109
StatementRecursion 109 109 109 109 109 109 109 109 109 109 109 81 109 82 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 81 81 81 109 109 81 81 109
StatementWithoutAssignment 109 108 109 109 109 109 109 109 109 109 109 108 109 108 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 83 84 85 109 108 86 87 109
Term 109 108 88 88 108 108 109 88 108 109 109 88 109 109 88 108 88 108 109 109 109 109 109 88 88 109 109 109 109 108 108 108 108 108 108 109 109 109 109 109 109 109 109
TermRecursion 109 90 90 90 90 90 109 109 90 109 109 109 109 109 109 90 109 90 109 89 89 89 89 109 109 109 109 109 109 90 90 90 90 90 90 109 109 109 109 109 109 109 109
Type 109 109 109 109 109 109 109 109 109 109 109 92 109 109 109 109 109 109 109 109 109 109 109 109 109 91 91 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
Variable 109 109 109 109 109 109 109 109 109 108 109 93 109 109 109 108 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
VariableDeclarationRecursionThenStatementRecursionA 109 109 109 109 109 109 109 109 109 109 109 97 109 95 109 109 109 109 109 109 109 109 109 109 109 94 94 109 109 109 109 109 109 109 109 96 96 96 109 109 96 96 109
VariableDeclarationRecursionThenStatementRecursionB 109 109 109 109 109 109 98 109 109 98 109 99 109 108 98 109 109 109 98 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
VariableTail 109 109 109 109 109 109 101 109 109 108 109 109 109 109 100 108 109 109 101 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
VariableTailTail 109 109 109 109 109 109 109 109 109 103 109 109 109 109 109 103 109 109 102 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
VariableThenFunctionDeclarationRecursion 109 109 109 109 109 109 109 109 109 109 109 104 109 105 109 109 109 109 109 109 109 109 109 109 109 104 104 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
VariableThenFunctionDeclarationRecursionTail 109 106 109 109 109 109 106 109 109 109 109 109 109 108 107 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109

List of Terminal Symbols (Tokens)

  • Identifier (id)
  • Keyword (program)
  • Keyword (class)
  • Keyword (if)
  • Keyword (then)
  • Keyword (for)
  • Keyword (get)
  • Keyword (put)
  • Keyword (return)
  • Keyword (float)
  • Keyword (int)
  • Semicolon (;)
  • OpenCurlyBrace ({)
  • CloseCurlyBrace (})
  • InheritanceOperator (:)
  • Comma (,)
  • OpenParens (()
  • CloseParens ())
  • ScopeResolutionOperator (::)
  • AssignmentOperator (=)
  • RelationalOperator
  • MathOperator (+)
  • MathOperator (-)
  • MathOperator (*)
  • MathOperator (/)
  • BinaryLogicalOperator (or)
  • BinaryLogicalOperator (||)
  • BinaryLogicalOperator (and)
  • BinaryLogicalOperator (&&)
  • UnaryLogicalOperator (not)
  • UnaryLogicalOperator (!)
  • Integer
  • Float
  • OpenSquareBracket ([)
  • CloseSquareBracket (])
  • AccessorOperator (.)

crs Semantic Analysis

Syntax-Directed Translation

In order to perform semantic analysis of our code as well as ultimately generate an abstract syntax tree (AST), the LL(1) context-free grammar derived during the parsing stage was transformed into an attribute grammar. This involved the addition of semantic actions to the right-hand sides of our grammar's productions, which ultimately represent the execution of a subroutine to accumulate and process semantic values or semantic attributes about the program being parsed. These values can have properties such as type, value, memory size or location, translated code, correctness, or more.

Semantic values are accumulated in semantic records until they can be processed by further semantic actions. However, attribute migration will be required given that these semantic rules and values will need to be available throughout the parsing step.

Semantic actions are associated with nodes within the parse tree:

  • Semantic actions at the leaves of the tree typically gather semantic values, from either the token type of the node, or through the type recorded in a symbol table for a token already seen
  • Semantic actions at intermediate nodes ultimately use and validate these semantic values and pass the information up the tree through attribute migration.

In table-driven parsers, semantic action symbols are inserted into the RHS of productions and pushed onto their own semantic stack. Executing a semantic action typically pops a semantic record from the semantic stack, does some processing, then pushes a semantic record back onto the stack.

AST Generation

The method for generating an AST was modified slightly...

  1. Undergo parsing until a semantic action is discovered as a PARSE symbol
  2. Pop the semantic action off the parsing stack
  3. Execute the function related to that semantic action
  • makeNode will:
    • define the AST node type
    • add the token to the node
    • create that node in the graph
    • store the node (or a reference to it) by pushing on the semantic stack
  • makeFamily will
    • define the AST node type
    • add the token to the node (if applicable)
    • create that node in the graph
    • pop X number of children nodes or their reference off the stack
    • Add edges from the new parent node to the children nodes in the graph
    • store the parent node (or a reference to it) by pushing on the semantic stack

Multiple semantic action types can end up using the same semantic action. This is to namespace specific types, since our way of generating the AST relies on pop the stack until a semantic action does not recognize that type anymore.

Semantic actions for leaf nodes are placed before the terminal parse symbol, in order to be able to observe the next token in the token queue before it is dequed. Semantic actions for intermediate nodes are placed after the non-terminal parse symbol, in order to ensure that all productions have already taken place, and that these types are all placed on the semantic stack before creating a family.

We're using the petgraph library to represent the AST. As opposed to referencing to individual nodes that we can keep on the stack as the graph grows, we're keeping track of nodes through unique indices that are assigned to nodes as they are added to a directed graph. This singleton mutex-locked graph object is mutably passed to semantic actions as it is being pushed onto the stack.

crs Semantic Analysis

Semantic Analysis

Symbol Tables

Symbol tables are a tool used during compilation for performing semantic checks on your program. These involve verifications that can only be expressed in a context-sentive grammar, such as type inteference, type checking, and much more

This compiler generates symbol tables by performing semantic actions when traversing the AST previously generated.

The implementation of a symbol table was done with a graph data structure. Some assumptions about the system are listed below:

  • Every node can have either a Table or a Record type.
  • A Table node can only have Record nodes as children.
  • A Record node can only have Table nodes as children.

Imposing these restrictions on our graph results in some nice properties.

  • Visiting all records in a table is done by visiting all of a Table node's immediate children (with a tree depth of 1).
  • Viewing all smallers scopes (nested symbol tables) can be done by ignoring all nodes that are leaves at 1 tree depth and visiting all tables that are at 2 tree depths.
  • Visiting a parent scope (symbol table) is done at any point by traversing 2 nodes up the tree from any Table node.
  • If a Table node is a leaf node, this necessarily represents an empty scope.

AST tree traversal with the Visitor pattern

The Visitor is a useful design pattern during AST tree traversal, in order to group semantic actions into one component during a specific phase of semantic analysis.

A more funcitonal approach to the Visitor pattern was used in this program, but the core premise remains the same.

A "visitor" in the form of a function callback was passed to the DFS tree-traversal algorith. This callback was executed at every node. Once the callback was executed, it executed specific visit functions based on the node type.