The crs
Compiler
crs
is a compiler implemented in Rust for a C++-like language, built for the COMP 422 - Compiler Design
course at Concordia University. Here's a sample program.
Getting Started
The nightly build of the Rust programming language is required to compile this project.
First, install rustup
and then run rustup default nightly
.
You can compile and run the entire project with the following command:
cargo run -- tests/lexer/source_example.crs
Documentation
The compiler documentation can be read as Markdown parsed by GitHub or as HTML rendered by mdBook.
All documentation is written in markdown in /docs/src
. The HTML version can be generated with mdbook
by running mdbook build
in the root directory.
Testing
Unit Tests
Unit tests for the compiler application can be run with:
cargo test
Integration Tests
Integration tests can be run using a simple bash script to query the output of commands:
./tests/lexer/lexer_integration_test.sh
These provide appropriate test cases that test for a wide variety of valid and invalid cases in the language.
Component Implementation and Architecture Overview
The lexer is designed in a modular fashion in order to a clean separation of concerns.
Main Function
The main function's current responsibilities are to parse the command-line application's required arguments (the path to the source code file), ensure that the provided path exists, instantiate a single lexer for the entire duration of the lexical analysis, and call the nextToken
function until the entire source code have been tokenized.
Lexer
The lexer's responsibility is to manage the progress of the lexical analysis of the source code. It buffers and streams the source code to be analyzed as tokens are extracted. It maintains state about the current index of the source code being accessed, and is effectively a data structure that is passed to the state transition functions. Every call to nextToken
results in the instantiation of a new StateTransition
object which handles all transitions between FSM states on character input until a valid token is identified.
State Transitions
The StateTransition
module contains all information relating to transition internal state of the FSM on character input. It manages it's own internal state (the current and next states) as well as the lexer's state (which characters are being read at which index of the source code).
The state transition table is a static two-dimensional array that encodes all possible transitions between all valid states. This is referred to when new characters are being read, which ultimately updates the current state and the position in the source code. The table also encodes information as to whether the current state is an error, final, or backtrack state, and the module is responsible for taking the appropriate action in these cases, which could include generating a token, modifying the buffer for the extracted lexeme, and reporting encountered error states.
Token
The token is a simple data structure that stores the token's class and extracted lexeme as strings in a struct. Further improvements would be to restrict the token's class to a set of defined TokenClass
enum
structures which encode peculiarities about certain classes (for example, comment and newline tokens), as opposed to arbitrary text input.
Error Output
An ErrorType
enumeration is defined in the output::error
module to differentiate between recoverable and unrecoverable errors. It is also responsible for defining all possible sorts of errors, as well as outputting basic error information to standard output.
Source Line Output
The output::source_line
module is able to analyze the information in the lexer data structure containing the source code and the current point of analysis, and print relevant lines and characters to standard output when errors occur.
Tools/Libraries/Techniques Chosen
- The AtoCC RegExp and kfG Editor applications were used for converting the regular expressions identified as part of the lexical specification into DFAs, as well as ensuring the syntax CFG was converted to LL(1) for predictive parsing. Althought the tool was used in the constructed of the DFA shown above, the current application does not output the generated tokens as a file in the AtoCC format.
- The FSM simulator website was used for a quick but less vigorous validation of the conversion of regular expressions to DFAs.
- The lazy_static Rust library was used in order to statically store the large state transition table with values known at compile-time.
- The Ropey library was used as a Rust library to buffer the source code using the rope data structure as it is being lexically analyzed, as well as keep track of lines and line numbers for when outputting errors. I was not able to find any other data structure that modeled the lexical analysis use case so appropriately.
- The clap Rust library was used for simple command-line argument parsing. It was chosen given that it is mature and well-regarded in the Rust ecosystem.
- The colored Rust library was used for prettier output during error reporting. It was chosen for it's ease of use and minimalism.
crs
Lexical Analysis
Lexical Specification
Deterministic Finite Automaton
The following list of regular expressions were used in order to define our language's mechanism for recognizing patterns when tokenizing. For brevity's sake, a variant on the POSIX standard for regular expressions is shown.
Pattern Label | Simplified Expression | Final Regular Expression |
---|---|---|
letter | a..z¦A..Z | [a-zA-Z] |
digit | 0..9 | [0-9] |
nonzero | 1..9 | [1-9] |
alphanumeric | letter ¦ digit ¦ _ | [a-zA-Z]¦[0-9]¦_ |
identifier | letter (alphanumeric)* | [a-zA-Z]([a-zA-Z]¦[0-9]¦_)* |
fraction | .digit* nonzero ¦ .0 | \.(([0-9]*[1-9])¦0) |
integer | nonzero digit* ¦ 0 | (([1-9][0-9]*)¦0) |
float | integer fraction (e (+¦-)? integer)? | (([1-9][0-9]*)¦0)\.(([0-9]*[1-9])¦0)(e(+¦−)?(([1-9][0-9]*)¦0))? |
From the regular expressions for tokens defined above, as well as additional simpler language features, a deterministic finite automaton (DFA) was constructed. Tools such as AtoCC and FSM Simulator were used in order to confirm the validity of the regular expressions as finite state machines.
The FSM diagram was created using draw.io and can be modified by opening the lexer_dfa.xml file.
State Transition Table
The automaton implementation uses a static state transition table. Any change in the lexical specification above requires rebuilding this table, but should not impact the implementation of the compiler significantly.
The following legend defines some shortcuts for transition characters:
Transition Label | List of Valid Characters | Regular Expression |
---|---|---|
whitespace | , \t , \r | (\ +\t+\r) |
newline | \n | \n |
letter | a..z and A..Z except for e | (a+b+c+d+f+g+h+i+j+k+l+m+n+o+p+q+r+s+t+u+v+w+x+y+z) + (A+B+C+D+E+F+G+H+I+J+K+L+M+O+P+Q+R+S+T+U+V+W+X+Y+Z) |
digit | 0..9 | (0+1+2+3+4+5+6+7+8+9) |
nonzero | 1..9 | (1+2+3+4+5+6+7+8+9) |
This state transition table was constructed based on the above DFA. The first column indicates the current state. The last few columns indicate any peculiarities of that state. The remaining columns represent all possible input characters which correspond to transition functions.
Current State | whitespace | newline | letter | e | nonzero | 0 | ( | ) | { | } | + | - | * | / | % | ! | & | ¦ | ; | > | < | = | _ | . | : | [ | ] | Final State | Backtrack Required | Error State | Token Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |
1 | 1 | 8 | 2 | 2 | 45 | 35 | 12 | 13 | 14 | 15 | 20 | 21 | 9 | 4 | 22 | 27 | 23 | 25 | 19 | 28 | 31 | 16 | 0 | 51 | 46 | 49 | 50 | 0 | 0 | 0 | |
2 | 3 | 3 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 0 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 3 | 3 | 3 | 3 | 0 | 0 | 0 | |
3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | <identifier,> <keyword,> |
4 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 0 | 0 | 0 | 0 | 5 | 6 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | <math_operator,/> |
6 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <open_multi_line_comment> |
7 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <single_line_comment> |
8 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <newline> |
9 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 0 | 0 | 0 | 0 | 11 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
10 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <close_multi_line_comment> |
11 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | <math_operator,*> |
12 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <open_parens> |
13 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <close_parens> |
14 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <open_curly_brace> |
15 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <close_curly_brace> |
16 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 0 | 0 | 0 | 0 | 18 | 0 | 18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
17 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <relational_operator,==> |
18 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | <assignment_operator> |
19 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <semicolon> |
20 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <math_operator,+> |
21 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <math_operator,-> |
22 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <math_operator,%> |
23 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
24 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <logical_operator,and> |
25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
26 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <logical_operator,or> |
27 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <logical_operator,not> |
28 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 0 | 0 | 0 | 0 | 29 | 0 | 29 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
29 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | <relational_operator,>> |
30 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <relational_operator,>=> |
31 | 34 | 34 | 34 | 34 | 34 | 34 | 34 | 0 | 0 | 0 | 0 | 34 | 0 | 34 | 0 | 0 | 0 | 0 | 0 | 32 | 0 | 33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
32 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <relational_operator,<>> |
33 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <relational_operator,<=> |
34 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | <relational_operator,<> |
35 | 44 | 44 | 0 | 0 | 0 | 0 | 0 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 0 | 44 | 44 | 44 | 44 | 44 | 44 | 0 | 36 | 0 | 0 | 44 | 0 | 0 | 0 | |
36 | 0 | 0 | 0 | 0 | 37 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
37 | 43 | 43 | 0 | 39 | 37 | 38 | 0 | 43 | 43 | 43 | 43 | 43 | 43 | 43 | 43 | 0 | 43 | 43 | 43 | 43 | 43 | 43 | 0 | 0 | 0 | 0 | 43 | 0 | 0 | 0 | |
38 | 0 | 0 | 0 | 0 | 37 | 38 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
39 | 0 | 0 | 0 | 0 | 42 | 41 | 0 | 0 | 0 | 0 | 40 | 40 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
40 | 0 | 0 | 0 | 0 | 42 | 41 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
41 | 43 | 43 | 0 | 0 | 0 | 0 | 0 | 43 | 43 | 43 | 43 | 43 | 43 | 43 | 43 | 0 | 43 | 43 | 43 | 43 | 43 | 43 | 0 | 0 | 0 | 0 | 43 | 0 | 0 | 0 | |
42 | 43 | 43 | 0 | 0 | 42 | 42 | 0 | 43 | 43 | 43 | 43 | 43 | 43 | 43 | 43 | 0 | 43 | 43 | 43 | 43 | 43 | 43 | 0 | 0 | 0 | 0 | 43 | 0 | 0 | 0 | |
43 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | <float,> |
44 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | <int,> |
45 | 44 | 44 | 0 | 0 | 45 | 45 | 0 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 0 | 44 | 44 | 44 | 44 | 44 | 44 | 0 | 36 | 0 | 0 | 44 | 0 | 0 | 0 | |
46 | 48 | 48 | 48 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 48 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 47 | 0 | 0 | 0 | 0 | 0 | |
47 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <scope_resolution_operator> |
48 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | <inheritance_operator> |
49 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <open_square_bracket> |
50 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <close_square_bracket> |
51 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | <accessor_operator,.> |
Note that for rows which correspond to final states, the next transition function is never computed, given that arriving at a final state results in a return to the initial state (after backtracking if applicable). For simplicity's sake and in order to indicate a return to the start state, these rows are filled with 1
.
Language Keywords
The current set of keywords available in the language are:
if
then
else
for
class
get
put
return
program
int
float
bool
and
or
not
If an <identifier,>
token is found, it must first be compared to the above list where a match would actually result in a keyword
token class, with the exception of and
, or
and not
, which would result in a token class of logical_operator
.
Lexical Errors
The three types of errors generated by the lexer are:
- Unrecoverable Error (1):
The input file path provided does not exist on the filesystem.
- Recoverable Error (2):
Input character is unrecognized.
- Recoverable Error (3):
Invalid syntax: error state reached. Resetting FSM to initial state.
Every undefined transition in the above DFA results in one of the two recoverable error types, as is demonstrated by the wide variety of integration tests.
All recoverable errors display the line number and line when tokenizing the source code.
The panic mode technique is used for error recovery, which involves skipping invalid input characters until valid program segments are encountered. An error is reported to the user, but the lexical analysis and identification of tokens continues afterwards.
All error output is logged to both the console as well as the error.log
file at the root of the repository whenever any lexical analysis occurs.
crs
Syntatic Analysis
Context-Free Language Grammar Analysis
EBNF Grammar
The specification for the syntactic analysis of the language is shown with the productions below in EBNF syntax.
LHS | → | RHS |
---|---|---|
prog | → | {classDecl} {funcDef} program funcBody ; |
classDecl | → | class id [: id {, id }] { {varDecl} {funcDecl} } ; |
funcDecl | → | type id ( fParams ) ; |
funcHead | → | type [ id :: ] id ( fParams ) |
funcDef | → | funcHead funcBody ; |
funcBody | → | { {varDecl} {statement} } |
varDecl | → | type id {arraySize} ; |
statement | → | assignStat | if ( expr ) then statBlock else statBlock ; | for ( type id assignOp expr ; relExpr ; assignStat ) statBlock ; | get ( variable ) ; | put ( expr ) ; | return ( expr ) ; |
assignStat | → | variable assignOp expr |
statBlock | → | { {statement} } | statement | ε |
expr | → | arithExpr | relExpr |
relExpr | → | arithExpr relOp arithExpr |
arithExpr | → | arithExpr addOp term | term |
sign | → | + | - |
term | → | term multOp factor | factor |
factor | → | variable | functionCall | intNum | floatNum | ( arithExpr ) | ( not | ! ) factor | sign factor |
variable | → | {idnest} id {indice} |
functionCall | → | {idnest} id ( aParams ) |
idnest | → | id {indice} . | id ( aParams ) . |
indice | → | [ arithExpr ] |
arraySize | → | [ intNum ] |
type | → | int | float | id |
fParams | → | type id {arraySize} {fParamsTail} | ε |
aParams | → | expr {aParamsTail} | ε |
fParamsTail | → | , type id {arraySize} |
aParamsTail | → | , expr |
assignOp | → | = |
relOp | → | == | <> | < | > | <= | >= |
addOp | → | + | - | or | ¦¦ |
multOp | → | * | / | and | && |
BNF Grammar
An EBNF grammar can be converted to a BNF grammar by applying the following rules:
- For every instance of {X}, extract it to a new nonterminal Y and add the production Y → Y X | ε.
- For every instance of [X], extract it to a new nonterminal Y and add the production Y → X | ε.
- For every instance of (X), extract it to a new nonterminal Y and add the production Y → X.
The syntactic language specification in EBNF format was converted to a BNF grammar shown below.
LHS | → | RHS |
---|---|---|
prog | → | classDeclRecursion funcDefRecursion program funcBody ; |
classDeclRecursion | → | classDeclRecursion classDecl | ε |
classDecl | → | class id optionalInheritance { varDeclRecursion funcDeclRecursion } ; |
optionalInheritance | → | : id MultipleSuperClasses | ε |
MultipleSuperClasses | → | MultipleSuperClasses , id | ε |
funcDeclRecursion | → | funcDeclRecursion funcDecl | ε |
funcDecl | → | type id ( fParams ) ; |
funcHead | → | type optionalNamespace id ( fParams ) |
optionalNamespace | → | id :: | ε |
funcDefRecursion | → | funcDefRecursion funcDef | ε |
funcDef | → | funcHead funcBody ; |
funcBody | → | { varDeclRecursion statementRecursion } |
varDeclRecursion | → | varDeclRecursion varDecl | ε |
varDecl | → | type id arraySizeRecursion ; |
statementRecursion | → | statementRecursion statement | ε |
statement | → | assignStat | if ( expr ) then statBlock else statBlock ; | for ( type id assignOp expr ; relExpr ; assignStat ) statBlock ; | get ( variable ) ; | put ( expr ) ; | return ( expr ) ; |
assignStat | → | variable assignOp expr |
statBlock | → | { statementRecursion } | statement | ε |
expr | → | arithExpr | relExpr |
relExpr | → | arithExpr relOp arithExpr |
arithExpr | → | arithExpr addOp term | term |
sign | → | + | - |
term | → | term multOp factor | factor |
factor | → | variable | functionCall | intNum | floatNum | ( arithExpr ) | negationOperator factor | sign factor |
negationOperator | → | not | ! |
variable | → | idnestRecursion id indiceRecursion |
functionCall | → | idnestRecursion id ( aParams ) |
idnestRecursion | → | idnestRecursion idnest | ε |
idnest | → | id indiceRecursion . | id ( aParams ) . |
indiceRecursion | → | indiceRecursion indice | ε |
indice | → | [ arithExpr ] |
arraySizeRecursion | → | arraySizeRecursion arraySize | ε |
arraySize | → | [ intNum ] |
type | → | int | float | id |
fParams | → | type id arraySizeRecursion fParamsTailRecursion | ε |
aParams | → | expr aParamsTailRecursion | ε |
fParamsTailRecursion | → | fParamsTailRecursion fParamsTail | ε |
fParamsTail | → | , type id arraySizeRecursion |
aParamsTailRecursion | → | aParamsTailRecursion aParamsTail | ε |
aParamsTail | → | , expr |
assignOp | → | = |
relOp | → | == | <> | < | > | <= | >= |
addOp | → | + | - | or | ¦¦ |
multOp | → | * | / | and | && |
An AtoCC-compatible text format of the above grammar (with renamed variables) is shown below:
prog -> classDeclRecursion funcDefRecursion 'program' funcBody ';'
classDeclRecursion -> classDeclRecursion classDecl | EPSILON
classDecl -> 'class' 'id' optionalInheritance '{' varDeclRecursion funcDeclRecursion '}' ';'
optionalInheritance -> ':' 'id' multipleSuperClasses | EPSILON
multipleSuperClasses -> multipleSuperClasses ',' 'id' | EPSILON
funcDeclRecursion -> funcDeclRecursion funcDecl | EPSILON
funcDecl -> type 'id' '(' fParams ')' ';'
funcHead -> type optionalNamespace 'id' '(' fParams ')'
optionalNamespace -> 'id' '::' | EPSILON
funcDefRecursion -> funcDefRecursion funcDef | EPSILON
funcDef -> funcHead funcBody ';'
funcBody -> '{' varDeclRecursion statementRecursion '}'
varDeclRecursion -> varDeclRecursion varDecl | EPSILON
varDecl -> type 'id' arraySizeRecursion ';'
statementRecursion -> statementRecursion statement | EPSILON
statement -> assignStat | 'if' '(' expr ')' 'then' statBlock 'else' statBlock ';' | 'for' '(' type 'id' assignOp expr ';' relExpr ';' assignStat ')' statBlock ';' | 'get' '(' variable ')' ';' | 'put' '(' expr ')' ';' | 'return' '(' expr ')' ';'
assignStat -> variable assignOp expr
statBlock -> '{' statementRecursion '}' | statement | EPSILON
expr -> arithExpr | relExpr
relExpr -> arithExpr relOp arithExpr
arithExpr -> arithExpr addOp term | term
sign -> '+' | '-'
term -> term multOp factor | factor
factor -> variable | functionCall | 'intNum' | 'floatNum' | '(' arithExpr ')' | negationOperator factor | sign factor
negationOperator -> 'not' | '!'
variable -> idnestRecursion 'id' indiceRecursion
functionCall -> idnestRecursion 'id' '(' aParams ')'
idnestRecursion -> idnestRecursion idnest | EPSILON
idnest -> 'id' indiceRecursion '.' | 'id' '(' aParams ')' '.'
indiceRecursion -> indiceRecursion indice | EPSILON
indice -> '[' arithExpr ']'
arraySizeRecursion -> arraySizeRecursion arraySize | EPSILON
arraySize -> '[' 'intNum' ']'
type -> 'int' | 'float' | 'id'
fParams -> type 'id' arraySizeRecursion fParamsTailRecursion | EPSILON
aParams -> expr aParamsTailRecursion | EPSILON
fParamsTailRecursion -> fParamsTailRecursion fParamsTail | EPSILON
fParamsTail -> ',' type 'id' arraySizeRecursion
aParamsTailRecursion -> aParamsTailRecursion aParamsTail | EPSILON
aParamsTail -> ',' expr
assignOp -> '='
relOp -> '==' | '<>' | '<' | '>' | '<=' | '>='
addOp -> '+' | '-' | 'or' | '¦¦'
multOp -> '*' | '/' | 'and' | '&&'
Left-Factored, Right-Recursive and LL(1) Grammar
Left factoring is a technique used in predictive top-down parsers avoid the need for backtracking or lookaheads during parsing, such as is done in recursive descent. It involves the removal of any common left factor (terminal or non-terminal) that appears in a production with an or clause (|), or effectively two productions of the same non-terminal. Performing left factoring means that at a given non-terminal, there is a clear deterministic choice of which production to proceed towards.
Left recursion is avoided in recursive descent and predictive parsing strategies due to the possibility of an infinite loop, resulting in the compiler never terminating and with no progress.
Left-recursive productions can be replaced with right-recursive productions by applying the following rules:
For any production with the general form...
S → Sα1 | ... | Sαn | β1 | ... | βm
... replace with two productions:
S → β1 T | ... | βm T
T → α1 T | ... | α1 T | ε
This technique needs to be applied and the condition must hold for all non-terminal substitutions (one derivation step of a production).
Ambiguity in context-free grammars means that it can result in multiple possible derivations or parse trees. This is undesirable as we want our compiler to reliably generate the same parse tree given the same input program.
Using an LL(1) grammar for a syntactic analysis is attractive given that LL(1) grammars are known to be unambigiuous. Converting a grammar to LL(1) is an effective technique for dealing with ambiguities in general, giving that determining if an arbitrary grammar is ambiguous is an undecidable problem.
An attempt to use the following left-factoring online tool, left-recusion elimination online tool and CFG-to-LL(k) online tool was done. However, these tools were error-prone, and superior results were obtained by manipulating the grammar by hand while verifying with the AtoCC kfGEdit tool along the way.
Ultimately, too many changes were made to the grammar to describe every ambiguity, left factoring or left-recursion elimination. The resulting grammar is shown below in an AtoCC-compatible format (with renamed non-terminal symbols to more accurately represent their role).
Program -> ClassDeclarationRecursion FunctionDefinitionRecursion program FunctionBody ;
AdditiveOperator -> +
AdditiveOperator -> -
AdditiveOperator -> or
AdditiveOperator -> ¦¦
ArithmeticExpression -> Term ArithmeticExpressionTail
ArithmeticExpressionTail -> AdditiveOperator Term ArithmeticExpressionTail
ArithmeticExpressionTail -> EPSILON
ArithmeticOrRelationalExpression -> RelationalOperator ArithmeticExpression
ArithmeticOrRelationalExpression -> EPSILON
ArraySize -> [ intNum ]
ArraySizeRecursion -> ArraySize ArraySizeRecursion
ArraySizeRecursion -> EPSILON
AssignmentStatement -> Variable = Expression
ClassDeclaration -> class id OptionalInheritanceList { VariableThenFunctionDeclarationRecursion } ;
ClassDeclarationRecursion -> ClassDeclaration ClassDeclarationRecursion
ClassDeclarationRecursion -> EPSILON
Expression -> ArithmeticExpression ArithmeticOrRelationalExpression
Factor -> ( ArithmeticExpression )
Factor -> FunctionCallOrVariable
Factor -> NegationOperator Factor
Factor -> NumberSign Factor
Factor -> floatNum
Factor -> intNum
FunctionArguments -> Expression FunctionArgumentsTailRecursion
FunctionArguments -> EPSILON
FunctionArgumentsTail -> , Expression
FunctionArgumentsTailRecursion -> FunctionArgumentsTail FunctionArgumentsTailRecursion
FunctionArgumentsTailRecursion -> EPSILON
FunctionBody -> { VariableDeclarationRecursionThenStatementRecursionA }
FunctionCallOrVariable -> id FunctionCallOrVariableTail
FunctionCallOrVariableTail -> FunctionCallParensOrIndexing FunctionCallOrVariableTailRecursion
FunctionCallOrVariableTailRecursion -> . id FunctionCallOrVariableTail
FunctionCallOrVariableTailRecursion -> EPSILON
FunctionCallParensOrIndexing -> ( FunctionArguments )
FunctionCallParensOrIndexing -> IndexingRecursion
FunctionDeclarationRecursionStart -> Type id FunctionDeclarationRecursionTail
FunctionDeclarationRecursionStart -> EPSILON
FunctionDeclarationRecursionTail -> ( FunctionParameters ) ; FunctionDeclarationRecursionStart
FunctionDefinition -> FunctionHeader FunctionBody ;
FunctionDefinitionRecursion -> FunctionDefinition FunctionDefinitionRecursion
FunctionDefinitionRecursion -> EPSILON
FunctionHeader -> Type OptionalNamespacing ( FunctionParameters )
FunctionParameters -> Type id ArraySizeRecursion FunctionParametersTailRecursion
FunctionParameters -> EPSILON
FunctionParametersTail -> , Type id ArraySizeRecursion
FunctionParametersTailRecursion -> FunctionParametersTail FunctionParametersTailRecursion
FunctionParametersTailRecursion -> EPSILON
IdListRecursion -> , id IdListRecursion
IdListRecursion -> EPSILON
Indexing -> [ ArithmeticExpression ]
IndexingRecursion -> Indexing IndexingRecursion
IndexingRecursion -> EPSILON
MultiplicativeOperator -> &&
MultiplicativeOperator -> *
MultiplicativeOperator -> /
MultiplicativeOperator -> and
NegationOperator -> !
NegationOperator -> not
NumberSign -> +
NumberSign -> -
NumberType -> float
NumberType -> int
OptionalInheritanceList -> : id IdListRecursion
OptionalInheritanceList -> EPSILON
OptionalNamespacing -> id OptionalNamespacingTail
OptionalNamespacingTail -> :: id
OptionalNamespacingTail -> EPSILON
RelationalExpression -> ArithmeticExpression RelationalOperator ArithmeticExpression
RelationalOperator -> <>
RelationalOperator -> <
RelationalOperator -> <=
RelationalOperator -> ==
RelationalOperator -> >
RelationalOperator -> >=
Statement -> AssignmentStatement ;
Statement -> StatementWithoutAssignment
StatementBlock -> Statement
StatementBlock -> EPSILON
StatementBlock -> { StatementRecursion }
StatementRecursion -> Statement StatementRecursion
StatementRecursion -> EPSILON
StatementWithoutAssignment -> for ( Type id = Expression ; RelationalExpression ; AssignmentStatement ) StatementBlock ;
StatementWithoutAssignment -> get ( Variable ) ;
StatementWithoutAssignment -> if ( Expression ) then StatementBlock else StatementBlock ;
StatementWithoutAssignment -> put ( Expression ) ;
StatementWithoutAssignment -> return ( Expression ) ;
Term -> Factor TermRecursion
TermRecursion -> MultiplicativeOperator Factor TermRecursion
TermRecursion -> EPSILON
Type -> NumberType
Type -> id
Variable -> id VariableTail
VariableDeclarationRecursionThenStatementRecursionA -> NumberType id ArraySizeRecursion ; VariableDeclarationRecursionThenStatementRecursionA
VariableDeclarationRecursionThenStatementRecursionA -> EPSILON
VariableDeclarationRecursionThenStatementRecursionA -> StatementWithoutAssignment StatementRecursion
VariableDeclarationRecursionThenStatementRecursionA -> id VariableDeclarationRecursionThenStatementRecursionB
VariableDeclarationRecursionThenStatementRecursionB -> VariableTail = Expression ; StatementRecursion
VariableDeclarationRecursionThenStatementRecursionB -> id ArraySizeRecursion ; VariableDeclarationRecursionThenStatementRecursionA
VariableTail -> ( FunctionArguments ) . id VariableTail
VariableTail -> IndexingRecursion VariableTailTail
VariableTailTail -> . id VariableTail
VariableTailTail -> EPSILON
VariableThenFunctionDeclarationRecursion -> Type id VariableThenFunctionDeclarationRecursionTail
VariableThenFunctionDeclarationRecursion -> EPSILON
VariableThenFunctionDeclarationRecursionTail -> ArraySizeRecursion ; VariableThenFunctionDeclarationRecursion
VariableThenFunctionDeclarationRecursionTail -> FunctionDeclarationRecursionTail
The AtoCC kfG Edit tool confirms that the above grammar is LL(1).
LL(1) Parse Table
A parsing table must be constructed from the above grammar in order to represent the grammar during predictive parsing. The information below was generated automatically using an online tool.
The fact that there was at most one production in each table cell further reinforces the fact that the grammar is LL(1).
FIRST Sets
Non-Terminal Symbol | First Set |
---|---|
program | program |
; | ; |
+ | + |
- | - |
or | or |
¦¦ | ¦¦ |
ε | ε |
[ | [ |
intNum | intNum |
] | ] |
= | = |
class | class |
id | id |
{ | } |
} | } |
( | ) |
) | ) |
floatNum | floatNum |
, | , |
. | . |
&& | && |
* | * |
/ | / |
and | and |
! | ! |
not | not |
float | float |
int | int |
: | : |
:: | :: |
<> | <> |
< | < |
<= | <= |
== | == |
> | > |
>= | >= |
for | for |
get | get |
if | if |
then | then |
else | else |
put | put |
return | return |
Program | program , ε, class , id , float , int |
AdditiveOperator | + , - , or , ¦¦ |
ArithmeticExpressionTail | ε, + , - , or , ¦¦ |
ArithmeticOrRelationalExpression | ε, <> , < , <= , == , > , >= |
ArraySize | [ |
ArraySizeRecursion | ε, [ |
ClassDeclaration | class |
ClassDeclarationRecursion | ε, class |
Factor | ( , floatNum , intNum , id , + , - , ! , not |
FunctionArguments | ε, ( , floatNum , intNum , id , + , - , ! , not |
FunctionArgumentsTail | , |
FunctionArgumentsTailRecursion | ε, , |
FunctionBody | { |
FunctionCallOrVariable | id |
FunctionCallOrVariableTailRecursion | . , ε |
FunctionCallParensOrIndexing | ( , ε, [ |
FunctionDeclarationRecursionStart | ε, id , float , int |
FunctionDeclarationRecursionTail | ( |
FunctionDefinitionRecursion | ε, id , float , int |
FunctionParameters | ε, id , float , int |
FunctionParametersTail | , |
FunctionParametersTailRecursion | ε, , |
IdListRecursion | , , ε |
Indexing | [ |
IndexingRecursion | ε, [ |
MultiplicativeOperator | && , * , / , and |
NegationOperator | ! , not |
NumberSign | + , - |
NumberType | float , int |
OptionalInheritanceList | : , ε |
OptionalNamespacing | id |
OptionalNamespacingTail | :: , ε |
RelationalOperator | <> , < , <= , == , > , >= |
StatementBlock | ε, { , id , for , get , if , put , return |
StatementRecursion | ε, id , for , get , if , put , return |
StatementWithoutAssignment | for , get , if , put , return |
TermRecursion | ε, && , * , / , and |
Type | id , float , int |
Variable | id |
VariableDeclarationRecursionThenStatementRecursionA | ε, id , float , int , for , get , if , put , return |
VariableDeclarationRecursionThenStatementRecursionB | = , id , ( , ε, [ , . |
VariableTail | ( , ε, [ , . |
VariableTailTail | . , ε |
VariableThenFunctionDeclarationRecursion | ε, id , float , int |
VariableThenFunctionDeclarationRecursionTail | ; , ε, [ , ( |
AssignmentStatement | id |
FunctionHeader | id , float , int |
Statement | id , for , get , if , put , return |
FunctionCallOrVariableTail | ( , ε, [ , . |
FunctionDefinition | id , float , int |
Term | ( , floatNum , intNum , id , + , - , ! , not |
ArithmeticExpression | ( , floatNum , intNum , id , + , - , ! , not |
Expression | ( , floatNum , intNum , id , + , - , ! , not |
RelationalExpression | ( , floatNum , intNum , id , + , - , ! , not |
FOLLOW Sets
Non-Terminal Symbol | Follow Set |
---|---|
Program | $ |
AdditiveOperator | ( , floatNum , intNum , id , + , - , ! , not |
ArithmeticExpression | <> , < , <= , == , > , >= , ] , ) , ; , , |
ArithmeticExpressionTail | <> , < , <= , == , > , >= , ] , ) , ; , , |
ArithmeticOrRelationalExpression | ; , ) , , |
ArraySize | [ , ; , , , ) |
ArraySizeRecursion ; , , , ) | |
AssignmentStatement | ) , ; |
ClassDeclaration | class , program , id , float , int |
ClassDeclarationRecursion | program , id , float , int |
Expression | ; , ) , , |
Factor | && , * , / , and , + , - , or , ¦¦ , <> , < , <= , == , > , >= , ] , ) , ; , , |
FunctionArguments | ) |
FunctionArgumentsTail | , , ) |
FunctionArgumentsTailRecursion | ) |
FunctionBody | ; |
FunctionCallOrVariable | && , * , / , and , + , - , or , ¦¦ , <> , < , <= , == , > , >= , ] , ) , ; , , |
FunctionCallOrVariableTail | && , * , / , and , + , - , or , ¦¦ , <> , < , <= , == , > , >= , ] , ) , ; , , |
FunctionCallOrVariableTailRecursion | && , * , / , and , + , - , or , ¦¦ , <> , < , <= , == , > , >= , ] , ) , ; , , |
FunctionCallParensOrIndexing | . , && , * , / , and , + , - , or , ¦¦ , <> , < , <= , == , > , >= , ] , ) , ; , , |
FunctionDeclarationRecursionStart | } |
FunctionDeclarationRecursionTail | } |
FunctionDefinition | id , float , int , program |
FunctionDefinitionRecursion | program |
FunctionHeader | { |
FunctionParameters | ) |
FunctionParametersTail | , , ) |
FunctionParametersTailRecursion | ) |
IdListRecursion | { |
Indexing | [ , . , = , && , * , / , and , + , - , or , ¦¦ , ) , <> , < , <= , == , > , >= , ] , ; , , |
IndexingRecursion | . , = , && , * , / , and , + , - , or , ¦¦ , ) , <> , < , <= , == , > , >= , ] , ; , , |
MultiplicativeOperator | ( , floatNum , intNum , id , + , - , ! , not |
NegationOperator | ( , floatNum , intNum , id , + , - , ! , not |
NumberSign | ( , floatNum , intNum , id , + , - , ! , not |
NumberType | id |
OptionalInheritanceList | { |
OptionalNamespacing | ( |
OptionalNamespacingTail | ( |
RelationalExpression | ; |
RelationalOperator | ( , floatNum , intNum , id , + , - , ! , not |
Statement | id , for , get , if , put , return , } , ; , else |
StatementBlock | ; , else |
StatementRecursion | } |
StatementWithoutAssignment | id , for , get , if , put , return , } , ; , else |
Term | + , - , or , ¦¦ , <> , < , <= , == , > , >= , ] , ) , ; , , |
TermRecursion | + , - , or , ¦¦ , <> , < , <= , == , > , >= , ] , ) , ; , , |
Type | id |
Variable | ) , = |
VariableDeclarationRecursionThenStatementRecursionA | } |
VariableDeclarationRecursionThenStatementRecursionB | } |
VariableTail | = , ) |
VariableTailTail | = , ) |
VariableThenFunctionDeclarationRecursion | } |
VariableThenFunctionDeclarationRecursionTail | } |
Prediction Table
Prediction Number | Production LHS Non-Terminal | → | Production RHS Expression | Predict Set |
---|---|---|---|---|
1 | Program | → | ClassDeclarationRecursion FunctionDefinitionRecursion program FunctionBody ; | class , id , float , int , program |
2 | AdditiveOperator | → | + | + |
3 | AdditiveOperator | → | - | - |
4 | AdditiveOperator | → | or | or |
5 | AdditiveOperator | → | ¦¦ | ¦¦ |
6 | ArithmeticExpression | → | Term ArithmeticExpressionTail | ( , floatNum , intNum , id , + , - , ! , not |
7 | ArithmeticExpressionTail | → | AdditiveOperator Term ArithmeticExpressionTail | + , - , or , ¦¦ |
8 | ArithmeticExpressionTail | → | ε | <> , < , <= , == , > , >= , ] , ) , ; , , |
9 | ArithmeticOrRelationalExpression | → | RelationalOperator ArithmeticExpression | <> , < , = , == , > , >= |
10 | ArithmeticOrRelationalExpression | → | ε | ; , ) , , |
11 | ArraySize | → | [ intNum ] | [ |
12 | ArraySizeRecursion | → | ArraySize ArraySizeRecursion | [ |
13 | ArraySizeRecursion | → | ε | ; , , , ) |
14 | AssignmentStatement | → | Variable = Expression | id |
15 | ClassDeclaration | → | class id OptionalInheritanceList { VariableThenFunctionDeclarationRecursion } ; | class |
16 | ClassDeclarationRecursion | → | ClassDeclaration ClassDeclarationRecursion | class |
17 | ClassDeclarationRecursion | → | ε | program , id , float , int |
18 | Expression | → | ArithmeticExpression ArithmeticOrRelationalExpression | ( , floatNum , intNum , id , + , - , ! , not |
19 | Factor | → | ( ArithmeticExpression ) | ( |
20 | Factor | → | FunctionCallOrVariable | id |
21 | Factor | → | NegationOperator Factor | ! , not |
22 | Factor | → | NumberSign Factor | + , - |
23 | Factor | → | floatNum | floatNum |
24 | Factor | → | intNum | intNum |
25 | FunctionArguments | → | Expression FunctionArgumentsTailRecursion | ( , floatNum , intNum , id , + , - , ! , not |
26 | FunctionArguments | → | ε | ) |
27 | FunctionArgumentsTail | → | , Expression | , |
28 | FunctionArgumentsTailRecursion | → | FunctionArgumentsTail FunctionArgumentsTailRecursion | , |
29 | FunctionArgumentsTailRecursion | → | ε | ) |
30 | FunctionBody | → | { VariableDeclarationRecursionThenStatementRecursionA } | { |
31 | FunctionCallOrVariable | → | id FunctionCallOrVariableTail | id |
32 | FunctionCallOrVariableTail | → | FunctionCallParensOrIndexing FunctionCallOrVariableTailRecursion | ( , [ , . , = , && , * , / , and , + , - , or , ¦¦ , ) , <> , < , <= , == , > , >= , ] , ; , , |
33 | FunctionCallOrVariableTailRecursion | → | . id FunctionCallOrVariableTail | . |
34 | FunctionCallOrVariableTailRecursion | → | ε | && , * , / , and , + , - , or , ¦¦ , <> , < , <= , == , > , >= , ] , ) , ; , , |
35 | FunctionCallParensOrIndexing | → | ( FunctionArguments ) | ( |
36 | FunctionCallParensOrIndexing | → | IndexingRecursion | [ , . , = , && , * , / , and , + , - , or , ¦¦ , ) , <> , < , <= , == , > , >= , ] , ; , , |
37 | FunctionDeclarationRecursionStart | → | Type id FunctionDeclarationRecursionTail | id , float , int |
38 | FunctionDeclarationRecursionStart | → | ε | } |
39 | FunctionDeclarationRecursionTail | → | ( FunctionParameters ) ; FunctionDeclarationRecursionStart | ( |
40 | FunctionDefinition | → | FunctionHeader FunctionBody ; | id , float , int |
41 | FunctionDefinitionRecursion | → | FunctionDefinition FunctionDefinitionRecursion | id , float , int |
42 | FunctionDefinitionRecursion | → | ε | program |
43 | FunctionHeader | → | Type OptionalNamespacing ( FunctionParameters ) | id , float , int |
44 | FunctionParameters | → | Type id ArraySizeRecursion FunctionParametersTailRecursion | id , float , int |
45 | FunctionParameters | → | ε | ) |
46 | FunctionParametersTail | → | , Type id ArraySizeRecursion | , |
47 | FunctionParametersTailRecursion | → | FunctionParametersTail FunctionParametersTailRecursion | , |
48 | FunctionParametersTailRecursion | → | ε | ) |
49 | IdListRecursion | → | , id IdListRecursion | , |
50 | IdListRecursion | → | ε | { |
51 | Indexing | → | [ ArithmeticExpression ] | [ |
52 | IndexingRecursion | → | Indexing IndexingRecursion | [ |
53 | IndexingRecursion | → | ε | . , = , && , * , / , and , + , - , or , ¦¦ , ) , <> , < , <= , == , > , >= , ] , ; , , |
54 | MultiplicativeOperator | → | && | && |
55 | MultiplicativeOperator | → | * | * |
56 | MultiplicativeOperator | → | / | / |
57 | MultiplicativeOperator | → | and | and |
58 | NegationOperator | → | ! | ! |
59 | NegationOperator | → | not | not |
60 | NumberSign | → | + | + |
61 | NumberSign | → | - | - |
62 | NumberType | → | float | float |
63 | NumberType | → | int | int |
64 | OptionalInheritanceList | → | : id IdListRecursion | : |
65 | OptionalInheritanceList | → | ε | { |
66 | OptionalNamespacing | → | id OptionalNamespacingTail | id |
67 | OptionalNamespacingTail | → | :: id | :: |
68 | OptionalNamespacingTail | → | ε | ( |
69 | RelationalExpression | → | ArithmeticExpression RelationalOperator ArithmeticExpression | ( , floatNum , intNum , id , + , - , ! , not |
70 | RelationalOperator | → | <> | <> |
71 | RelationalOperator | → | < | < |
72 | RelationalOperator | → | <= | <= |
73 | RelationalOperator | → | == | == |
74 | RelationalOperator | → | > | > |
75 | RelationalOperator | → | >= | >= |
76 | Statement | → | AssignmentStatement ; | id |
77 | Statement | → | StatementWithoutAssignment | for , get , if , put , return |
78 | StatementBlock | → | Statement | id , for , get , if , put , return |
79 | StatementBlock | → | ε | ; , else |
80 | StatementBlock | → | { StatementRecursion } | { |
81 | StatementRecursion | → | Statement StatementRecursion | id , for , get , if , put , return |
82 | StatementRecursion | → | ε | } |
83 | StatementWithoutAssignment | → | for ( Type id = Expression ; RelationalExpression ; AssignmentStatement ) StatementBlock ; | for |
84 | StatementWithoutAssignment | → | get ( Variable ) ; | get |
85 | StatementWithoutAssignment | → | if ( Expression ) then StatementBlock else StatementBlock ; | if |
86 | StatementWithoutAssignment | → | put ( Expression ) ; | put |
87 | StatementWithoutAssignment | → | return ( Expression ) ; | return |
88 | Term | → | Factor TermRecursion | ( , floatNum , intNum , id , + , - , ! , not |
89 | TermRecursion | → | MultiplicativeOperator Factor TermRecursion | && , * , / , and |
90 | TermRecursion | → | ε | + , - , or , ¦¦ , <> , < , <= , == , > , >= , ] , ) , ; , , |
91 | Type | → | NumberType | float , int |
92 | Type | → | id | id |
93 | Variable | → | id VariableTail | id |
94 | VariableDeclarationRecursionThenStatementRecursionA | → | NumberType id ArraySizeRecursion ; VariableDeclarationRecursionThenStatementRecursionA | float , int |
95 | VariableDeclarationRecursionThenStatementRecursionA | → | ε | } |
96 | VariableDeclarationRecursionThenStatementRecursionA | → | StatementWithoutAssignment StatementRecursion | for , get , if , put , return |
97 | VariableDeclarationRecursionThenStatementRecursionA | → | id VariableDeclarationRecursionThenStatementRecursionB | id |
98 | VariableDeclarationRecursionThenStatementRecursionB | → | VariableTail = Expression ; StatementRecursion | ( , [ , . , = |
99 | VariableDeclarationRecursionThenStatementRecursionB | → | id ArraySizeRecursion ; VariableDeclarationRecursionThenStatementRecursionA | id |
100 | VariableTail | → | ( FunctionArguments ) . id VariableTail | ( |
101 | VariableTail | → | IndexingRecursion VariableTailTail | [ , . , = , ) |
102 | VariableTailTail | → | . id VariableTail | . |
103 | VariableTailTail | → | ε | = , ) |
104 | VariableThenFunctionDeclarationRecursion | → | Type id VariableThenFunctionDeclarationRecursionTail | id , float , int |
105 | VariableThenFunctionDeclarationRecursion | → | ε | } |
106 | VariableThenFunctionDeclarationRecursionTail | → | ArraySizeRecursion ; VariableThenFunctionDeclarationRecursion | [ , ; |
107 | VariableThenFunctionDeclarationRecursionTail | → | FunctionDeclarationRecursionTail | ( |
108 | POP ERROR | POP ERROR | POP ERROR | POP ERROR |
109 | SCAN ERROR | SCAN ERROR | SCAN ERROR | SCAN ERROR |
LL(1) Parsing Table
The columns of the parsing table below represent the list of possible next input tokens.
The rows represent the current leftmost nonterminal in the parse tree.
Every table cell maps the indicates how the current non-terminal should be expanded based on the next input token. The numbers correspond to the rows of entries in the above prediction table, where the production right-hand side expression would be used to expand the current non-terminal.
program | ; | + | - | or | ¦¦ | [ | intNum | ] | = | class | id | { | } | ( | ) | floatNum | , | . | && | * | / | and | ! | not | float | int | : | :: | <> | < | <= | == | > | >= | for | get | if | then | else | put | return | $ | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Program | 1 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 1 | 1 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 1 | 1 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 108 |
AdditiveOperator | 109 | 109 | 2 | 3 | 4 | 5 | 109 | 108 | 109 | 109 | 109 | 108 | 109 | 109 | 108 | 109 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 108 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
ArithmeticExpression | 109 | 108 | 6 | 6 | 109 | 109 | 109 | 6 | 108 | 109 | 109 | 6 | 109 | 109 | 6 | 108 | 6 | 108 | 109 | 109 | 109 | 109 | 109 | 6 | 6 | 109 | 109 | 109 | 109 | 108 | 108 | 108 | 108 | 108 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
ArithmeticExpressionTail | 109 | 8 | 7 | 7 | 7 | 7 | 109 | 109 | 8 | 109 | 109 | 109 | 109 | 109 | 109 | 8 | 109 | 8 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 8 | 8 | 8 | 8 | 8 | 8 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
ArithmeticOrRelationalExpression | 109 | 10 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 10 | 109 | 10 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 9 | 9 | 9 | 9 | 9 | 9 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
ArraySize | 109 | 108 | 109 | 109 | 109 | 109 | 11 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 108 | 109 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
ArraySizeRecursion | 109 | 13 | 109 | 109 | 109 | 109 | 12 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 13 | 109 | 13 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
AssignmentStatement | 109 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 14 | 109 | 109 | 109 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
ClassDeclaration | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 15 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 108 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
ClassDeclarationRecursion | 17 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 16 | 17 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 17 | 17 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
Expression | 109 | 108 | 18 | 18 | 109 | 109 | 109 | 18 | 109 | 109 | 109 | 18 | 109 | 109 | 18 | 108 | 18 | 108 | 109 | 109 | 109 | 109 | 109 | 18 | 18 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
Factor | 109 | 108 | 22 | 22 | 108 | 108 | 109 | 24 | 108 | 109 | 109 | 20 | 109 | 109 | 19 | 108 | 23 | 108 | 109 | 108 | 108 | 108 | 108 | 21 | 21 | 109 | 109 | 109 | 109 | 108 | 108 | 108 | 108 | 108 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
FunctionArguments | 109 | 109 | 25 | 25 | 109 | 109 | 109 | 25 | 109 | 109 | 109 | 25 | 109 | 109 | 25 | 26 | 25 | 109 | 109 | 109 | 109 | 109 | 109 | 25 | 25 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
FunctionArgumentsTail | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 108 | 109 | 27 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
FunctionArgumentsTailRecursion | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 29 | 109 | 28 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
FunctionBody | 109 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 30 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
FunctionCallOrVariable | 109 | 108 | 108 | 108 | 108 | 108 | 109 | 109 | 108 | 109 | 109 | 31 | 109 | 109 | 109 | 108 | 109 | 108 | 109 | 108 | 108 | 108 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 108 | 108 | 108 | 108 | 108 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
FunctionCallOrVariableTail | 109 | 108 | 108 | 108 | 108 | 108 | 32 | 109 | 108 | 109 | 109 | 109 | 109 | 109 | 32 | 108 | 109 | 108 | 32 | 108 | 108 | 108 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 108 | 108 | 108 | 108 | 108 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
FunctionCallOrVariableTailRecursion | 109 | 34 | 34 | 34 | 34 | 34 | 109 | 109 | 34 | 109 | 109 | 109 | 109 | 109 | 109 | 34 | 109 | 34 | 33 | 34 | 34 | 34 | 34 | 109 | 109 | 109 | 109 | 109 | 109 | 34 | 34 | 34 | 34 | 34 | 34 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
FunctionCallParensOrIndexing | 109 | 108 | 108 | 108 | 108 | 108 | 36 | 109 | 108 | 109 | 109 | 109 | 109 | 109 | 35 | 108 | 109 | 108 | 108 | 108 | 108 | 108 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 108 | 108 | 108 | 108 | 108 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
FunctionDeclarationRecursionStart | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 37 | 109 | 38 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 37 | 37 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
FunctionDeclarationRecursionTail | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 108 | 39 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
FunctionDefinition | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 40 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 40 | 40 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
FunctionDefinitionRecursion | 42 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 41 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 41 | 41 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
FunctionHeader | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 43 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 43 | 43 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
FunctionParameters | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 44 | 109 | 109 | 109 | 45 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 44 | 44 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
FunctionParametersTail | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 108 | 109 | 46 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
FunctionParametersTailRecursion | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 48 | 109 | 47 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
IdListRecursion | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 50 | 109 | 109 | 109 | 109 | 49 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
Indexing | 109 | 108 | 108 | 108 | 108 | 108 | 51 | 109 | 108 | 108 | 109 | 109 | 109 | 109 | 109 | 108 | 109 | 108 | 108 | 108 | 108 | 108 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 108 | 108 | 108 | 108 | 108 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
IndexingRecursion | 109 | 53 | 53 | 53 | 53 | 53 | 52 | 109 | 53 | 53 | 109 | 109 | 109 | 109 | 109 | 53 | 109 | 53 | 53 | 53 | 53 | 53 | 53 | 109 | 109 | 109 | 109 | 109 | 109 | 53 | 53 | 53 | 53 | 53 | 53 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
MultiplicativeOperator | 109 | 109 | 108 | 108 | 109 | 109 | 109 | 108 | 109 | 109 | 109 | 108 | 109 | 109 | 108 | 109 | 108 | 109 | 109 | 54 | 55 | 56 | 57 | 108 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
NegationOperator | 109 | 109 | 108 | 108 | 109 | 109 | 109 | 108 | 109 | 109 | 109 | 108 | 109 | 109 | 108 | 109 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 58 | 59 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
NumberSign | 109 | 109 | 60 | 61 | 109 | 109 | 109 | 108 | 109 | 109 | 109 | 108 | 109 | 109 | 108 | 109 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 108 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
NumberType | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 62 | 63 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
OptionalInheritanceList | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 65 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 64 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
OptionalNamespacing | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 66 | 109 | 109 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
OptionalNamespacingTail | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 68 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 67 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
RelationalExpression | 109 | 108 | 69 | 69 | 109 | 109 | 109 | 69 | 109 | 109 | 109 | 69 | 109 | 109 | 69 | 109 | 69 | 109 | 109 | 109 | 109 | 109 | 109 | 69 | 69 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
RelationalOperator | 109 | 109 | 108 | 108 | 109 | 109 | 109 | 108 | 109 | 109 | 109 | 108 | 109 | 109 | 108 | 109 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 108 | 108 | 109 | 109 | 109 | 109 | 70 | 71 | 72 | 73 | 74 | 75 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
Statement | 109 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 76 | 109 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 77 | 77 | 77 | 109 | 108 | 77 | 77 | 109 |
StatementBlock | 109 | 79 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 78 | 80 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 78 | 78 | 78 | 109 | 79 | 78 | 78 | 109 |
StatementRecursion | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 81 | 109 | 82 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 81 | 81 | 81 | 109 | 109 | 81 | 81 | 109 |
StatementWithoutAssignment | 109 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 108 | 109 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 83 | 84 | 85 | 109 | 108 | 86 | 87 | 109 |
Term | 109 | 108 | 88 | 88 | 108 | 108 | 109 | 88 | 108 | 109 | 109 | 88 | 109 | 109 | 88 | 108 | 88 | 108 | 109 | 109 | 109 | 109 | 109 | 88 | 88 | 109 | 109 | 109 | 109 | 108 | 108 | 108 | 108 | 108 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
TermRecursion | 109 | 90 | 90 | 90 | 90 | 90 | 109 | 109 | 90 | 109 | 109 | 109 | 109 | 109 | 109 | 90 | 109 | 90 | 109 | 89 | 89 | 89 | 89 | 109 | 109 | 109 | 109 | 109 | 109 | 90 | 90 | 90 | 90 | 90 | 90 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
Type | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 92 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 91 | 91 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
Variable | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 108 | 109 | 93 | 109 | 109 | 109 | 108 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
VariableDeclarationRecursionThenStatementRecursionA | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 97 | 109 | 95 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 94 | 94 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 96 | 96 | 96 | 109 | 109 | 96 | 96 | 109 |
VariableDeclarationRecursionThenStatementRecursionB | 109 | 109 | 109 | 109 | 109 | 109 | 98 | 109 | 109 | 98 | 109 | 99 | 109 | 108 | 98 | 109 | 109 | 109 | 98 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
VariableTail | 109 | 109 | 109 | 109 | 109 | 109 | 101 | 109 | 109 | 108 | 109 | 109 | 109 | 109 | 100 | 108 | 109 | 109 | 101 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
VariableTailTail | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 103 | 109 | 109 | 109 | 109 | 109 | 103 | 109 | 109 | 102 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
VariableThenFunctionDeclarationRecursion | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 104 | 109 | 105 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 104 | 104 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
VariableThenFunctionDeclarationRecursionTail | 109 | 106 | 109 | 109 | 109 | 109 | 106 | 109 | 109 | 109 | 109 | 109 | 109 | 108 | 107 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
List of Terminal Symbols (Tokens)
- Identifier (
id
) - Keyword (
program
) - Keyword (
class
) - Keyword (
if
) - Keyword (
then
) - Keyword (
for
) - Keyword (
get
) - Keyword (
put
) - Keyword (
return
) - Keyword (
float
) - Keyword (
int
) - Semicolon (
;
) - OpenCurlyBrace (
{
) - CloseCurlyBrace (
}
) - InheritanceOperator (
:
) - Comma (
,
) - OpenParens (
(
) - CloseParens (
)
) - ScopeResolutionOperator (
::
) - AssignmentOperator (
=
) - RelationalOperator
- MathOperator (
+
) - MathOperator (
-
) - MathOperator (
*
) - MathOperator (
/
) - BinaryLogicalOperator (
or
) - BinaryLogicalOperator (
||
) - BinaryLogicalOperator (
and
) - BinaryLogicalOperator (
&&
) - UnaryLogicalOperator (
not
) - UnaryLogicalOperator (
!
) - Integer
- Float
- OpenSquareBracket (
[
) - CloseSquareBracket (
]
) - AccessorOperator (
.
)
crs
Semantic Analysis
Syntax-Directed Translation
In order to perform semantic analysis of our code as well as ultimately generate an abstract syntax tree (AST), the LL(1) context-free grammar derived during the parsing stage was transformed into an attribute grammar. This involved the addition of semantic actions to the right-hand sides of our grammar's productions, which ultimately represent the execution of a subroutine to accumulate and process semantic values or semantic attributes about the program being parsed. These values can have properties such as type, value, memory size or location, translated code, correctness, or more.
Semantic values are accumulated in semantic records until they can be processed by further semantic actions. However, attribute migration will be required given that these semantic rules and values will need to be available throughout the parsing step.
Semantic actions are associated with nodes within the parse tree:
- Semantic actions at the leaves of the tree typically gather semantic values, from either the token type of the node, or through the type recorded in a symbol table for a token already seen
- Semantic actions at intermediate nodes ultimately use and validate these semantic values and pass the information up the tree through attribute migration.
In table-driven parsers, semantic action symbols are inserted into the RHS of productions and pushed onto their own semantic stack. Executing a semantic action typically pops a semantic record from the semantic stack, does some processing, then pushes a semantic record back onto the stack.
AST Generation
The method for generating an AST was modified slightly...
- Undergo parsing until a semantic action is discovered as a PARSE symbol
- Pop the semantic action off the parsing stack
- Execute the function related to that semantic action
makeNode
will:- define the AST node type
- add the token to the node
- create that node in the graph
- store the node (or a reference to it) by pushing on the semantic stack
makeFamily
will- define the AST node type
- add the token to the node (if applicable)
- create that node in the graph
- pop X number of children nodes or their reference off the stack
- Add edges from the new parent node to the children nodes in the graph
- store the parent node (or a reference to it) by pushing on the semantic stack
Multiple semantic action types can end up using the same semantic action. This is to namespace specific types, since our way of generating the AST relies on pop the stack until a semantic action does not recognize that type anymore.
Semantic actions for leaf nodes are placed before the terminal parse symbol, in order to be able to observe the next token in the token queue before it is dequed. Semantic actions for intermediate nodes are placed after the non-terminal parse symbol, in order to ensure that all productions have already taken place, and that these types are all placed on the semantic stack before creating a family.
We're using the petgraph
library to represent the AST. As opposed to referencing to individual nodes that we can keep on the stack as the graph grows, we're keeping track of nodes through unique indices that are assigned to nodes as they are added to a directed graph. This singleton mutex-locked graph object is mutably passed to semantic actions as it is being pushed onto the stack.
crs
Semantic Analysis
Semantic Analysis
Symbol Tables
Symbol tables are a tool used during compilation for performing semantic checks on your program. These involve verifications that can only be expressed in a context-sentive grammar, such as type inteference, type checking, and much more
This compiler generates symbol tables by performing semantic actions when traversing the AST previously generated.
The implementation of a symbol table was done with a graph data structure. Some assumptions about the system are listed below:
- Every node can have either a
Table
or aRecord
type. - A
Table
node can only haveRecord
nodes as children. - A
Record
node can only haveTable
nodes as children.
Imposing these restrictions on our graph results in some nice properties.
- Visiting all records in a table is done by visiting all of a
Table
node's immediate children (with a tree depth of 1). - Viewing all smallers scopes (nested symbol tables) can be done by ignoring all nodes that are leaves at 1 tree depth and visiting all tables that are at 2 tree depths.
- Visiting a parent scope (symbol table) is done at any point by traversing 2 nodes up the tree from any
Table
node. - If a
Table
node is a leaf node, this necessarily represents an empty scope.
AST tree traversal with the Visitor pattern
The Visitor is a useful design pattern during AST tree traversal, in order to group semantic actions into one component during a specific phase of semantic analysis.
A more funcitonal approach to the Visitor pattern was used in this program, but the core premise remains the same.
A "visitor" in the form of a function callback was passed to the DFS tree-traversal algorith. This callback was executed at every node. Once the callback was executed, it executed specific visit functions based on the node type.