ref: d2abc97bfa136ff2dcdf970457b1434cfd7cc712
parent: b7198a182264b7ef24dd1d03baca4c275361748b
author: Ori Bernstein <[email protected]>
date: Sat Aug 18 16:32:44 EDT 2012
Add start of compiler internals documentation.
--- /dev/null
+++ b/doc/compiler.txt
@@ -1,0 +1,131 @@
+ Structure of the Myrddin Compiler
+ Aug 2012
+ Ori Bernstein
+
+TABLE OF CONTENTS:
+
+ 1. OVERVIEW
+ 2. PARSING
+ 2.1. Lexing
+ 2.2. Parsing
+ 2.3. Type checking
+ 3. FLATTENING
+ 4. OPTIMIZATION
+ 5. CODE GENERATION
+ 6. TUTORIAL: ADDING A STATEMENT
+
+1. OVERVIEW:
+
+ The Myrddin compiler suite consists of a set of binaries, written in C,
+ which translate Myrddin source code to the assembly most appropriate for
+ the target platform, and subsequently invoke the native assembler on it.
+ The linker is not invoked by the compiler, and the final output is an
+ object file for the target platform.
+
+ The compilers are named with a single character for the target platform,
+ with a single character for the language being compiled. A table of the
+ compilers and their names is below:
+
+ Compiler Platform
+ -------------------------
+ 6m x86-64
+
+
+ The compilation is divided into a small number of phases. The first phase
+ is parsing. The first phase is parsing, where the source code is first
+ tokenized, parsed, and semantically checked. The second phase is the
+ machine dependent tree flattening. In this phase, the tree is decomposed
+ function by function into simple operations that are relatively close to
+ the machine. Sizes are fixed, and all loops, if statements, etc are
+ replaced with gotos. The next phase is a machine independent optimizer,
+ which currenty does nothing other than simply folding trees. In the final
+ phase, the instructions are selected and the registers are allocated.
+
+ So, to recap, the phases are as follows:
+
+ parse Tokenize, parse and analyze the source.
+ flatten Rewrite the complex nodes into simpe ones
+ opt Optimize the flattened source trees
+ gen Generate the assembly code
+
+
+2. PARSING:
+
+ This phase takes in a source file, and outputs a tree that is guaranteed
+ to be valid.
+
+ 2.1. Lexing:
+
+ Lexing occurs in parse/tok.c. Because we desire to use this lexer from
+ within yacc, the entry point to this code is in 'yylex()'. As required
+ by yacc, 'yylex()' returns an integer defining the token type, and
+ sets the 'tok' member of yylval to the token that was taken from the
+ input stream. In addition, to allow for better error messages, the
+ global variable 'curtok' is set to the value of 'yylval.tok'. This
+ allows yyerror to print the last token that was seen.
+
+ The tokens that are allowable are generated by Yacc from the '%token'
+ definiitions in parse/gram.y, and are placed into the file
+ 'parse/gram.h'. The lexer and parser code is the only code that
+ depends on these token constants.
+
+ The lexer is initalized through 'tokinit(char *file)'. This function
+ will open the file passed in, read all the data from it in one go
+ and set up the internal data for the tokenizer. The tokenizing is then
+ done while the whole file is in memory, which means that this code
+ will work poorly on files that are larger than the address space
+ available to the compiler. If this is a problem, you deserve all the
+ pain that is caused.
+
+ The file data is stored in the three global variables 'fidx', 'fbuf',
+ and 'fbufsz'. The actual tokenization happens through 'toknext()' and
+ its callees, which operate on these data structures character by
+ character, matching the values read, and shoving them into the 'Tok'
+ data structure.
+
+ 2.2. Parsing:
+
+ The parser used is a traditional Yacc based parser. It is generated
+ from the source in parse/gram.y. The starting production is 'file',
+ which fills in a global 'file' tree node. This 'file' tree node must
+ be initialized before yyparse() is called.
+
+
+ 2.3. Type Checking:
+
+ Type checking is done through unification of types. It's implemented
+ in parse/infer.c
+
+ 2.4. Speciaization:
+
+ 2.4. Serialization:
+
+ 2.5. Use Files:
+
+3. FLATTENING:
+
+ This phase is invoked repeatedly on each top level declaration that we
+ want to generate code for.
+
+4. OPTIMIZATION:
+
+ 4.1. Constant Folding:
+
+5. CODE GENERATION:
+
+ 5.1. Instruction Selection:
+ 5.2. Register Allocation:
+
+6: TUTORIAL: ADDING A STATEMENT:
+
+ 6.1. Stubbing in the node types:
+
+ 6.2. Parsing:
+
+ 6.3. Flattening:
+
+ 6.4. Optimization:
+
+ 6.5. Instruction Selection:
+
+
--- a/doc/lang.txt
+++ b/doc/lang.txt
@@ -4,7 +4,7 @@
TABLE OF CONTENTS:
- 1. OVERVIEW
+ 1. ABOUT
2. LEXICAL CONVENTIONS
3. SYNTAX
3.1. Declarations
@@ -20,7 +20,7 @@
8. GRAMMAR
9. FUTURE DIRECTIONS
-1. OVERVIEW:
+1. ABOUT:
Myrddin is designed to be a simple, low level programming
language. It is designed to provide the programmer with