shithub: mc

Download patch

ref: d2abc97bfa136ff2dcdf970457b1434cfd7cc712
parent: b7198a182264b7ef24dd1d03baca4c275361748b
author: Ori Bernstein <[email protected]>
date: Sat Aug 18 16:32:44 EDT 2012

Add start of compiler internals documentation.

--- /dev/null
+++ b/doc/compiler.txt
@@ -1,0 +1,131 @@
+                Structure of the Myrddin Compiler
+                            Aug 2012
+                          Ori Bernstein
+
+TABLE OF CONTENTS:
+
+    1. OVERVIEW
+    2. PARSING
+        2.1. Lexing
+        2.2. Parsing
+        2.3. Type checking
+    3. FLATTENING
+    4. OPTIMIZATION
+    5. CODE GENERATION
+    6. TUTORIAL: ADDING A STATEMENT
+
+1. OVERVIEW:
+
+    The Myrddin compiler suite consists of a set of binaries, written in C,
+    which translate Myrddin source code to the assembly most appropriate for
+    the target platform, and subsequently invoke the native assembler on it.
+    The linker is not invoked by the compiler, and the final output is an
+    object file for the target platform.
+
+    The compilers are named with a single character for the target platform,
+    with a single character for the language being compiled. A table of the
+    compilers and their names is below:
+
+        Compiler        Platform
+        -------------------------
+        6m              x86-64
+
+
+    The compilation is divided into a small number of phases. The first phase
+    is parsing. The first phase is parsing, where the source code is first
+    tokenized, parsed, and semantically checked. The second phase is the
+    machine dependent tree flattening. In this phase, the tree is decomposed
+    function by function into simple operations that are relatively close to
+    the machine. Sizes are fixed, and all loops, if statements, etc are
+    replaced with gotos. The next phase is a machine independent optimizer,
+    which currenty does nothing other than simply folding trees. In the final
+    phase, the instructions are selected and the registers are allocated.
+
+    So, to recap, the phases are as follows:
+
+        parse           Tokenize, parse and analyze the source.
+        flatten         Rewrite the complex nodes into simpe ones
+        opt             Optimize the flattened source trees
+        gen             Generate the assembly code
+
+
+2. PARSING:
+
+    This phase takes in a source file, and outputs a tree that is guaranteed
+    to be valid.
+
+    2.1. Lexing:
+        
+        Lexing occurs in parse/tok.c. Because we desire to use this lexer from
+        within yacc, the entry point to this code is in 'yylex()'. As required
+        by yacc, 'yylex()' returns an integer defining the token type, and
+        sets the 'tok' member of yylval to the token that was taken from the
+        input stream. In addition, to allow for better error messages, the
+        global variable 'curtok' is set to the value of 'yylval.tok'. This
+        allows yyerror to print the last token that was seen.
+
+        The tokens that are allowable are generated by Yacc from the '%token'
+        definiitions in parse/gram.y, and are placed into the file
+        'parse/gram.h'. The lexer and parser code is the only code that
+        depends on these token constants.
+
+        The lexer is initalized through 'tokinit(char *file)'. This function
+        will open the file passed in, read all the data from it in one go
+        and set up the internal data for the tokenizer. The tokenizing is then
+        done while the whole file is in memory, which means that this code
+        will work poorly on files that are larger than the address space
+        available to the compiler. If this is a problem, you deserve all the
+        pain that is caused.
+
+        The file data is stored in the three global variables 'fidx', 'fbuf',
+        and 'fbufsz'. The actual tokenization happens through 'toknext()' and
+        its callees, which operate on these data structures character by
+        character, matching the values read, and shoving them into the 'Tok'
+        data structure.
+
+    2.2. Parsing:
+
+        The parser used is a traditional Yacc based parser. It is generated
+        from the source in parse/gram.y. The starting production is 'file',
+        which fills in a global 'file' tree node. This 'file' tree node must
+        be initialized before yyparse() is called.
+
+
+    2.3. Type Checking:
+
+        Type checking is done through unification of types. It's implemented
+        in parse/infer.c 
+
+    2.4. Speciaization:
+
+    2.4. Serialization:
+
+    2.5. Use Files:
+
+3. FLATTENING:
+
+    This phase is invoked repeatedly on each top level declaration that we
+    want to generate code for. 
+
+4. OPTIMIZATION:
+
+    4.1. Constant Folding:
+
+5. CODE GENERATION:
+
+    5.1. Instruction Selection:
+    5.2. Register Allocation:
+
+6: TUTORIAL: ADDING A STATEMENT:
+
+    6.1. Stubbing in the node types:
+
+    6.2. Parsing:
+    
+    6.3. Flattening:
+
+    6.4. Optimization:
+
+    6.5. Instruction Selection:
+
+
--- a/doc/lang.txt
+++ b/doc/lang.txt
@@ -4,7 +4,7 @@
 
 TABLE OF CONTENTS:
 
-    1. OVERVIEW
+    1. ABOUT
     2. LEXICAL CONVENTIONS
     3. SYNTAX
         3.1. Declarations
@@ -20,7 +20,7 @@
     8. GRAMMAR
     9. FUTURE DIRECTIONS
 
-1. OVERVIEW:
+1. ABOUT:
 
         Myrddin is designed to be a simple, low level programming
         language.  It is designed to provide the programmer with