shithub: mc

--- /dev/null

+++ b/doc/compiler.txt

@@ -1,0 +1,131 @@

+                Structure of the Myrddin Compiler

+                            Aug 2012

+                          Ori Bernstein

+TABLE OF CONTENTS:

+    1. OVERVIEW

+    2. PARSING

+        2.1. Lexing

+        2.2. Parsing

+        2.3. Type checking

+    3. FLATTENING

+    4. OPTIMIZATION

+    5. CODE GENERATION

+    6. TUTORIAL: ADDING A STATEMENT

+1. OVERVIEW:

+    The Myrddin compiler suite consists of a set of binaries, written in C,

+    which translate Myrddin source code to the assembly most appropriate for

+    the target platform, and subsequently invoke the native assembler on it.

+    The linker is not invoked by the compiler, and the final output is an

+    object file for the target platform.

+    The compilers are named with a single character for the target platform,

+    with a single character for the language being compiled. A table of the

+    compilers and their names is below:

+        Compiler        Platform

+        -------------------------

+        6m              x86-64

+    The compilation is divided into a small number of phases. The first phase

+    is parsing. The first phase is parsing, where the source code is first

+    tokenized, parsed, and semantically checked. The second phase is the

+    machine dependent tree flattening. In this phase, the tree is decomposed

+    function by function into simple operations that are relatively close to

+    the machine. Sizes are fixed, and all loops, if statements, etc are

+    replaced with gotos. The next phase is a machine independent optimizer,

+    which currenty does nothing other than simply folding trees. In the final

+    phase, the instructions are selected and the registers are allocated.

+    So, to recap, the phases are as follows:

+        parse           Tokenize, parse and analyze the source.

+        flatten         Rewrite the complex nodes into simpe ones

+        opt             Optimize the flattened source trees

+        gen             Generate the assembly code

+2. PARSING:

+    This phase takes in a source file, and outputs a tree that is guaranteed

+    to be valid.

+    2.1. Lexing:

+        Lexing occurs in parse/tok.c. Because we desire to use this lexer from

+        within yacc, the entry point to this code is in 'yylex()'. As required

+        by yacc, 'yylex()' returns an integer defining the token type, and

+        sets the 'tok' member of yylval to the token that was taken from the

+        input stream. In addition, to allow for better error messages, the

+        global variable 'curtok' is set to the value of 'yylval.tok'. This

+        allows yyerror to print the last token that was seen.

+        The tokens that are allowable are generated by Yacc from the '%token'

+        definiitions in parse/gram.y, and are placed into the file

+        'parse/gram.h'. The lexer and parser code is the only code that

+        depends on these token constants.

+        The lexer is initalized through 'tokinit(char *file)'. This function

+        will open the file passed in, read all the data from it in one go

+        and set up the internal data for the tokenizer. The tokenizing is then

+        done while the whole file is in memory, which means that this code

+        will work poorly on files that are larger than the address space

+        available to the compiler. If this is a problem, you deserve all the

+        pain that is caused.

+        The file data is stored in the three global variables 'fidx', 'fbuf',

+        and 'fbufsz'. The actual tokenization happens through 'toknext()' and

+        its callees, which operate on these data structures character by

+        character, matching the values read, and shoving them into the 'Tok'

+        data structure.

+    2.2. Parsing:

+        The parser used is a traditional Yacc based parser. It is generated

+        from the source in parse/gram.y. The starting production is 'file',

+        which fills in a global 'file' tree node. This 'file' tree node must

+        be initialized before yyparse() is called.

+    2.3. Type Checking:

+        Type checking is done through unification of types. It's implemented

+        in parse/infer.c

+    2.4. Speciaization:

+    2.4. Serialization:

+    2.5. Use Files:

+3. FLATTENING:

+    This phase is invoked repeatedly on each top level declaration that we

+    want to generate code for.

+4. OPTIMIZATION:

+    4.1. Constant Folding:

+5. CODE GENERATION:

+    5.1. Instruction Selection:

+    5.2. Register Allocation:

+6: TUTORIAL: ADDING A STATEMENT:

+    6.1. Stubbing in the node types:

+    6.2. Parsing:

+    6.3. Flattening:

+    6.4. Optimization:

+    6.5. Instruction Selection:

--- a/doc/lang.txt

+++ b/doc/lang.txt

@@ -4,7 +4,7 @@

 TABLE OF CONTENTS:

-    1. OVERVIEW

+    1. ABOUT

     2. LEXICAL CONVENTIONS

     3. SYNTAX

         3.1. Declarations

@@ -20,7 +20,7 @@

     8. GRAMMAR

     9. FUTURE DIRECTIONS

-1. OVERVIEW:

+1. ABOUT:

         Myrddin is designed to be a simple, low level programming

         language.  It is designed to provide the programmer with