ref: 6710eaf7cfe2e15b6c4bc352c4285678efa53212
parent: e3c1f19a29528979f3ab31a5b1cb3de3cb4483cd
parent: 1c351dde4625065e651108781c1de2683914fd0f
author: Ori Bernstein <[email protected]>
date: Sun Aug 26 08:34:47 EDT 2012
Merge git+ssh://mimir.eigenstate.org/git/qz/mc
--- a/doc/compiler.txt
+++ b/doc/compiler.txt
@@ -48,23 +48,23 @@
The compilation is divided into a small number of phases. The first phase
is parsing, where the source code is first tokenized, the abstract syntax
tree (AST) is generated, and semantically checked. The second phase is the
- machine dependent tree flattening. In this phase, the tree is decomposed
+ machine-dependent tree flattening. In this phase, the tree is decomposed
function by function into simple operations that are relatively close to
- the machine. Sizes are fixed, and all loops, if statements, etc are
- replaced with gotos. The next phase is a machine independent optimizer,
+ the machine. Sizes are fixed, and all loops, if statements, etc. are
+ replaced with gotos. The next phase is a machine-independent optimizer,
which currenty does nothing other than simply folding trees. In the final
phase, the instructions are selected and the registers are allocated.
So, to recap, the phases are as follows:
- parse Tokenize, parse and analyze the source.
+ parse Tokenize, parse, and analyze the source
flatten Rewrite the complex nodes into simpe ones
opt Optimize the flattened source trees
gen Generate the assembly code
- 1.1. Tree Structure.
+ 1.1. Tree Structure:
- File nodes (n->type == Nfile) represents the being compiled. The current
+ File nodes (n->type == Nfile) represent the files being compiled. The current
node is held in a global variable called, unsurprisingly, 'file'. The
global symbol table, export table, uses, and other compilation-specific
information is stored in this node. This implies that the compiler can
@@ -113,7 +113,7 @@
2.1. Lexing:
- Lexing occurs in parse/tok.c. Because we desire to use this lexer from
+ Lexing occurs in parse/tok.c. Because we want to use this lexer from
within yacc, the entry point to this code is in 'yylex()'. As required
by yacc, 'yylex()' returns an integer defining the token type, and
sets the 'tok' member of yylval to the token that was taken from the
@@ -122,7 +122,7 @@
allows yyerror to print the last token that was seen.
The tokens that are allowable are generated by Yacc from the '%token'
- definiitions in parse/gram.y, and are placed into the file
+ definitions in parse/gram.y, and are placed into the file
'parse/gram.h'. The lexer and parser code is the only code that
depends on these token constants.
@@ -142,7 +142,7 @@
2.2. AST Creation:
- The parser used is a traditional Yacc based parser. It is generated
+ The parser used is a traditional Yacc-based parser. It is generated
from the source in parse/gram.y. The starting production is 'file',
which fills in a global 'file' tree node. This 'file' tree node must
be initialized before yyparse() is called.
@@ -167,7 +167,7 @@
complete as possible, and making sure that the types of globals
actually match up with the exported types.
- The next step is the actual type inference. We do a bottom up walk of
+ The next step is the actual type inference. We do a bottom-up walk of
the tree, unifying types as we go. There are subtleties with the
member operator, however. Because the '.' operator is used for both
member lookups and namespace lookups, before we descend into a node
@@ -203,7 +203,7 @@
So, in the 'typesub()' function, we iterate over the entire tree,
replacing every instance of a non-concrete type with the final
mapped type. If a type does not map to a fully concrete type,
- this is where we error.
+ this is where we flag an error.
FIXME: DESCRIBE HOW YOU FIXED GENERICS ONCE YOU FIX GENERICS.
@@ -232,15 +232,15 @@
Usefiles are more or less files that consist of a single character tag
that tells us what type of tree to deserialize. Because serialized
- trees are compiler version dependant, so are usefiles.
+ trees are compiler-version-dependent, so are usefiles.
3. FLATTENING:
- This phase is invoked repeatedly on each top level declaration that we
+ This phase is invoked repeatedly on each top-level declaration that we
want to generate code for. There is a good chance that this flattening
- phase should be made machine independent, and passed as a parameter
+ phase should be made machine-independent, and passed as a parameter
a machine description describing known integer and pointer sizes, among
- other machine attributes. However, for now, it is machine dependent,
+ other machine attributes. However, for now, it is machine-dependent,
and lives in 6/simp.c.
The goal of flattening a tree is to take semantically involved constructs
@@ -277,7 +277,7 @@
3.2. Complex Expressions:
Complex expressions such as copying types larger than a single machine
- word, pulling members out of structures, emulated multiplication and
+ word, pulling members out of structures, emulating multiplication and
division for larger integers sizes, and similar operations are reduced
to trees that are expressible in terms of simple machine operations.
@@ -298,7 +298,7 @@
4.1. Constant Folding:
Expressions with constant values are simplified algebraically. For
- example, the expression 'x*1' is simplified to simply 'x', '0/n' is
+ example, the expression 'x*1' is simplified to 'x', '0/n' is
simplified to '0', and so on.
@@ -306,18 +306,18 @@
5.1. Instruction Selection:
- Instruction selection is done via a simple hand written bottom up pass
+ Instruction selection is done via a simple handwritten bottom-up pass
over the tree. Common patterns such as scaled or offset indexing are
- recognized by the patterns, but no attempts at finding an optimal
- tiling are made.
+ recognized by the patterns, but no attempts are made at finding an
+ optimal tiling.
5.2. Register Allocation:
Register allocation is done via the algorithm described in "Iterated
- Regster Coalescing", by Appel and George. As of the time of this
+ Register Coalescing" by Appel and George. As of the time of this
writing, the register allocator does not yet implement overlapping
register classes. This will be done as described in "A generalized
- algorithm for graph-coloring register allocation", by Smith, Ramsey,
+ algorithm for graph-coloring register allocation" by Smith, Ramsey,
and Holloway.
6: TUTORIAL: ADDING A STATEMENT:
--- a/doc/lang.txt
+++ b/doc/lang.txt
@@ -22,14 +22,14 @@
1. ABOUT:
- Myrddin is designed to be a simple, low level programming
+ Myrddin is designed to be a simple, low-level programming
language. It is designed to provide the programmer with
predictable behavior and a transparent compilation model,
while at the same time providing the benefits of strong
type checking, generics, type inference, and similar.
Myrddin is not a language designed to explore the forefront
- of type theory, or compiler technology. It is not a language
- that is focused on guaranteeing perfect safety. It's focus
+ of type theory or compiler technology. It is not a language
+ that is focused on guaranteeing perfect safety. Its focus
is on being a practical, small, fairly well defined, and
easy to understand language for work that needs to be close
to the hardware.
@@ -41,10 +41,10 @@
2. LEXICAL CONVENTIONS:
- The language is composed of several classes of token. There
+ The language is composed of several classes of tokens. There
are comments, identifiers, keywords, punctuation, and whitespace.
- Comments, begin with "/*" and end with "*/". They may nest.
+ Comments begin with "/*" and end with "*/". They may nest.
/* this is a comment /* with another inside */ */
@@ -80,7 +80,7 @@
the program. There are several literals implemented within the language.
These are fully described in section 3.2 of this manual.
- In the compiler, single semicolons (';') , semicolons and newlines (\x10)
+ In the compiler, single semicolons (';') and newline (\x0a)
characters are treated identically, and are therefore interchangable.
They will both be referred to "endline"s thoughout this manual.
@@ -87,18 +87,14 @@
3. SYNTAX OVERVIEW:
- Myrddin syntax will likely have a familiar-but-strange taste
- to many people. Many of the concepts and constructions will be
- similar to those present in C, but different.
-
3.1. Declarations:
- A declaration consists of a declaration class (ie, one
+ A declaration consists of a declaration class (i.e., one
of 'const', 'var', or 'generic'), followed by a declaration
name, optionally followed by a type and assignment. One thing
you may note is that unlike most other languages, there is no
special function declaration syntax. Instead, a function is
- declared like any other value: By assigning its name to a
+ declared like any other value: by assigning its name to a
constant or variable.
const: Declares a constant value, which may not be
@@ -105,7 +101,7 @@
modified at run time. Constants must have
initializers defined.
var: Declares a variable value. This value may be
- assigned to, copied from, and
+ assigned to, copied from, and modified.
generic: Declares a specializable value. This value
has the same restricitions as a const, but
taking its address is not defined. The type
@@ -132,13 +128,13 @@
var y
- Declares a generic with type '@a', and assigns it the value
+ Declare a generic with type '@a', and assign it the value
'blah'. Every place that 'z' is used, it will be specialized,
and the type parameter '@a' will be substituted.
generic z : @a = blah
- Declares a function f with and without type inference. Both
+ Declare a function f with and without type inference. Both
forms are equivalent. 'f' takes two parameters, both of type
int, and returns their sum as an int
@@ -164,9 +160,9 @@
eg: 0x123_fff, 0b1111, 1234
- Float literals are also a sequence of digits beginning with a
- digit and possibly separated by underscores. They are also of a
- generic type, and may be used whenever a floating point type is
+ Floating-point literals are also a sequence of digits beginning with
+ a digit and possibly separated by underscores. They are also of a
+ generic type, and may be used whenever a floating-point type is
expected. Floating point literals are always in decimal, and
as of this writing, exponential notation is not supported[2]
@@ -396,7 +392,7 @@
`Name x Union construction
Precedence 5:
- x casttto(type) Cast expression
+ x castto(type) Cast expression
Precedence 4:
x == x Equality
@@ -425,10 +421,10 @@
x <<= x Fused shl/assign Right assoc
x >>= x Fused shr/assign Right assoc
- Precedence 14:
+ Precedence 0:
-> x Return expression
- All expressions on integers act on complement-two values which wrap
+ All expressions on integers act on two's complement values which wrap
on overflow. Right shift expressions fill with the sign bit on
signed types, and fill with zeros on unsigned types.
--- a/doc/mc.1
+++ b/doc/mc.1
@@ -32,12 +32,12 @@
.TP
.B -d [flTri]
-Prints debugging dumps. Additional options may be given to give more
+Print debugging dumps. Additional options may be given to give more
debugging information for specific intermediate states of the compilation.
.TP
.B -h
-Prints a summary of the available options.
+Print a summary of the available options.
.TP
.B -I path
@@ -48,11 +48,11 @@
.TP
.B -o output-file
-Specifies that the generated code should be placed in
+Specify that the generated code should be placed in
.TP
.B -S
-Generate assembly instead of an object file.
+Generate assembly code instead of an object file.
.SH EXAMPLE
.EX
@@ -62,7 +62,7 @@
.EE
.SH FILES
-The source for this compiler is available from
+The source code for this compiler is available from
.B git://git.eigenstate.org/git/ori/mc2.git
.SH SEE ALSO
@@ -78,7 +78,7 @@
There are virtually no optimizations done, and the generated source is
often very poorly performing.
.PP
-The current calling convention is stack, and not register, based, even
-on architectures where it should be register based.
+The current calling convention is stack-based and not register-based, even
+on architectures where it should be register-based.
.PP
-The calling convention is not C compatible.
+The calling convention is not compatible with C.
--- a/doc/muse.1
+++ b/doc/muse.1
@@ -17,9 +17,9 @@
including all of the exported symbols. If an output file name is not given,
and we are not merging usefiles, then an input file named
.I filename.myr
-will have a usefile named
+will generate a usefile named
.I filename.use
-generated.
+\&.
If the filename does not end with the suffix
.I .myr
@@ -28,10 +28,9 @@
will simply be appended to it.
.PP
-The muse program is architecture independent, and a usefile generated
-on one architecture will work with another. However, the format of the
+The output of muse is architecture-independent. However, the format of the
generated file is not stable, and is not guaranteed to work across
-compiler versions.
+different compiler versions.
.PP
The muse options are:
@@ -38,12 +37,12 @@
.TP
.B -d [flTri]
-Prints debugging dumps. Additional options may be given to give more
+Print debugging dumps. Additional options may be given to give more
debugging information for specific intermediate states of the compilation.
.TP
.B -h
-Prints a summary of the available options.
+Print a summary of the available options.
.TP
.B -I path
@@ -54,7 +53,7 @@
.TP
.B -o output-file
-Specifies that the generated usefile should be named
+Specify that the generated usefile should be named
.I output-file
.TP