
Parser Theory Revision

Tokenization: The process of partitioning the input text into tokens (the units of work for a grammar) and discarding white space and comments.
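As a rough illustration of tokenization, the following C sketch scans a string and returns one token per call. Several details are assumptions made for the example rather than anything stated in these notes: the names `Token`, `TokenKind` and `next_token` are hypothetical, `+` and `=` are assumed to be the lexemes for PLUS and ASSIGN, identifiers are assumed to be letters, digits and underscores, and comments are assumed to run from `#` to the end of the line.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token kinds named after the grammar's terminals. TOK_END and TOK_ERROR
   are additions for this sketch. */
typedef enum { TOK_IDENTIFIER, TOK_PLUS, TOK_ASSIGN, TOK_END, TOK_ERROR } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;  /* first character of the lexeme within the input */
    size_t length;      /* number of characters in the lexeme */
} Token;

/* Return the next token at *cursor and advance *cursor past it. */
Token next_token(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;

    /* Discard white space and comments (assumed: '#' to end of line). */
    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p != '#')
            break;
        while (*p != '\0' && *p != '\n')
            p++;
    }

    Token tok = { TOK_ERROR, p, 1 };
    if (*p == '\0') {
        tok.kind = TOK_END;
        tok.length = 0;
    } else if (*p == '+') {          /* assumed lexeme for PLUS */
        tok.kind = TOK_PLUS;
        p++;
    } else if (*p == '=') {          /* assumed lexeme for ASSIGN */
        tok.kind = TOK_ASSIGN;
        p++;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        tok.kind = TOK_IDENTIFIER;
        tok.length = (size_t)(p - tok.start);
    } else {
        p++;  /* step over the unrecognized character; kind stays TOK_ERROR */
    }

    *cursor = p;
    return tok;
}
```

Calling `next_token` repeatedly on `"total = a + b # sum"` would yield IDENTIFIER, ASSIGN, IDENTIFIER, PLUS, IDENTIFIER and then the end marker; the spaces and the comment never reach the parser.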
Grammar Productions:
expression
    : IDENTIFIER
    | expression PLUS expression
    | expression ASSIGN expression
    ;
A grammar is built up from many productions, which describe the structure of the language. Parsing is the process of grouping tokens to match the productions of the grammar. Productions are usually written in a notation similar to BNF. In yacc you can attach { C code in braces } to a production; that code is executed when the production matches.
Abstract Syntax Tree: A parser can build dynamically allocated, linked data structures that represent the input it has parsed. When these data structures correspond closely to the grammar, they are called abstract syntax trees.
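One possible shape for such a tree, sketched in C for the expression grammar above. The type and function names (`Node`, `NodeKind`, `make_identifier`, `make_binary`, `free_tree`) are hypothetical and chosen for this example only; each node kind mirrors one alternative of the `expression` production.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One node kind per alternative of the `expression` production. */
typedef enum { NODE_IDENTIFIER, NODE_PLUS, NODE_ASSIGN } NodeKind;

typedef struct Node {
    NodeKind kind;
    char *name;          /* NODE_IDENTIFIER only; NULL otherwise */
    struct Node *left;   /* operands of NODE_PLUS / NODE_ASSIGN; NULL otherwise */
    struct Node *right;
} Node;

/* Allocate a zeroed node of the given kind; returns NULL if out of memory. */
static Node *new_node(NodeKind kind)
{
    Node *n = calloc(1, sizeof *n);
    if (n != NULL)
        n->kind = kind;
    return n;
}

/* Leaf node holding a private copy of the identifier's spelling. */
Node *make_identifier(const char *name)
{
    assert(name != NULL);
    Node *n = new_node(NODE_IDENTIFIER);
    if (n == NULL)
        return NULL;
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL) {
        free(n);
        return NULL;
    }
    strcpy(n->name, name);
    return n;
}

/* Interior node linking two already-built subtrees. */
Node *make_binary(NodeKind kind, Node *left, Node *right)
{
    assert(kind == NODE_PLUS || kind == NODE_ASSIGN);
    Node *n = new_node(kind);
    if (n == NULL)
        return NULL;
    n->left = left;
    n->right = right;
    return n;
}

/* Release a whole tree, children first. */
void free_tree(Node *n)
{
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n->name);
    free(n);
}
```

In a yacc grammar, constructors like these would typically be called from the actions, for example `{ $$ = make_binary(NODE_PLUS, $1, $3); }` on the PLUS alternative, so that the tree is assembled bottom-up as productions match.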