Development environment and the process
In general, you go through the following steps:
Create a new grammar file
Compile it and remove all design-time errors
Open or create a number of test files and ensure that your grammar works
Generate the parser source code from your grammar
After you start UltraGram and click the New Project menu item, the following dialog is presented:
Here you can enter the project settings. If the check box is set, the system generates a default
grammar file and a corresponding test file. This is done only to illustrate the general
guidelines; modify this grammar in any way you want. Here is the default grammar file:
After you are done with the grammar, press the Compile button
on the toolbar. This checks the grammar file for errors and builds all required data tables. All errors,
if any, are listed in the parser Task List window.
Open one or more test files and run your grammar against them. Use the debug toolbar to control the execution.
You can set or delete breakpoints in the test file simply by clicking on the gray stripe on
the left side of the window.
The current parsing state is indicated in the status bar at the bottom of the screen.
If parsing completes successfully, you will see the status 'Accept'.
After you are done with the debugging session, press the Generate button,
select the required programming language and generate the parser source code.
Power features
Debugging
UltraGram provides extensive functionality to simplify the process of debugging and to eliminate existing or possible parsing problems.
First of all, you can set the necessary resolution for the debugging process.
The second important feature relates to the shift-reduce and
reduce-reduce conflicts that sometimes reside in grammars. If the grammar is even moderately complicated, it is often very difficult to understand how conflicts of this type arise. UltraGram can render
shift-reduce and reduce-reduce conflicts graphically.
This unique feature gives you a graphical representation of all transitions that occur from the starting point of the conflict in some DFA node up to the final DFA node where the conflict is actually detected.
The next feature is the ability to generate the entire DFA table for the current grammar. This can be useful for a detailed understanding of what is going on under the hood of the LALR(1) parsing process.
The last important feature is the ability to handle different kinds of errors during the parsing process.
Error handling and recovery
UltraGram can handle two types of runtime errors: errors caused by missing tokens and errors caused by conflicting tokens.
In the first case, an error occurs when the parser fails to get the expected token(s) from the
input (due to an invalid input file, for example). Two outcomes are possible: the parser
stops completely, or the parser enters recovery mode. To enable recovery
mode, some changes must be made to the grammar file. First, you need to
predict the places where an error could occur. Then you need to extend
each corresponding rule with a production that contains the keyword %error.
Here is the modified production section of the above sample:
In this sample an expression (exp) may come in two forms: the
form initially defined for correct parser input, and an error form
in which the input contains invalid tokens or an invalid sequence of tokens followed by
a semicolon. If the test file contains, for example, the
invalid string Abc = 3 + * ; the parser will recognize the
error form of the rule, reduce it and continue with execution. Note that
this technique can also be used when there is nothing wrong with the
input file but you deliberately want to skip some portion of it.
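The effect of an error form that ends in a semicolon is roughly comparable to classic "skip to the next synchronizing token" recovery. The Python sketch below illustrates that idea only; it is not how UltraGram implements %error. The statement shape (name = operand op operand ... ;), the whitespace-separated tokens and all function names are assumptions made for the illustration.

```python
OPERATORS = {'+', '-', '*', '/'}

def expect(tokens, i, predicate):
    """Consume one token that satisfies predicate, or raise SyntaxError."""
    if i >= len(tokens) or not predicate(tokens[i]):
        raise SyntaxError(f"unexpected input at token {i}")
    return i + 1

def parse_one(tokens, i):
    """Parse one statement: name '=' operand (op operand)* ';'."""
    i = expect(tokens, i, str.isidentifier)        # target name
    i = expect(tokens, i, lambda t: t == '=')
    i = expect(tokens, i, str.isalnum)             # first operand
    while i < len(tokens) and tokens[i] in OPERATORS:
        i = expect(tokens, i + 1, str.isalnum)     # operand after the operator
    return expect(tokens, i, lambda t: t == ';')

def parse_statements(tokens):
    """Return (accepted, skipped) statement token lists."""
    accepted, skipped = [], []
    i = 0
    while i < len(tokens):
        start = i
        try:
            i = parse_one(tokens, i)
            accepted.append(tokens[start:i])
        except SyntaxError:
            # Recovery: discard input up to and including the next ';'.
            i = start
            while i < len(tokens) and tokens[i] != ';':
                i += 1
            i = min(i + 1, len(tokens))
            skipped.append(tokens[start:i])
    return accepted, skipped

accepted, skipped = parse_statements("Abc = 3 + * ; x = 1 + 2 ;".split())
print("accepted:", accepted)
print("skipped: ", skipped)
```

Here the invalid statement Abc = 3 + * ; ends up in the skipped list, and parsing resumes with the statement that follows it.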
Conflicting tokens handling
The second case of error handling is the ability to deal with conflicting tokens.
A token conflict happens when more than one token of the same size can be built at a given
position of the parser input. If the wrong token is then passed on for
processing, the parser will fail. This is a source of serious problems: a parser that was
working and tested suddenly breaks on a new file. Worst of all, this can happen after you
have delivered your software to the customer. Fortunately, UltraGram has a unique ability not only
to detect errors of this kind but also to eliminate them, using two different and quite efficient
mechanisms. These mechanisms are turned on by the special options
dkey(on|off) and rtc(on|off,<level>)
in the %pragmas section of the input file.
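To show what a token conflict looks like in practice, the following Python sketch reports every token whose pattern produces the longest match at a given input position. The two patterns are the Id and Integer expressions from the token examples in this document; the function name conflicting_tokens is hypothetical. The sketch only detects a conflict and does not model what the dkey or rtc options do.

```python
import re

# Two token definitions whose languages overlap: a run of digits is
# both a valid Id and a valid Integer under these patterns.
PATTERNS = {
    'Id': r'[a-z_A-Z0-9][a-z_A-Z0-9]*',
    'Integer': r'[0-9]+',
}

def conflicting_tokens(text, pos=0):
    """Return the names of all tokens that share the longest match at pos."""
    lengths = {}
    for name, pattern in PATTERNS.items():
        m = re.compile(pattern).match(text, pos)
        if m and m.end() > pos:
            lengths[name] = m.end() - pos
    if not lengths:
        return []
    longest = max(lengths.values())
    return sorted(n for n, length in lengths.items() if length == longest)

print(conflicting_tokens('123 + x'))  # both Id and Integer match "123"
print(conflicting_tokens('x1 + 2'))   # only Id matches "x1"
```

A result with more than one name means the scanner alone cannot decide which token to pass to the parser at that position.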
Generating parser source code
After the grammar has been successfully compiled and tested, you can
generate source code for the selected programming language. At the moment the following
languages are supported:
Grammar file format at a glance
The UltraGram grammar file consists of the following sections, of which precedence is optional:
pragmas
tokens
precedence
production
       
In the pragmas section, directives are
listed that affect the behavior of the parser, mainly with regard to error handling and
conflict resolution (described in more detail below).
       
In the tokens section, terminal symbols are defined (also known as
token types). Each symbol represents a class of syntactically equivalent tokens. You use
the symbol in grammar rules to mean that a token in that class is allowed. Each symbol has
the following format:
       
'expression' [ identifier, [ 'alias' ]] [ %ignore ] ;
Here:
expression defines the rule by which the token in the input text will
be recognized
identifier is a unique name of the symbol
alias is an alias of this symbol
%ignore directive instructs the parser to skip the token
Examples:
'[ \t\r\n\f\b]+'                %ignore;
'[a-z_A-Z0-9][a-z_A-Z0-9]*'     Id,         'Identifier';
'[0-9]+'                        Integer,    'Int';
'([0-9]+\.[0-9]*)|(\.[0-9]+)'   Real,       'Real';
'\"[^\"]*\"'                    String,     'Str';
'(\-\-)[^\n]*'                  SLComment,  %ignore;
You may refer to each symbol either by its name or by its alias enclosed in quotes.
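To make the token definitions more concrete, here is a minimal Python sketch of a scanner driven by a table of regular expressions with an ignore flag. It is not UltraGram's implementation: the TOKEN_DEFS table, the tokenize function, the made-up name WS for the unnamed whitespace token, and the rule "longest match wins, earlier definition wins a tie" are all assumptions made for illustration.

```python
import re

# (pattern, name, ignore) triples mirroring the example token definitions.
# The whitespace token has no identifier in the grammar; "WS" is made up.
TOKEN_DEFS = [
    (r'[ \t\r\n\f\b]+', 'WS', True),
    (r'[a-z_A-Z0-9][a-z_A-Z0-9]*', 'Id', False),
    (r'[0-9]+', 'Integer', False),
    (r'([0-9]+\.[0-9]*)|(\.[0-9]+)', 'Real', False),
    (r'\"[^\"]*\"', 'String', False),
    (r'(\-\-)[^\n]*', 'SLComment', True),
]

def tokenize(text):
    """Return (name, lexeme) pairs, skipping tokens marked as ignored."""
    compiled = [(re.compile(p), name, ignore) for p, name, ignore in TOKEN_DEFS]
    pos, out = 0, []
    while pos < len(text):
        best = None  # (length, name, ignore)
        for regex, name, ignore in compiled:
            m = regex.match(text, pos)
            # Keep the longest match; on a tie the earlier definition wins.
            if m and m.end() > pos and (best is None or m.end() - pos > best[0]):
                best = (m.end() - pos, name, ignore)
        if best is None:
            raise SyntaxError(f"no token matches at offset {pos}")
        length, name, ignore = best
        if not ignore:
            out.append((name, text[pos:pos + length]))
        pos += length
    return out

print(tokenize('count 3.14 "hi" -- trailing comment'))
```

With these example patterns an input such as 42 matches both Id and Integer with the same length, so the tie-breaking rule assumed here silently picks Id. This is the kind of situation covered under Conflicting tokens handling.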
In the optional precedence section, operator precedence declarations
allow you to specify when to shift and when to reduce. Use the '%left',
'%right' or '%nonassoc' declaration to declare a token and
specify its precedence and associativity, all at once. The syntax of a precedence declaration
is as follows:
%left symbols... ;
or
%right symbols... ;
The associativity of an operator OP determines how repeated uses
of the operator nest: whether 'X OP Y OP Z' is parsed by grouping X
with Y first or by grouping Y with Z first.
'%left' specifies left-associativity (grouping X with
Y first) and '%right' specifies right-associativity
(grouping Y with Z first). '%nonassoc' specifies
no associativity, which means that 'X OP Y OP Z' is considered a syntax error.
The precedence of an operator determines how it nests with other operators. All
the tokens declared in a single precedence declaration have equal precedence and nest together
according to their associativity. When two tokens declared in different precedence declarations
associate, the one declared later has the higher precedence and is grouped first.
Example:
%left   '*' '/' ;
%right  '**' ;
%left   UMINUS ;
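The grouping rules can be illustrated with a short Python sketch that applies precedence levels and associativity to a flat token list. It uses precedence climbing, a different technique from the LALR(1) tables UltraGram builds, and is meant only to show which grouping each declaration implies. The OPS table and the group function are hypothetical; the unary UMINUS is left out to keep the sketch small.

```python
# (precedence level, associativity); operators declared later get a higher level.
OPS = {
    '*': (1, 'left'),
    '/': (1, 'left'),
    '**': (2, 'right'),
}

def group(tokens, min_level=1):
    """Return a fully parenthesized string; consumes tokens from the front."""
    left = tokens.pop(0)  # an operand
    while tokens and tokens[0] in OPS and OPS[tokens[0]][0] >= min_level:
        op = tokens.pop(0)
        level, assoc = OPS[op]
        # Left-associative: the right side may only hold tighter operators.
        # Right-associative: the right side may hold the same level again.
        next_min = level + 1 if assoc == 'left' else level
        right = group(tokens, next_min)
        left = f"({left} {op} {right})"
    return left

print(group("8 / 2 / 2".split()))    # ((8 / 2) / 2)
print(group("2 ** 3 ** 2".split()))  # (2 ** (3 ** 2))
print(group("2 * 3 ** 2".split()))   # (2 * (3 ** 2))
```

The first line shows left-associative grouping, the second right-associative grouping, and the third shows the later-declared '**' binding tighter than '*'.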
In the production section, grammar rules are defined.
Each rule has the following format:
name : rule_components ;
where name is the nonterminal symbol that this rule describes and
rule_components is an optional list of terminal or nonterminal symbols that make up
this rule, separated by whitespace.
Example :
exp : exp '-' exp ;
This rule defines that two groupings of type 'exp', with a '-'
token in between, can be combined into a larger grouping of type 'exp'.
Multiple rules for the same name can be written separately or can be joined with the
vertical-bar character | as follows:
name: rule_components1 | rule_components2...
;
Example :
exp : exp '-' exp | exp '+' exp ;
If the rule components of a rule are empty, name can match the empty string. Example:
here is how to define a comma-separated sequence of zero or more 'exp' groupings:
exp_opt : // empty rule
| expseq ;
expseq : exp
| expseq ',' exp
;
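To see what these two rules accept, here is a small hand-written recognizer in Python. It is only a sketch: UltraGram generates table-driven LALR(1) parsers rather than code of this shape, exp is simplified to a single alphanumeric token, and the function names are hypothetical.

```python
def parse_expseq(tokens, i):
    """expseq : exp | expseq ',' exp ;  (left recursion unrolled into a loop)"""
    if i >= len(tokens) or not tokens[i].isalnum():
        raise SyntaxError(f"exp expected at token {i}")
    i += 1
    while i < len(tokens) and tokens[i] == ',':
        if i + 1 >= len(tokens) or not tokens[i + 1].isalnum():
            raise SyntaxError(f"exp expected at token {i + 1}")
        i += 2  # consume ',' and the exp that follows it
    return i

def parse_exp_opt(tokens, i=0):
    """exp_opt : (empty) | expseq ;  returns the index after the match."""
    if i < len(tokens) and tokens[i].isalnum():
        return parse_expseq(tokens, i)
    return i  # the empty alternative consumes nothing

print(parse_exp_opt([]))                         # empty input is accepted
print(parse_exp_opt(['a', ',', 'b', ',', 'c']))  # three groupings are consumed
```

The empty alternative is what makes "zero" groupings legal; without it, exp_opt would require at least one exp.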
The entry point (main rule) is defined by the name that follows the production keyword.
There must always be a rule with that name.