Overview
Development environment and the process

      In general, you work through the following steps:

Create a new grammar file
Compile it and remove all design-time errors
Open or create some test files and ensure that your grammar works
Generate the parser source code from your grammar


     After you start UltraGram and click the New Project menu item, the following dialog is presented:



Here you can enter the project settings. If the check box is set, the system generates a default grammar file and a corresponding test file. These files only illustrate the general guidelines; modify the grammar in any way you want. Here is the default grammar file:



     When you are done with the grammar, press the Compile button on the toolbar. This checks the grammar file for errors and builds all required data tables. Any errors are listed in the parser Task List window. Then open one or more test files and run your grammar against them. Use the debug toolbar to control the execution.



You can set or delete breakpoints in the test file by clicking the gray stripe on the left side of the window.



The current parsing state is indicated in the status bar at the bottom of the screen. If parsing completes successfully, you will see the status 'Accept'.

When you are done with the debugging session, press the Generate button, select the required programming language, and generate the parser source code.

Power features

Debugging

     UltraGram provides extensive functionality to simplify debugging and to eliminate existing or potential parsing problems. First of all, you can set the required resolution for the debugging process:



     The second important feature concerns the shift-reduce and reduce-reduce conflicts that sometimes occur in grammars. In a reasonably complicated grammar it is often very difficult to understand how conflicts of this type arise. UltraGram can render shift-reduce and reduce-reduce conflicts: it gives you a graphical representation of all transitions from the DFA node where the conflict originates up to the final DFA node where the conflict is actually detected. The next feature is the ability to generate the entire DFA table for the current grammar, which can be useful for understanding in detail what is going on under the hood of the LALR(1) parsing process. The last important feature is the ability to handle different kinds of errors during parsing.

Error handling and recovery

     UltraGram can handle two types of runtime errors: errors caused by missing tokens and errors caused by conflicting tokens. In the first case, the error happens when the parser fails to get the expected token(s) from the input (because of an invalid input file, for example). Two outcomes are possible: the parser stops completely, or the parser enters recovery mode. To enable recovery mode, some changes must be made to the grammar file. First, you need to predict the places where an error could occur. Then you need to extend each corresponding rule with a production that contains the keyword %error. Here is the modified production section of the above sample:


In this sample, an expression (exp) may come in two forms: the form originally defined for correct parser input, and an error form for input that contains invalid tokens or an invalid sequence of tokens followed by a semicolon. If the test file contains, for example, the invalid string Abc = 3 + * ; the parser will recognize the error form of the rule, reduce it, and continue with execution. Note that this technique can also be used when there is nothing wrong with the input file but you deliberately want to skip some portion of it.
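
The recovery behaviour described above resembles what is usually called panic-mode recovery: on an unexpected token, input is discarded up to a synchronizing token (here the semicolon) and parsing resumes after it. The Python sketch below is not UltraGram output and does not claim to reproduce its internal algorithm. It is a hand-written recognizer for statements of the assumed form Id = operand (op operand)* ; and all function names in it are hypothetical.

```python
import re

# One token per match: identifier, integer, or a single-character symbol.
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|\d+|[=+\-*/;])")

def tokenize(text):
    """Split the input into identifiers, integers and operator characters."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def peek(tokens, i):
    return tokens[i] if i < len(tokens) else None

def is_identifier(tok):
    return tok is not None and (tok[0].isalpha() or tok[0] == "_")

def is_operand(tok):
    return tok is not None and (tok[0].isalnum() or tok[0] == "_")

def parse_assignment(tokens, i):
    """Parse  Id '=' operand (op operand)* ';'  and return the index after ';'."""
    if not is_identifier(peek(tokens, i)):
        raise SyntaxError("identifier expected")
    i += 1
    if peek(tokens, i) != "=":
        raise SyntaxError("'=' expected")
    i += 1
    if not is_operand(peek(tokens, i)):
        raise SyntaxError("operand expected")
    i += 1
    while peek(tokens, i) in ("+", "-", "*", "/"):
        i += 1
        if not is_operand(peek(tokens, i)):
            raise SyntaxError("operand expected")
        i += 1
    if peek(tokens, i) != ";":
        raise SyntaxError("';' expected")
    return i + 1

def parse_statements(tokens):
    """Return one ('ok' | 'error', tokens) entry per statement."""
    results, i = [], 0
    while i < len(tokens):
        start = i
        try:
            i = parse_assignment(tokens, i)
            results.append(("ok", tokens[start:i]))
        except SyntaxError:
            # Recovery: discard tokens up to and including the next ';',
            # record the statement as an error, and carry on.
            while i < len(tokens) and tokens[i] != ";":
                i += 1
            i = min(i + 1, len(tokens))
            results.append(("error", tokens[start:i]))
    return results
```

For the input Abc = 3 + * ; x = 1 + 2 ; the first statement is reported as an error and the second one is still parsed normally, which is the practical benefit of recovery over stopping at the first error.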

Conflicting tokens handling

     The second case of error handling is the ability to deal with conflicting tokens. A token conflict happens when it is possible to build more than one token of the same size at a given position of the parser input. If the wrong token is passed on for processing in this case, the parser will fail. This is a source of serious problems: a parser that was working and tested suddenly breaks on a new file. Worst of all, this can happen after you have delivered your software to the customer. Fortunately, UltraGram can not only detect errors of this kind but also eliminate them, using two different and quite efficient mechanisms. These mechanisms are turned on by the special options dkey(on|off) and rtc(on|off,<level>) in the %pragmas section of the input file.
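
To see what a token conflict looks like in practice, the following Python sketch checks which token definitions produce the longest match at a given input position. It uses the Id, Integer and Real expressions from the token examples in the grammar file format section of this document. The function names are hypothetical, Python's re module stands in for the UltraGram scanner, and the sketch only detects a conflict; it does not model what the dkey or rtc options do.

```python
import re

# Token expressions copied from the examples in this document.
TOKENS = [
    ("Id", r"[a-z_A-Z0-9][a-z_A-Z0-9]*"),
    ("Integer", r"[0-9]+"),
    ("Real", r"([0-9]+\.[0-9]*)|(\.[0-9]+)"),
]

def longest_matches(text, pos=0):
    """Return (length, names) for the longest token(s) matching at pos."""
    best_len, names = 0, []
    for name, pattern in TOKENS:
        match = re.compile(pattern).match(text, pos)
        if match is None:
            continue
        length = match.end() - pos
        if length > best_len:
            best_len, names = length, [name]
        elif length == best_len and length > 0:
            names.append(name)
    return best_len, names

def has_conflict(text, pos=0):
    """A conflict exists when several tokens share the longest length."""
    _, names = longest_matches(text, pos)
    return len(names) > 1
```

With these definitions the input 123 can be scanned as either Id or Integer, both three characters long, so has_conflict("123") is true, while 3.14 is unambiguously a Real. An input file that happens to contain such a string is exactly the kind of case that can break a previously tested parser.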



Generating parser source code


     After the grammar has been successfully compiled and tested, you can generate source code for a selected programming language. At the moment the following languages are supported:

C++, Java, C#, VB.NET


Grammar file format at a glance

The UltraGram grammar file has four sections, one of which (precedence) is optional:

pragmas
tokens
precedence
production

        The pragmas section lists directives that affect the behavior of the parser, mostly related to error handling and conflict resolution (described in more detail below).
        The tokens section defines terminal symbols (also known as token types). Each symbol represents a class of syntactically equivalent tokens. You use the symbol in grammar rules to mean that a token in that class is allowed. Each symbol has the following format:

        'expression' [ identifier, [ 'alias' ]] [ %ignore ] ;

Here:
   expression defines the rule by which the token is recognized in the input text
   identifier is the unique name of the symbol
   alias is an alias for this symbol
   %ignore instructs the parser to skip the token

Examples: 

'[ \t\r\n\f\b]+' %ignore;  
'[a-z_A-Z0-9][a-z_A-Z0-9]*' Id, 'Identifier';
'[0-9]+' Integer, 'Int';
'([0-9]+\.[0-9]*)|(\.[0-9]+)' Real, 'Real';
'\"[^\"]*\"' String, 'Str';
'(\-\-)[^\n]*' SLComment, %ignore;


You may refer to each symbol either by its name or by its alias enclosed in quotes.
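
As a rough illustration of how such definitions drive a scanner, the Python sketch below applies some of the example expressions above to an input string, always taking the longest match and dropping tokens marked %ignore. The longest-match strategy, the scan function and the triple layout are assumptions made for this sketch, not a description of the UltraGram implementation. The Integer definition is left out so that no two definitions can tie on length.

```python
import re

# (expression, name, ignore) triples transcribed from the examples above.
TOKEN_DEFS = [
    (r"[ \t\r\n\f\b]+", None, True),
    (r"[a-z_A-Z0-9][a-z_A-Z0-9]*", "Id", False),
    (r"([0-9]+\.[0-9]*)|(\.[0-9]+)", "Real", False),
    (r'"[^"]*"', "String", False),
    (r"--[^\n]*", "SLComment", True),
]

def scan(text):
    """Return (name, lexeme) pairs, skipping tokens marked as ignored."""
    pos, out = 0, []
    while pos < len(text):
        best = None  # (length, name, ignore) of the longest match so far
        for pattern, name, ignore in TOKEN_DEFS:
            match = re.compile(pattern).match(text, pos)
            if match is None or match.end() == pos:
                continue
            length = match.end() - pos
            if best is None or length > best[0]:
                best = (length, name, ignore)
        if best is None:
            raise ValueError(f"no token matches at offset {pos}")
        length, name, ignore = best
        if not ignore:
            out.append((name, text[pos:pos + length]))
        pos += length
    return out
```

Scanning the text x "hi" 3.14 -- note yields an Id, a String and a Real; the whitespace and the trailing comment match their definitions but are skipped because they are marked as ignored.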
    In the optional precedence section, operator precedence declarations allow you to specify when to shift and when to reduce. Use a '%left', '%right' or '%nonassoc' declaration to declare a token and specify its precedence and associativity, all at once. The syntax of a precedence declaration is as follows:

              %left symbols... ;
               or
              %right symbols... ;
               or
              %nonassoc symbols... ;

   The associativity of an operator OP determines how repeated uses of the operator nest: whether 'X OP Y OP Z' is parsed by grouping X with Y first or by grouping Y with Z first. '%left' specifies left-associativity (grouping X with Y first) and '%right' specifies right-associativity (grouping Y with Z first). '%nonassoc' specifies no associativity, which means that 'X OP Y OP Z' is considered a syntax error.
   The precedence of an operator determines how it nests with other operators. All the tokens declared in a single precedence declaration have equal precedence and nest together according to their associativity. When two tokens declared in different precedence declarations associate, the one declared later has the higher precedence and is grouped first.
Example:

%left '*' '/' ;
%right '**' ;
%left UMINUS ;
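
To make the effect of these declarations concrete, here is a small Python sketch that groups binary expressions according to the first two declarations of the example ('*' and '/' left-associative, '**' right-associative and declared later, so binding tighter). It uses precedence climbing, a hand-written technique chosen here for brevity; an LALR(1) parser applies the same declarations by resolving shift-reduce conflicts in its tables. The OPERATORS table and the function names are assumptions of this sketch, and the unary UMINUS level is omitted.

```python
import re

# (precedence level, associativity); a later declaration gets a higher level.
# Mirrors:  %left '*' '/' ;   %right '**' ;
OPERATORS = {"*": (1, "left"), "/": (1, "left"), "**": (2, "right")}

def tokenize(text):
    """Split into '**', '*', '/' and alphanumeric operands."""
    return re.findall(r"\*\*|[*/]|\w+", text)

def parse(tokens, min_level=1):
    """Consume tokens and return a fully parenthesized string."""
    left = tokens.pop(0)
    while tokens and tokens[0] in OPERATORS and OPERATORS[tokens[0]][0] >= min_level:
        op = tokens.pop(0)
        level, assoc = OPERATORS[op]
        # Left-associative: the right operand may only contain tighter operators.
        # Right-associative: the right operand may contain the same operator again.
        next_min = level + 1 if assoc == "left" else level
        right = parse(tokens, next_min)
        left = f"({left} {op} {right})"
    return left

def group(text):
    return parse(tokenize(text))
```

For example, group("a * b / c") gives ((a * b) / c), group("a ** b ** c") gives (a ** (b ** c)), and group("a * b ** c") gives (a * (b ** c)) because '**' is declared after '*'.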

   In the production section grammar rules are defined. Each rule has the following format:

          name : rule_components ;

where name is the nonterminal symbol that this rule describes and rule_components is a possibly empty list of terminal and nonterminal symbols, separated by whitespace.
Example :

          exp : exp '-' exp ;

This rule defines that two groupings of type 'exp', with a '-' token in between, can be combined into a larger grouping of type 'exp'. Multiple rules for the same name can be written separately or can be joined with the vertical-bar character | as follows:

      name: rule_components1 | rule_components2... ;

Example :

      exp : exp '-' exp | exp '+' exp ;

If the list of rule components is empty, name can match the empty string. Example: here is how to define a comma-separated sequence of zero or more 'exp' groupings:

       exp_opt : // empty rule
               | expseq ;
       expseq  : exp
               | expseq ',' exp ;


      The entry point (main rule) is defined by the name that follows the production keyword. The grammar must always contain a rule with that name.