Difference between pages "Code Generation Tutorial" and "Constrained Dynamic Parser"

From Event-B
(Difference between pages)
Jump to navigationJump to search
imported>Andy
 
imported>Mathieu
 
Line 1: Line 1:
'''This Page is Under Construction!!!!'''
+
{{TOCright}}
 +
This page describes the requirements for a '''generic parser''' for the [[Event-B Mathematical Language]].
  
=== Tutorial Overview ===
+
A first design proposal is also drafted.
  
The aim of the tutorial is to allow users to explore the approach with a relatively simple example. The example uses a shared buffer with reader and writer processes. The tutorial is presented in three stages, making use of the example projects from the download site. There are two translations performed, one is to a common language model (IL1). The second is to an Event-B project which includes a model of the implementation. There is a PrettyPrinter for Ada source code, which uses the common language model. An overview of Tasking Event-B can be found at http://wiki.event-b.org/index.php/Tasking_Event-B_Overview.
+
== Requirements ==
 +
In order to be usable [[mathematical extensions]] require that the event-b mathematical language syntax can be extended by the final user.  
  
A typical Event-B development may be refined to the point where it is ready for implementation, but the Event-B language is not expressive enough to fully describe the implementation. Tasking Event-B facilitates this final step to implementation, by extending Event-B with the necessary constructs. Event-B machines that are to be implemented (and their seen Contexts) are selected and added to a ''Tasking Development''; the Tasking Development files have the file extension ''.tasking''. The machines in the Tasking Development are then extended with implementation details.
+
The lexical analyser and the syntaxic parser thus have to be extensible in a simple enough way (from a user point of vue).
  
The example/tutorial projects are,
+
=== Requirements Exported by the Current Language Design ===
 +
==== Operator Priority ====
 +
* operator are defined by group,
 +
* group of operator have a defined precedences,
 +
* there may be precedences defined inside groups.
  
{| border="1"
+
==== Operator Compatibility ====
|SharedBuffer20100819Demo
+
* a compatibility table defines allowed associativities inside a group,
|An example project with a completed Tasking Development and IL1 model (post IL1 translation, but before Event-B translation).
+
* a compatibility table defines allowed associativities between groups (it allows to forbid a syntaxic construction like <math>f(x)^{-1}\;</math>
|-
+
: nota: this requirement was added afterwards with consistency in mind.
|Sharedbuffer20100819Tasking
 
|Same as the example project above, but with Event-B model translations. The difference being that this development includes a model of the implementation. These are refinements that include a program counter to describe flow of execution in each task.
 
|-
 
|SharedBuffer20100819Tutorial
 
|A bare project for step 1 of the tutorial.
 
|-
 
|Sharedbuffer20100819Tutorial2
 
|A partially completed tasking development for steps 2 and 3 of the tutorial.
 
|}
 
  
== Preliminaries ==
+
=== Expected Extension Schemes ===
Before further discussion of the modelling aspects, we take a look at the PrettyPrint viewers. The PrettyPrinters make the viewing of IL1 and tasking models easier; it also provides a route to generate source code. The source code can easily be pasted from the IL1 Pretty Printer window into an the Ada source file .
+
We do want to at least define operators of the following form :
==== The PrettyPrint View of a Tasking Development ====
+
* infix : <math>a + b\;</math> or <math>a \vdash b : c\;</math>
To open the Tasking PrettyPrint viewer,
+
* prefix : <math>\neg a\;</math>
* from the top-menu select ''Window/Show View/Other/Tasking Pretty Printer''.
+
* postfix : <math> R^*\;</math>
 +
* closed : <math>\|a\|\;</math>
 +
* parentheses sugar : <math>(a +b) * c\;</math>
 +
* functional postfix : <math> a \langle b \rangle\;</math>
 +
* functional prefix : <math>  \langle b \rangle f\;</math>
 +
* lambda like : <math> \lambda x\mapsto y . P | E\;</math>
 +
* Union like : <math> \Union\{ e \mid P\}</math> or <math> \Union\{ x,y . P \mid e\}</math>
 +
* sets : <math>\{a, b, c + e\}\;</math> or <math>\{ e \mid P\}\;</math> or <math>\{x,y . P \mid e\}\;</math>
  
Note that the Tasking PrettyPrinter may have to be closed when editing the Tasking Development, since it can give rise to exceptions. The PrettyPrinter would need further work to make it robust, however it is intended only as a short-term solution.
+
We also like to define precisely precedences and associativity between existing and new operators.
  
* Open the ''SharedBuffer20100819Demo'' Project and switch to the Resource Perspective.
+
=== Requirements exported by the dynamic feature ===
* Open the ''.tasking'' model and inspect it. Clicking on the Main, Machine or Event nodes updates the pretty print window.
+
* the precedence should not be enumerated but defined by a relation, like: '+' < '*' and '*' < '**', ...
  
==== Viewing Source Code ====
+
== Limitations ==
aka. The PrettyPrint View of an IL1 Model.
 
  
To view Ada source code,
+
== Design Alternatives ==
* from the top-menu select ''Window/Show View/Other/IL1 Pretty Printer''.
 
* Open the ''SharedBuffer20100819Demo'' Project and switch to the Resource Perspective.
 
* Open the ''.il1'' model and inspect it. Clicking on the Protected, Main Entry, or Task nodes updates the pretty print window.
 
  
==== Cleaning the Tasking Development ====
+
=== Make Existing Parser Extensible ===
If the ''.tasking'' file has errors, then it may need cleaning. To do this right-click on the ''Main'' node, select ''Epsilon Translation/CleanUp''. If a model has errors it can still be viewed by clicking on the ''Selection'' tab at the bottom of the tasking editor window.
+
The existing parser is a LL recursive descent parser generated by the [http://www.ssw.uni-linz.ac.at/Coco/ Coco/R Compiler Generator], which makes extensive use of pre-computed lookahead, which makes it very difficult to be transformed in a sufficiently enough generic parser.
  
== The Tutorial ==
+
=== Parser Combinator ===
The steps needed to generate code from an Event-B model, in this tutorial, are as follows,
+
:* on [http://en.wikipedia.org/wiki/Parser_combinator wikipedia]. Some implementations are referenced [http://pdos.csail.mit.edu/~baford/packrat/ here].
* Step 1 - Create the tasking development. [http://wiki.event-b.org/index.php/Code_Generation_Tutorial#Creating_The_Tasking_Development Creating the tasking development].
+
:* with functionnal language [http://wiki.portal.chalmers.se/agda/Agda Agda] : ''[http://www.cs.nott.ac.uk/~nad/publications/danielsson-norell-mixfix.pdf Parsing Mixfix Operators]
* Step 2 - Add annotations.
+
:: This paper is interesting in its proposal of using an acyclic graph to define operator precedence.
* Step 3 - Invoke translators.
+
:* with functional language [http://user.cs.tu-berlin.de/~opal/opal-language.html OPAL] : ''[http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=8A8A8B39FFBF1AEF17C782C4822C5AF6?doi=10.1.1.58.655&rep=rep1&type=url&i=0 How To Obtain Powerful Parsers That Are Elegant and Practical]''
  
==== Creating The Tasking Development ====
+
=== Pratt Parser ===
* Change to the Event-B Perspective.
+
:* [http://portal.acm.org/citation.cfm?id=512931 original paper from Vaughan Pratt]
* Open the ''SharedBuffer20100819Tutorial'' Project.
+
:* [http://effbot.org/zone/tdop-index.htm SImple ''Pratt'' parsing in python]
* Select the following Machines: Reader, Writer and Shared.
+
:* [http://javascript.crockford.com/tdop/tdop.html ''Pratt'' parsing in javascript]
* Right-click and select ''Make Tasking Development/Generate Tasking Development''.
+
Note that a ''pratt' parser is essentially a parser for expressions (where expression parts are themselves expressions)... But it may be tweaked to limit the accepted sub-expressions or to enforce some non-expression parts inside an expression. More on that hereafter.
  
The new Tasking Development will not be visible in the Event-B perspective, change to the resource perspective, open and inspect the new ''.tasking'' file. The Tasking Development contains (the EMF representation of) the machines that we wish to provide implementations for. In order to introduce the new concepts we have prepared a partially complete development.  
+
=== Some Existing Extensible Languages ===
 +
* [http://www.chrisseaton.com/katahdin/ Katahdin], and the associated paper [http://www.chrisseaton.com/katahdin/katahdin-thesis.pdf A Programming Language Where the Syntax and Semantics Are Mutable at Runtime] which describes the language and its implementation (in C#).
 +
* [http://itcentre.tvu.ac.uk/~clark/xmf.html XMF] whose implementation seems not be describes in details.
 +
* [http://xlr.sourceforge.net/index.html XLR: Extensible Language and Runtime], which looks like a ''pratt'' parser according to the [http://xlr.svn.sourceforge.net/viewvc/xlr/trunk/xl2/xlr/parser.cpp?revision=805&view=markup sources].
 +
* [http://www.pi-programming.org/What.html <math>\pi</math>] whose implementation seems not be describes in details (but sources availables).
  
Change to the Project ''SharedBuffer20100819Tutorial2'' to begin the next step.
+
== Design Proposal ==
 +
=== Main Concepts ===
 +
;symbol: a symbol is a lexem known by the parser.
 +
;group: each symbol belongs to one and only one group.
 +
;symbol compatibility: a relation telling if inside a group two symbol are compatibles.
 +
:Two symbol are compatibles is they can be parsed without parentheses (see [[Event-B Mathematical Language]] for more on compatibility).
 +
;group compatibility: a relation telling if two groups are compatibles.
 +
:Two symbol of different groups are said compatibles if their group are compatibles.
 +
;symbol associativity: a relation which gives the associative precedence of symbols inside a given group.
 +
;group associativity: a relation which gives the associative precedence of groups.
 +
:Two symbol of different groups have a relative precedence given by the associative precedence of their groups.
 +
;generic parser: the parser which is defined.
 +
: This parser may be called recursively (an infix expression being build upon a left and right sub-expression) and is often the default parser used when building the AST associated with a ''symbol''.
 +
;specific parser: some specific parser, predefined as helper functions to build specific AST.
 +
:Some example of specific parser are the one used to parse and build list of identifiers (for quantified predicates), mapplets lists (for lambda), and so one...
  
==== Providing the Annotations for Implementations ====
+
=== Proposed User Interface ===
* Close any Tasking Pretty Print Viewers that remain open. The incomplete model will give rise to exceptions.
+
''The hereafter proposition does not preclude the final UI to take other (more user friendly forms). But the concepts should probably stay the same''
* Go to the to the Resource Perspective.
 
* Open and inspect the ''.tasking'' machine.
 
  
The ''WriterTsk'' and ''SharedObj'' machines are incomplete. We will take the steps to necessary to provide implementation details.  
+
We propose that the user will be able to create several predefined type of symbols (new types may certainly be further added by mean of eclipse extension points). When defining a symbol, the user can choose the parser used to parse the sub-expressions (default to generic parser, but a specific parser may be used) and can enforce the form of the resulting parsed sub-expression in terms of their AST root type.
  
===== The WriterTsk Machine =====
+
==== Predefined Symbol Builders ====
In the partially complete tutorial project we already identified the ''WriterTsk'' as an ''Auto Task'' Tasking Machine, by adding the ''Auto Task'' extension. ''Auto Tasks'' are tasks that will be declared and defined in the ''Main'' procedure of the implementation. The effect of this is that the ''Auto Tasks'' are created when the program first loads, and then activated (made ready to run) before the ''Main'' procedure body runs. We have added the ''Periodic Task'' extension to the ''Auto Task'', and set a period of 250 milliseconds. We will now complete the sequence that has been partially defined in the task body.
+
The proposed predefined symbol builders are:
 +
;infix: For example predicate conjunction which expects two predicates for left and right parts may be defined by <code>infix("∧", gid="(logic pred)", expected_nonterm=["predicate", "predicate"])</code>
 +
:As another example, substitution ≔ of group '(infix subst)' which expect two ''ident list'' for left and right expressions may be defined by <code>infix(symbol="≔", gid="(infix subst)", parsers=["ident_list", "ident_list"], expected_nonterm=["ident_list", "ident_list"])</code>
  
*'''Add Synchronisation between TWrite and SWrite'''.
+
;prefix: <code>prefix("¬", "(not pred)", expected_nonterm=["predicate"])</code>
** Expand the ''Auto Task Machine'' node.
 
** Expand the ''Seq'' sub-tree.
 
** Right-click on the ''Seq'' node and select ''New Child/Left Branch EventWrapper''.
 
** Provide the event label ''w1'' using the properties view.
 
** Right-click on Event Wrapper and select ''New Child/ Synch Events''.
 
** Select ''Synch Events'' and go to the drop-down menu of the ''Local Event'' property.
 
** At this point the drop-down box displays a number of event names, select the ''TWrite'' event.
 
** Go to the drop-down menu of the ''Remote Event'' property.
 
** From the list of events select the ''SWrite'' event.
 
  
The Synch Events construct is used to implement [http://wiki.event-b.org/index.php/Tasking_Event-B_Overview#Control_Constructs Event Synchronisation]. The next step wraps an event in an Event Wrapper in order to update the local state; there is no synchronisation as such but we will re-use the constructs that already exist.
+
;atomic: <code>atomic("⊤", "(atomic pred)")</code>
  
*'''Add the Wrapped Event TcalcWVal'''.
+
;prefix closed: used to build prefix symbol which must be followed by enclosing symbol(brackets, parentheses,..). For example <tt>finite</tt> may be defined by: <code>prefix_closed("finite", "(", ")","(atomic pred)", expected_nonterm=["expression"])</code>
** Expand the sub-tree of the second ''Seq'' node.
 
** Right-click on the ''Seq'' node and select ''New Child/Left Branch EventWrapper''.
 
** Provide the event label ''w2'' using the properties view.
 
** Right-click on Event Wrapper and select ''New Child/ Synch Events''.
 
** Select ''Synch Events'' and go to the drop-down menu of the ''Local Event'' property.
 
** From the list of events select the ''TcalcWVal'' event.
 
  
We have now completed the task body, and it just remains to complete provide details for the ''TWrite'' event. The ''TWrite'' event in ''WriterTsk'' is to be synchronized with the ''SWrite'' event in the ''SharedObj''.
+
:postfix: <code>postfix("∼", gid="(unary relation)",  expected_nonterm=["expression"])</code>
*'''Add Event Extensions'''.
 
** Right-click on the ''TWrite'' Event node.
 
** Select ''New Child/Extension''.
 
** Right-click on the ''Extension'' node and select ''New Child/Implementation'' from the menu.
 
** Go to the Implementation properties view and set the ''Implementation Type'' property to ''ProcedureSynch''.
 
  
*'''Identify Incoming and Outgoing parameters'''.
+
;quantified: <code>quantified("∀", "·", expected_nonterm=["predicate"]</code>
** Right-click on the ''outAP'' node and add an ''Extension''.
 
** Right-click on the ''Extension'' and select''New Child/Parameter Type''.
 
** Go to the ''Parameter Type'' properties view and set the ''Parameter Type'' property to ''actualOut''.
 
** Right-click on the ''inAP'' node and add an ''Extension''.
 
** Right-click on the ''Extension'' and select''New Child/Parameter Type''.
 
** Go to the ''Parameter Type'' properties view and set the ''Parameter Type'' property to ''actualIn''.
 
  
===== The Shared Machine =====
+
;lambda like: <code>lambda_like("λ", "·", u"∣", gid="(quantification)", expected_nonterm=["ident list", "predicate", "expression"])</code>
  
The next step is to identify the ''SharedObj'' machine as a ''Shared Machine''. The ''SharedObj'' Machine will be extended using the Event-B EMF extension mechanism.
+
;quantified union like: <code>Union_like("⋃", "·", "∣", gid="(quantification)")</code>
* Right-click on the ''SharedObj'' Machine node in the ''.tasking'' file.
 
* Select ''New Child/Extension''.
 
* Right-click on the ''Extension'' node and select ''New Child/Shared Machine'' from the menu.
 
  
We now show how to extend the ''SWrite'' event of the Shared Machine with details about its implementation. The ''SWrite'' event in ''SharedObj'' is to be synchronized with the ''TWrite'' event in the ''WriterTsk''.
+
;closed sugar: use for parentheses added to enforce associativity: <code>closed_sugar("(", ")")</code>
* '''Identify SWrite as a Syncronisation'''.
 
** Right-click on the ''SWrite'' Event node.
 
** Select ''New Child/Extension''.
 
** Right-click on the ''Extension'' node and select ''New Child/Implementation'' from the menu.
 
** Go to the Implementation properties view and set the ''Implementation Type'' property to ''ProcedureSynch''.
 
  
* '''Identify incoming and outgoing parameters'''.
+
;functional postfix: <code>functional_postfix("(", ")", gid="(functional)", expected_nonterm=["expression", "expression"])</code>
** Right-click on the ''inFP'' node and add an ''Extension''.
 
** Right-click on the ''Extension'' and select''New Child/Parameter Type''.
 
** Go to the ''Parameter Type'' properties view and set the ''Parameter Type'' property to ''formalIn''.
 
** Right-click on the ''outFP'' node and add an ''Extension''.
 
** Right-click on the ''Extension'' and select''New Child/Parameter Type''.
 
** Go to the ''Parameter Type'' properties view and set the ''Parameter Type'' property to ''formalOut''.
 
  
To summarise, for a Shared Machine definition:
+
Some right associative versions of those symbol builders may also be supplied (infix_right, or a specific parameter ''right associative'' parameter may also be provided to the ''infix'' symbol builder).
# Add the ''SharedMachine'' Machine type.
+
 
# For each event, define the Event Type.
+
==== Predefined Specific Parser ====
# For each event parameter, define the Parameter Type.
+
* expression lists
 +
* ident lists
 +
* maplet tree (for lambda)
 +
 
 +
 
 +
==== Defining Compatibility and Precedences ====
 +
Inside a group, ''compatibility'' may be defined like:
 +
<code>
 +
    g = group["(binop)"]
 +
    g.add_many_compatibility(
 +
            [  ("∪", "∪"),
 +
                ("∩", "∩"),
 +
                ("∩", "∖"),
 +
                ("∩", "▷"),
 +
                ("∩", "⩥"),
 +
                ("∘", "∘"),
 +
                (";", ";"),
 +
                (";", "▷"),
 +
                (";", "⩥"),
 +
                ("", ""),
 +
                ("◁", "∩"),
 +
                ("◁", "∖"),
 +
                ("◁", ";"),
 +
                ("◁", "⊗"),
 +
                ("◁", "▷"),
 +
                ("◁", "⩥"),
 +
                ("⩥", "∩"),
 +
                ("⩥", "∖"),
 +
                ("⩥", ";"),
 +
                ("⩥", "⊗"),
 +
                ("⩥", "▷"),
 +
                ("⩥", "⩥"),
 +
                ("×", "×"), ]
 +
</code>
 +
which defines the compatibility table for expression binary operator as defined in [[Event-B Mathematical Language]].
 +
 
 +
 
 +
Inside a group, ''precedence'' may be defined like:
 +
<code>
 +
g = group["(arithmetic)"]
 +
g.add_many_precedence(
 +
            [  ("^", "÷"),
 +
                ("^", "∗"),
 +
                ("^", "+"),
 +
                ("^", "−"),
 +
                ("^", "mod"),
 +
                ("+", "∗"),
 +
                ("−", "∗"),
 +
                ("+", "÷"),
 +
                ("−", "÷"),
 +
                ("+", "mod"),
 +
                ("−", "mod"), ] )
 +
</code>
 +
which defines ^ as having the least precedence amongst the operators of the expression arithmetic group.
 +
 
 +
Similarly, precedences and compatibilities between groups may be defined :
 +
<code>
 +
    group_precedence.add(("(binop)", "(interval)"))
 +
    group_precedence.add(("(interval)", "(arithmetic)"))
 +
    group_precedence.add(("(arithmetic)", "(unary relation)"))
 +
    group_precedence.add(("(unary relation)", "(bound unary)"))
 +
    group_precedence.add(( "(bound unary)", "(bool)"))
 +
 
 +
    group_compatibility.add_all( [ "(binop)", "(interval)", (arithmetic)", "(unary relation)", "(unary relation)", "(bound unary)",....])
 +
</code>
 +
 
 +
==== Some Difficulties ====
 +
* different builders for the same symbol
 +
* common prefixes {{TODO| backtracking + memoisation}}
 +
* Grammar testing {{TODO| should be include in the user interface: the user shall provide tests with its extensions to ensure that he agree with the parser on the semantic of its extension}}
 +
 
 +
=== Core Algorithm ===
 +
=== Sample Implementation ===
 +
A prototype as been developed in Python to quickly try different solutions.  
 +
 
 +
The development tree is available at http://bitbucket.org/matclab/eventb_pratt_parser/
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
[[Category:Design Proposal]]

Revision as of 22:16, 30 January 2010

This page describes the requirements for a generic parser for the Event-B Mathematical Language.

A first design proposal is also drafted.

Requirements

In order to be usable mathematical extensions require that the event-b mathematical language syntax can be extended by the final user.

The lexical analyser and the syntaxic parser thus have to be extensible in a simple enough way (from a user point of vue).

Requirements Exported by the Current Language Design

Operator Priority

  • operator are defined by group,
  • group of operator have a defined precedences,
  • there may be precedences defined inside groups.

Operator Compatibility

  • a compatibility table defines allowed associativities inside a group,
  • a compatibility table defines allowed associativities between groups (it allows to forbid a syntaxic construction like f(x)^{-1}\;
nota: this requirement was added afterwards with consistency in mind.

Expected Extension Schemes

We do want to at least define operators of the following form :

  • infix : a + b\; or a \vdash b : c\;
  • prefix : \neg a\;
  • postfix :  R^*\;
  • closed : \|a\|\;
  • parentheses sugar : (a +b) * c\;
  • functional postfix :  a \langle b \rangle\;
  • functional prefix :   \langle b \rangle f\;
  • lambda like :  \lambda x\mapsto y . P | E\;
  • Union like :  \Union\{ e \mid P\} or  \Union\{ x,y . P \mid e\}
  • sets : \{a, b, c + e\}\; or \{ e \mid P\}\; or \{x,y . P \mid e\}\;

We also like to define precisely precedences and associativity between existing and new operators.

Requirements exported by the dynamic feature

  • the precedence should not be enumerated but defined by a relation, like: '+' < '*' and '*' < '**', ...

Limitations

Design Alternatives

Make Existing Parser Extensible

The existing parser is a LL recursive descent parser generated by the Coco/R Compiler Generator, which makes extensive use of pre-computed lookahead, which makes it very difficult to be transformed in a sufficiently enough generic parser.

Parser Combinator

This paper is interesting in its proposal of using an acyclic graph to define operator precedence.

Pratt Parser

Note that a pratt' parser is essentially a parser for expressions (where expression parts are themselves expressions)... But it may be tweaked to limit the accepted sub-expressions or to enforce some non-expression parts inside an expression. More on that hereafter.

Some Existing Extensible Languages

Design Proposal

Main Concepts

symbol
a symbol is a lexem known by the parser.
group
each symbol belongs to one and only one group.
symbol compatibility
a relation telling if inside a group two symbol are compatibles.
Two symbol are compatibles is they can be parsed without parentheses (see Event-B Mathematical Language for more on compatibility).
group compatibility
a relation telling if two groups are compatibles.
Two symbol of different groups are said compatibles if their group are compatibles.
symbol associativity
a relation which gives the associative precedence of symbols inside a given group.
group associativity
a relation which gives the associative precedence of groups.
Two symbol of different groups have a relative precedence given by the associative precedence of their groups.
generic parser
the parser which is defined.
This parser may be called recursively (an infix expression being build upon a left and right sub-expression) and is often the default parser used when building the AST associated with a symbol.
specific parser
some specific parser, predefined as helper functions to build specific AST.
Some example of specific parser are the one used to parse and build list of identifiers (for quantified predicates), mapplets lists (for lambda), and so one...

Proposed User Interface

The hereafter proposition does not preclude the final UI to take other (more user friendly forms). But the concepts should probably stay the same

We propose that the user will be able to create several predefined type of symbols (new types may certainly be further added by mean of eclipse extension points). When defining a symbol, the user can choose the parser used to parse the sub-expressions (default to generic parser, but a specific parser may be used) and can enforce the form of the resulting parsed sub-expression in terms of their AST root type.

Predefined Symbol Builders

The proposed predefined symbol builders are:

infix
For example predicate conjunction which expects two predicates for left and right parts may be defined by infix("∧", gid="(logic pred)", expected_nonterm=["predicate", "predicate"])
As another example, substitution ≔ of group '(infix subst)' which expect two ident list for left and right expressions may be defined by infix(symbol="≔", gid="(infix subst)", parsers=["ident_list", "ident_list"], expected_nonterm=["ident_list", "ident_list"])
prefix
prefix("¬", "(not pred)", expected_nonterm=["predicate"])
atomic
atomic("⊤", "(atomic pred)")
prefix closed
used to build prefix symbol which must be followed by enclosing symbol(brackets, parentheses,..). For example finite may be defined by: prefix_closed("finite", "(", ")","(atomic pred)", expected_nonterm=["expression"])
postfix: postfix("∼", gid="(unary relation)", expected_nonterm=["expression"])
quantified
quantified("∀", "·", expected_nonterm=["predicate"]
lambda like
lambda_like("λ", "·", u"∣", gid="(quantification)", expected_nonterm=["ident list", "predicate", "expression"])
quantified union like
Union_like("⋃", "·", "∣", gid="(quantification)")
closed sugar
use for parentheses added to enforce associativity: closed_sugar("(", ")")
functional postfix
functional_postfix("(", ")", gid="(functional)", expected_nonterm=["expression", "expression"])

Some right associative versions of those symbol builders may also be supplied (infix_right, or a specific parameter right associative parameter may also be provided to the infix symbol builder).

Predefined Specific Parser

  • expression lists
  • ident lists
  • maplet tree (for lambda)


Defining Compatibility and Precedences

Inside a group, compatibility may be defined like:

   g = group["(binop)"]
   g.add_many_compatibility(
           [   ("∪", "∪"),
               ("∩", "∩"),
               ("∩", "∖"),
               ("∩", "▷"),
               ("∩", "⩥"),
               ("∘", "∘"),
               (";", ";"),
               (";", "▷"),
               (";", "⩥"),
               ("", ""),
               ("◁", "∩"),
               ("◁", "∖"),
               ("◁", ";"),
               ("◁", "⊗"),
               ("◁", "▷"),
               ("◁", "⩥"),
               ("⩥", "∩"),
               ("⩥", "∖"),
               ("⩥", ";"),
               ("⩥", "⊗"),
               ("⩥", "▷"),
               ("⩥", "⩥"),
               ("×", "×"), ]

which defines the compatibility table for expression binary operator as defined in Event-B Mathematical Language.


Inside a group, precedence may be defined like:

g = group["(arithmetic)"]
g.add_many_precedence(
           [   ("^", "÷"),
               ("^", "∗"),
               ("^", "+"),
               ("^", "−"),
               ("^", "mod"),
               ("+", "∗"),
               ("−", "∗"),
               ("+", "÷"),
               ("−", "÷"),
               ("+", "mod"),
               ("−", "mod"), ] )

which defines ^ as having the least precedence amongst the operators of the expression arithmetic group.

Similarly, precedences and compatibilities between groups may be defined :

   group_precedence.add(("(binop)", "(interval)"))
   group_precedence.add(("(interval)", "(arithmetic)"))
   group_precedence.add(("(arithmetic)", "(unary relation)"))
   group_precedence.add(("(unary relation)", "(bound unary)"))
   group_precedence.add(( "(bound unary)", "(bool)"))
   group_compatibility.add_all( [ "(binop)", "(interval)", (arithmetic)", "(unary relation)", "(unary relation)", "(bound unary)",....])

Some Difficulties

  • different builders for the same symbol
  • common prefixes TODO: backtracking + memoisation
  • Grammar testing TODO: should be include in the user interface: the user shall provide tests with its extensions to ensure that he agree with the parser on the semantic of its extension

Core Algorithm

Sample Implementation

A prototype as been developed in Python to quickly try different solutions.

The development tree is available at http://bitbucket.org/matclab/eventb_pratt_parser/