Difference between pages "Constrained Dynamic Parser" and "Creating a new plugin using eclipse (How to extend Rodin Tutorial)"

Latest revision as of 14:48, 23 August 2010

Redirect to:

Creating a new plug-in using eclipse (How to extend Rodin Tutorial)

@@ Line 1: / Line 1: @@
-{{TOCright}}
+#REDIRECT [[Creating a new plug-in using eclipse (How to extend Rodin Tutorial)]]
-This page describes the requirements for a '''generic parser''' for the [[Event-B Mathematical Language]].
-A first design proposal is also drafted.
-== Requirements ==
-In order to be usable [[mathematical extensions]] require that the event-b mathematical language syntax can be extended by the final user.
-The lexical analyser and the syntaxic parser thus have to be extensible in a simple enough way (from a user point of vue).
-=== Requirements Exported by the Current Language Design ===
-==== Operator Priority ====
-* operator are defined by group,
-* group of operator have a defined precedences,
-* there may be precedences defined inside groups.
-==== Operator Compatibility ====
-* a compatibility table defines allowed associativities inside a group,
-* a compatibility table defines allowed associativities between groups (it allows to forbid a syntaxic construction like <math>f(x)^{-1}\;</math>
-: nota: this requirement was added afterwards with consistency in mind.
-=== Expected Extension Schemes ===
-We do want to at least define operators of the following form :
-* infix : <math>a + b\;</math> or <math>a \vdash b : c\;</math>
-* prefix : <math>\neg a\;</math>
-* postfix : <math> R^*\;</math>
-* closed : <math>\|a\|\;</math>
-* parentheses sugar : <math>(a +b) * c\;</math>
-* functional postfix : <math> a \langle b \rangle\;</math>
-* functional prefix : <math>  \langle b \rangle f\;</math>
-* lambda like : <math> \lambda x\mapsto y . P | E\;</math>
-* Union like : <math> \Union\{ e \mid P\}</math> or <math> \Union\{ x,y . P \mid e\}</math>
-* sets : <math>\{a, b, c + e\}\;</math> or <math>\{ e \mid P\}\;</math> or <math>\{x,y . P \mid e\}\;</math>
-We also like to define precisely precedences and associativity between existing and new operators.
-=== Requirements exported by the dynamic feature ===
-* the precedence should not be enumerated but defined by a relation, like: '+' < '*' and '*' < '**', ...
-== Limitations ==
-== Design Alternatives ==
-=== Make Existing Parser Extensible ===
-The existing parser is a LL recursive descent parser generated by the [http://www.ssw.uni-linz.ac.at/Coco/ Coco/R Compiler Generator], which makes extensive use of pre-computed lookahead, which makes it very difficult to be transformed in a sufficiently enough generic parser.
-=== Parser Combinator ===
-:* on [http://en.wikipedia.org/wiki/Parser_combinator wikipedia]. Some implementations are referenced [http://pdos.csail.mit.edu/~baford/packrat/ here].
-:* with functionnal language [http://wiki.portal.chalmers.se/agda/Agda Agda] : ''[http://www.cs.nott.ac.uk/~nad/publications/danielsson-norell-mixfix.pdf Parsing Mixfix Operators]
-:: This paper is interesting in its proposal of using an acyclic graph to define operator precedence.
-:* with functional language [http://user.cs.tu-berlin.de/~opal/opal-language.html OPAL] : ''[http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=8A8A8B39FFBF1AEF17C782C4822C5AF6?doi=10.1.1.58.655&rep=rep1&type=url&i=0 How To Obtain Powerful Parsers That Are Elegant and Practical]''
-=== Pratt Parser ===
-:* [http://portal.acm.org/citation.cfm?id=512931 original paper from Vaughan Pratt]
-:* [http://effbot.org/zone/tdop-index.htm SImple ''Pratt'' parsing in python]
-:* [http://javascript.crockford.com/tdop/tdop.html ''Pratt'' parsing in javascript]
-Note that a ''pratt' parser is essentially a parser for expressions (where expression parts are themselves expressions)... But it may be tweaked to limit the accepted sub-expressions or to enforce some non-expression parts inside an expression. More on that hereafter.
-=== Some Existing Extensible Languages ===
-* [http://www.chrisseaton.com/katahdin/ Katahdin], and the associated paper [http://www.chrisseaton.com/katahdin/katahdin-thesis.pdf A Programming Language Where the Syntax and Semantics Are Mutable at Runtime] which describes the language and its implementation (in C#).
-* [http://itcentre.tvu.ac.uk/~clark/xmf.html XMF] whose implementation seems not be describes in details.
-* [http://xlr.sourceforge.net/index.html XLR: Extensible Language and Runtime], which looks like a ''pratt'' parser according to the [http://xlr.svn.sourceforge.net/viewvc/xlr/trunk/xl2/xlr/parser.cpp?revision=805&view=markup sources].
-* [http://www.pi-programming.org/What.html <math>\pi</math>] whose implementation seems not be describes in details (but sources availables).
-== Design Proposal ==
-=== Main Concepts ===
-;symbol: a symbol is a lexem known by the parser.
-;group: each symbol belongs to one and only one group.
-;symbol compatibility: a relation telling if inside a group two symbol are compatibles.
-:Two symbol are compatibles is they can be parsed without parentheses (see [[Event-B Mathematical Language]] for more on compatibility).
-;group compatibility: a relation telling if two groups are compatibles.
-:Two symbol of different groups are said compatibles if their group are compatibles.
-;symbol associativity: a relation which gives the associative precedence of symbols inside a given group.
-;group associativity: a relation which gives the associative precedence of groups.
-:Two symbol of different groups have a relative precedence given by the associative precedence of their groups.
-;generic parser: the parser which is defined.
-: This parser may be called recursively (an infix expression being build upon a left and right sub-expression) and is often the default parser used when building the AST associated with a ''symbol''.
-;specific parser: some specific parser, predefined as helper functions to build specific AST.
-:Some example of specific parser are the one used to parse and build list of identifiers (for quantified predicates), mapplets lists (for lambda), and so one...
-=== Proposed User Interface ===
-''The hereafter proposition does not preclude the final UI to take other (more user friendly forms). But the concepts should probably stay the same''
-We propose that the user will be able to create several predefined type of symbols (new types may certainly be further added by mean of eclipse extension points). When defining a symbol, the user can choose the parser used to parse the sub-expressions (default to generic parser, but a specific parser may be used) and can enforce the form of the resulting parsed sub-expression in terms of their AST root type.
-==== Predefined Symbol Builders ====
-The proposed predefined symbol builders are:
-;infix: For example predicate conjunction which expects two predicates for left and right parts may be defined by <code>infix("∧", gid="(logic pred)", expected_nonterm=["predicate", "predicate"])</code>
-:As another example, substitution ≔ of group '(infix subst)' which expect two ''ident list'' for left and right expressions may be defined by <code>infix(symbol="≔", gid="(infix subst)", parsers=["ident_list", "ident_list"], expected_nonterm=["ident_list", "ident_list"])</code>
-;prefix: <code>prefix("¬", "(not pred)", expected_nonterm=["predicate"])</code>
-;atomic: <code>atomic("⊤", "(atomic pred)")</code>
-;prefix closed: used to build prefix symbol which must be followed by enclosing symbol(brackets, parentheses,..). For example <tt>finite</tt> may be defined by: <code>prefix_closed("finite", "(", ")","(atomic pred)", expected_nonterm=["expression"])</code>
-:postfix: <code>postfix("∼", gid="(unary relation)",  expected_nonterm=["expression"])</code>
-;quantified: <code>quantified("∀", "·", expected_nonterm=["predicate"]</code>
-;lambda like: <code>lambda_like("λ", "·", u"∣", gid="(quantification)", expected_nonterm=["ident list", "predicate", "expression"])</code>
-;quantified union like: <code>Union_like("⋃", "·", "∣", gid="(quantification)")</code>
-;closed sugar: use for parentheses added to enforce associativity: <code>closed_sugar("(", ")")</code>
-;functional postfix: <code>functional_postfix("(", ")", gid="(functional)", expected_nonterm=["expression", "expression"])</code>
-Some right associative versions of those symbol builders may also be supplied (infix_right, or a specific parameter ''right associative'' parameter may also be provided to the ''infix'' symbol builder).
-==== Predefined Specific Parser ====
-* expression lists
-* ident lists
-* maplet tree (for lambda)
-==== Defining Compatibility and Precedences ====
-Inside a group, ''compatibility'' may be defined like:
-<code>
-    g = group["(binop)"]
-    g.add_many_compatibility(
-            [   ("∪", "∪"),
-                ("∩", "∩"),
-                ("∩", "∖"),
-                ("∩", "▷"),
-                ("∩", "⩥"),
-                ("∘", "∘"),
-                (";", ";"),
-                (";", "▷"),
-                (";", "⩥"),
-                ("", ""),
-                ("◁", "∩"),
-                ("◁", "∖"),
-                ("◁", ";"),
-                ("◁", "⊗"),
-                ("◁", "▷"),
-                ("◁", "⩥"),
-                ("⩥", "∩"),
-                ("⩥", "∖"),
-                ("⩥", ";"),
-                ("⩥", "⊗"),
-                ("⩥", "▷"),
-                ("⩥", "⩥"),
-                ("×", "×"), ]
-</code>
-which defines the compatibility table for expression binary operator as defined in [[Event-B Mathematical Language]].
-Inside a group, ''precedence'' may be defined like:
-<code>
- g = group["(arithmetic)"]
- g.add_many_precedence(
-            [   ("^", "÷"),
-                ("^", "∗"),
-                ("^", "+"),
-                ("^", "−"),
-                ("^", "mod"),
-                ("+", "∗"),
-                ("−", "∗"),
-                ("+", "÷"),
-                ("−", "÷"),
-                ("+", "mod"),
-                ("−", "mod"), ] )
-</code>
-which defines ^ as having the least precedence amongst the operators of the expression arithmetic group.
-Similarly, precedences and compatibilities between groups may be defined :
-<code>
-    group_precedence.add(("(binop)", "(interval)"))
-    group_precedence.add(("(interval)", "(arithmetic)"))
-    group_precedence.add(("(arithmetic)", "(unary relation)"))
-    group_precedence.add(("(unary relation)", "(bound unary)"))
-    group_precedence.add(( "(bound unary)", "(bool)"))
-    group_compatibility.add_all( [ "(binop)", "(interval)", (arithmetic)", "(unary relation)", "(unary relation)", "(bound unary)",....])
-</code>
-==== Some Difficulties ====
-* different builders for the same symbol
-* common prefixes {{TODO| backtracking + memoisation}}
-* Grammar testing {{TODO| should be include in the user interface: the user shall provide tests with its extensions to ensure that he agree with the parser on the semantic of its extension}}
-=== Core Algorithm ===
-=== Sample Implementation ===
-A prototype as been developed in Python to quickly try different solutions.
-The development tree is available at http://bitbucket.org/matclab/eventb_pratt_parser/
-[[Category:Design Proposal]]

Difference between pages "Constrained Dynamic Parser" and "Creating a new plugin using eclipse (How to extend Rodin Tutorial)"

Latest revision as of 14:48, 23 August 2010

Navigation menu

Page actions

Page actions

Personal tools

Navigation

contribute

Search

Tools