# Difference between revisions of "Constrained Dynamic Parser"

imported>Mathieu |
imported>Mathieu m |
||

Line 54: | Line 54: | ||

:* [http://effbot.org/zone/tdop-index.htm SImple ''Pratt'' parsing in python] | :* [http://effbot.org/zone/tdop-index.htm SImple ''Pratt'' parsing in python] | ||

:* [http://javascript.crockford.com/tdop/tdop.html ''Pratt'' parsing in javascript] | :* [http://javascript.crockford.com/tdop/tdop.html ''Pratt'' parsing in javascript] | ||

+ | Note that a ''pratt' parser is essentially a parser for expressions (where expression parts are themselves expressions)... But it may be tweaked to limit the accepted sub-expressions or to enforce some non-expression parts inside an expression. More on that hereafter. | ||

=== Some Existing Extendable Languages === | === Some Existing Extendable Languages === | ||

Line 66: | Line 67: | ||

;group: each symbol belongs to one and only one group. | ;group: each symbol belongs to one and only one group. | ||

;symbol compatibility: a relation telling if inside a group two symbol are compatibles. | ;symbol compatibility: a relation telling if inside a group two symbol are compatibles. | ||

− | :Two symbol are compatibles is they can be parsed without parentheses. | + | :Two symbol are compatibles is they can be parsed without parentheses (see [[Event-B Mathematical Language]] for more on compatibility). |

− | ;group compatibility: a | + | ;group compatibility: a relation telling if two groups are compatibles. |

− | ;group associativity: a | + | :Two symbol of different groups are said compatibles if their group are compatibles. |

− | ;symbol | + | ;symbol associativity: a relation which gives the associative precedence of symbols inside a given group. |

+ | ;group associativity: a relation which gives the associative precedence of groups. | ||

+ | :Two symbol of different groups have a relative precedence given by the associative precedence of their groups. | ||

+ | ;generic parser: the parser which is defined. | ||

+ | : This parser may be called recursively (an infix expression being build upon a left and right sub-expression) and is often the default parser used when building the AST associated with a ''symbol''. | ||

+ | ;specific parser: some specific parser, predefined as helper functions to build specific AST. | ||

+ | :Some example of specific parser are the one used to parse and build list of identifiers (for quantified predicates), mapplets lists (for lambda), and so one... | ||

+ | |||

=== Proposed User Interface === | === Proposed User Interface === | ||

− | The user will be able to create | + | The user will be able to create the following symbols: |

+ | ;infix: | ||

## Revision as of 21:18, 30 January 2010

This page describes the requirements for a **generic parser** for the Event-B Mathematical Language.
It will also draft a first design proposal.

## Requirements

In order to be usable mathematical extensions require the event-b mathematical language syntax to be extendable by the final user.

Thus, the lexical analyser and the syntaxic parser have to be extendable in a simple enough way (from a user point of vue).

### Requirements Exported by the Current Language Design

#### Operator Priority

- operator are defined by group,
- group of operator have a defined precedences,
- there may be precedences defined inside groups.

#### Operator Associativity

- a compatibility table defines allowed associativity inside a group,
- a compatibility tables defines allowed associativity between groups (it allows to forbid a syntaxic construction like

- nota: this requirement was added afterwards with consistency in mind.

### Expected Extension Schemes

We do want to at least define operators of the following form :

- infix : or
- prefix :
- postfix :
- closed :
- parentheses sugar :
- functional postfix :
- functional prefix :
- lambda like :
- Union like : or
- sets : or or

We also like to define precisely precedences and associativity between existing and new operators.

### Requirements exported by the dynamic feature

- the precedence should not be enumerated but defined by a relation, like: '+' < '*' and '*' < '**', ...

## Limitations

## Design Alternatives

### Make Existing Parser Extendable

The existing parser is a LL recursive descent parser generated by the Coco/R Compiler Generator, which makes extensive use of pre-computed lookahead, which makes it very difficult to be transformed in a sufficiently enough generic parser.

### Parser Combinator

- on wikipedia. Some implementations are referenced [here http://pdos.csail.mit.edu/~baford/packrat/].
- with functionnal language Agda[1] :
*Parsing Mixfix Operators*

- This paper is interesting in its proposal of using an acyclic graph to define operator precedence.

- with functional language OPAL :
*How To Obtain Powerful Parsers That Are Elegant and Practical*

### Pratt Parser

Note that a *pratt' parser is essentially a parser for expressions (where expression parts are themselves expressions)... But it may be tweaked to limit the accepted sub-expressions or to enforce some non-expression parts inside an expression. More on that hereafter.*

### Some Existing Extendable Languages

- Katahdin, and the associated paper A Programming Language Where the Syntax and Semantics Are Mutable at Runtime which describes the language and its implementation (in C#).
- XMF whose implementation seems not be describes in details.
- XLR: Extensible Language and Runtime, which looks like a
*pratt*parser according to the sources. - whose implementation seems not be describes in details (but sources availables).

## Design Proposal

### Main Concepts

- symbol
- a symbol is a lexem known by the parser.
- group
- each symbol belongs to one and only one group.
- symbol compatibility
- a relation telling if inside a group two symbol are compatibles.
- Two symbol are compatibles is they can be parsed without parentheses (see Event-B Mathematical Language for more on compatibility).
- group compatibility
- a relation telling if two groups are compatibles.
- Two symbol of different groups are said compatibles if their group are compatibles.
- symbol associativity
- a relation which gives the associative precedence of symbols inside a given group.
- group associativity
- a relation which gives the associative precedence of groups.
- Two symbol of different groups have a relative precedence given by the associative precedence of their groups.
- generic parser
- the parser which is defined.
- This parser may be called recursively (an infix expression being build upon a left and right sub-expression) and is often the default parser used when building the AST associated with a
*symbol*. - specific parser
- some specific parser, predefined as helper functions to build specific AST.
- Some example of specific parser are the one used to parse and build list of identifiers (for quantified predicates), mapplets lists (for lambda), and so one...

### Proposed User Interface

The user will be able to create the following symbols:

- infix

### Core Algorithm

### Sample Implementation

A prototype as been developed in Python to quickly try different solutions.

The development tree is available at http://bitbucket.org/matclab/eventb_pratt_parser/