Difference between pages "Mathematical Extensions" and "Mathematical Language Evolution Design"

From Event-B
(Difference between pages)
Jump to navigationJump to search
imported>Pascal
 
imported>Nicolas
 
Line 1: Line 1:
Currently the operators and basic predicates of the Event-B mathematical language supported by Rodin are fixed.
+
== Introduction ==
We propose to extend Rodin to define basic predicates, new operators or new algebraic types.
 
  
== Requirements ==
+
This document describes the modifications that occur to the Event-B mathematical language from the developer point of view.
 +
The old language will still be recognized, in order to provide backward compatibility.
  
=== User Requirements ===
+
These changes are described from a user point of view in [[Changes to the Mathematical Language of Event-B]].
* Binary operators (prefix form, infix form or suffix form).
 
* Operators on boolean expressions.
 
* Unary operators, such as absolute values.
 
: Note that the pipe, which is already used for set comprehension, cannot be used to enter absolute values (''in fact, as in the new design the pipe used in set comprehension is only a syntaxic sugar, the same symbol may be used for absolute value. To be confirmed with the prototyped parser. It may however be not allowed in a first time until backtracking is implemented, due to use of lookahead making assumptions on how the pipe symbol is used. -Mathieu ).
 
* Basic predicates (e.g., the symmetry of relations <math>sym(R) \defi R=R^{-1}</math>).
 
: Having a way to enter such predicates may be considered as syntactic sugar, because it is already possible to use sets (e.g., <math>R \in sym</math>, where <math>sym \defi \{R \mid R=R ^{-1}\}</math>) or functions (e.g., <math>sym(R) = \True</math>, where <math>sym \defi (\lambda R \qdot R \in A \rel B \mid \bool(R = R^{-1}))</math>).
 
* Quantified expressions (e.g., <math>\sum x \qdot P \mid y</math>, <math>\prod x \qdot P \mid y</math>, <math>~\min(S)</math>, <math>~\max(S)</math>).
 
* Types.
 
** Enumerated types.
 
** Scalar types.
 
  
=== User Input ===
+
== Abstract Syntax Tree ==
The end-user shall provide the following information:
 
* keyboard input
 
* Lexicon and Syntax. <br/>More precisely, it includes the symbols, the form (prefix, infix, postfix), the grammar, associativity (left-associative or right associative), commutativity, priority, the mode (flattened or not), ...
 
* Pretty-print. <br/>Alternatively, the rendering may be determined from the notation parameters passed to the parser.
 
* Typing rules.
 
* Well-definedness.
 
  
=== Development Requirements ===
+
Deprecated nodes (becoming atomic expressions):
* Scalability.
+
* KPRJ1
 +
* KPRJ2
 +
* KID
  
== Towards a generic AST ==
+
New nodes:
 +
* KPARTITION that represents the new partition predicate (multiple predicate)
 +
* KPRJ1_GEN that represents the generic atomic version of the first projection
 +
* KPRJ2_GEN that represents the generic atomic version of the second projection
 +
* KID_GEN that represents the generic atomic version of the identity
  
The following AST parts are to become generic, or at least parameterised:
+
=== Parser Changes ===
* [[Constrained_Dynamic_Lexer | Lexer]]
 
* [[Constrained Dynamic Parser | Parser]]
 
* Nodes ( Formula class hierarchy ): parameters needed for:
 
** Type Solve (type rule needed to synthesize the type)
 
** Type Check (type rule needed to verify constraints on children types)
 
** WD (WD predicate)
 
** PrettyPrint (tag image + notation (prefix, infix, postfix) + needs parentheses)
 
** Visit Formula (getting children + visitor callback mechanism)
 
** Rewrite Formula (associative formulæ have a specific flattening treatment)
 
* Types (Type class hierarchy): parameters needed for:
 
** Building the type expression (type rule needed)
 
** PrettyPrint (set operator image)
 
** getting Base / Source / Target type (type rule needed)
 
* Verification of preconditions (see for example <tt>AssociativeExpression.checkPreconditions</tt>)
 
  
=== Vocabulary ===
+
The new parser has a version attribute. Depending on the version, formulas are parsed differently. Thus, the KPARTITION node is a regular identifier if version is V1. It is accepted as a keyword only from version V2.
 +
Concerning identity and projections, they are recognized as non-generic with V1 and generic with V2.
 +
The following table summarizes how tokens are processed depending on the language version:
  
An '''extension''' is to be understood as a single additional operator definition.
 
  
=== Tags ===
+
{| border="4"
 +
|+ Token Recognition
 +
! scope=col |
 +
! scope=col |Language V1
 +
! scope=col |Language V2
 +
|-
 +
! scope=row |"partition"
 +
|IDENT
 +
|KPARTITION
 +
|-
 +
! scope=row |"id"
 +
|KID
 +
|KID_GEN
 +
|-
 +
! scope=row |"prj1"
 +
|KPRJ1
 +
|KPRJ1_GEN
 +
|-
 +
! scope=row |"prj2"
 +
|KPRJ2
 +
|KPRJ2_GEN
 +
|}
  
Every extension is associated with an integer tag, just like existing operators. Thus, questions arise about how to allocate new tags and how to deal with existing tags.<br />
+
==== Parsing ====
The solution proposed here consists in keeping existing tags 'as is'. They are already defined and hard coded in the <tt>Formula</tt> class, so this choice is made with backward compatibility in mind.
 
  
Now concerning extension tags, we will first introduce a few hypotheses:
+
FormulaFactory has new {{method|parse()}} methods with a language version argument. Existing methods default to the first version ({{class|LanguageVersion.V1}}), while becoming deprecated.
* Tags_Hyp1: tags are never persisted across sessions on a given platform
+
Thus, one now has to know, before parsing a formula, the version of the language it has been written with.
* Tags_Hyp2: tags are never referenced for sharing purposes across various platforms
+
It also provides methods to [[#Upgrade|upgrade]] to the new language version.
In other words, cross-platform/session formula references are always made through their ''String'' representation. These assumptions, which were already made and verified for Rodin before extensions, lead us to restrict further considerations to the scope of a single session on a single platform.
 
  
The following definitions hold at a given instant <math>t</math> and for the whole platform.<br />
+
==== Multiple Predicates / Partition ====
Let <math>\mathit{EXTENSION}_t</math> be the set of extensions supported by the platform at instant <math>t</math>;<br /> let <math>\mathit{tag}_t</math> denote the affectation of tags to a extensions at instant <math>t</math> (<math>\mathit{tag}_t \in \mathit{EXTENSION}_t \pfun \intg</math>);<br /> let <math>\mathit{COMMON}</math> be the set of existing tags defined by the <tt>Formula</tt> class (<math>\mathit{COMMON} \sub \intg</math>).<br /> The following requirements emerge:
 
* Tags_Req1: <math>\forall t \qdot \mathit{tag}_t \in \mathit{EXTENSION}_t \tinj \intg</math>
 
* Tags_Req2: <math>\forall e, t_1,t_2 \qdot \mathit{tag}_{t_1}(e)=\mathit{tag}_{t_2}(e)</math> where <math>t_1, t_2</math> are two instants during a given session
 
* Tags_Req3: <math>\forall t \qdot \ran(\mathit{tag}_t) \cap \mathit{COMMON} = \empty</math>
 
  
The above-mentioned scope-restricting hypothesis can be reformulated into: <math>\mathit{tag}</math> needs not be stable across sessions nor across platforms.
+
A new predicate class is introduced: {{class|MultiplePredicate}}, in which the partition predicate is implemented.
 +
A {{class|MultiplePredicate}} is a predicate that applies to a list of one or more expressions.
 +
In a partition, all expressions must have the same type, that is the type of the partitioned set (first child).
  
=== Formula Factory ===
+
==== Associativity ====
  
The requirements about tags give rise to a need for centralising the <math>\mathit{tag}</math> relation in order to enforce tag uniqueness.
+
Newly non associative relational set operators now require parentheses around their operands for parsing to succeed. This is managed directly by the parser.
The Formula Factory appears to be a convenient and logical candidate for playing this role. Each time an extension is used to make a formula, the factory is called and it can check whether its associated tag exists, create it if needed, then return the new extended formula while maintaining tag consistency.
+
Unparsing also requires those parentheses, so the {{class|BinaryExpression}}{{method|.toString()}} method has been changed accordingly.
  
The factory can also provide API for requests about tags and extensions: getting the tag from an extension and conversely.
+
==== Visitors ====
  
We also need additional methods to create extended formulæ. A first problem to address is: which type should these methods return ?
+
Partition, identity and projections also entail new visitor methods, as follows:
We could define as many extended types as the common AST API does, namely <tt>ExtendedUnaryPredicate</tt>, <tt>ExtendedAssociativeExpression</tt>, and so on, but this would lead to a large number of new types to deal with (in visitors, filters, …), together with a constraint about which types extensions would be forced to fit into. It is thus preferable to have as few extended types as possible, but with as much parameterisation as can be. Considering that the two basic types <tt>Expression</tt> and <tt>Predicate</tt> have to be extensible, we come to add two extended types <tt>ExtendedExpression</tt> and <tt>ExtendedPredicate</tt>.
 
  
ExtendedExpression makeExtendedExpression( ? )
+
{{class|IVisitor}} has 6 more methods:
ExtendedPredicate makeExtendedPredicate( ? )
+
* {{method|enterKPARTITION(MultiplePredicate)}}
 +
* {{method|continueKPARTITION(MultiplePredicate)}}
 +
* {{method|exitKPARTITION(MultiplePredicate)}}
 +
* {{method|visitKID_GEN(AtomicExpression)}}
 +
* {{method|visitKPRJ1_GEN(AtomicExpression)}}
 +
* {{method|visitKPRJ2_GEN(AtomicExpression)}}
  
Second problem to address: which arguments should these methods take ?
 
Other factory methods take the tag, a collection of children where applicable, and a source location. In order to discuss what can be passed as argument to make extended formulæ, we have to recall that the <tt>make…</tt> factory methods have various kinds of clients, namely:
 
* parser
 
* POG
 
* provers
 
(other factory clients use the parsing or identifier utility methods: SC modules, indexers, …)
 
  
Thus, the arguments should be convenient for clients, depending on which information they have at hand.
+
{{class|ISimpleVisitor}} has 1 more method:
The source location does not obviously seem to require any adaptation and can be taken as argument the same way. Concerning the tag, it depends on whether clients have a tag or an extension at hand. Both are intended to be easily retrieved from the factory. As a preliminary choice, we can go for the tag and adjust this decision when we know more about client convenience.
+
* {{method|visitMultiplePredicate(MultiplePredicate)}}
  
As for children, the problem is more about their types. We want to be able to handle as many children as needed, and of all possible types. Looking to existing formula children configurations, we can find:
+
Thus, implementors of those interfaces are required to implemented new methods, while subclassers of {{class|DefaultVisitor}} and {{class|DefaultSimpleVisitor}} should consider whether or not to override them.
* expressions with predicate children: <math>\mathit{bool}(P)</math>
 
* expressions with expression children: <math>E_1 + E_2</math>
 
* predicates with predicate children: <math>P_1 \limp P_2</math>
 
* predicates with expression children: <math>\mathit{partition}(S, E_1, E_2)</math>
 
* mixed operators: <math>\{x \qdot P(x) \mid E(x)\}</math>, but it is worth noting that the possibility of introducing bound variables in extended formulæ is not established yet.
 
Thus, for the sake of generality, children of both types should be supported for both extended predicates and extended expressions.
 
  
ExtendedExpression makeExtendedExpression(int tag, Expression[] expressions, Predicate[] predicates, SourceLocation location)
 
ExtendedPredicate makeExtendedPredicate(int tag, Expression[] expressions, Predicate[] predicates, SourceLocation location)
 
  
=== Defining Extensions ===
+
== Upgrade ==
  
An extension is meant to contain every information and behaviour required by:
+
Formulas written with version {{class|V1}} will need upgrading to version {{class|V2}}. However, most of the V1 formulas will be exactly the same with V2.
* Keyboard
+
A formula needs ugrading if:
* (Extensible Fonts ?)
+
* it contains one of {KPRJ1, KPRJ2, KID}
* Lexer
+
* it contains a non parenthesised sequence of relational set operators
* Parser
 
* AST
 
* Static Checker
 
* Proof Obligation Generator
 
* Provers
 
  
==== Keyboard requirements ====
+
In these cases an upgrade procedure will be applied to the formula using a {{class|IFormulaRewriter}}, to rewrite KPRJ1, KPRJ2, KID expressions into their generic equivalent (see [[Changes to the Mathematical Language of Event-B#Generic Identity and Projections]]).
  
'''Kbd_req1''': an extension must provide an association combo/translation for every uncommon symbol involved in the notation.
+
Upgraded formulas can be obtained from a formula string by calling one of the new {{class|FormulaFactory}}{{method|.upgrade()}} methods.
  
==== Lexer requirements ====
+
=== File upgrade ===
  
'''Lex_req1''': an extension must provide an association lexeme/token for every uncommon symbol involved in the notation.
+
Existing rodin files with V1 formulas inside need converting to V2 formulas. Concerned files are:
 +
* Contexts
 +
* Machines
 +
* Proofs
  
==== Parser requirements ====
+
This is managed through the version tools of the rodin core package: a new version number for each of these files represents the passing from V1 to V2 mathematical language.
 +
The following table summarizes the correspondence between file version number and mathematical version number:
  
According to the [[Constrained_Dynamic_Parser| Parser]] page, the following informations are required by the parser in order to be able to parse a formula containing a given extension.
 
  
* symbol compatibility
+
{| border="4"
* group compatibility
+
|+ File / Math versions correspondence
* symbol precedence
+
! scope=col |
* group precedence
+
! scope=col |Math V1
* notation (see below)
+
! scope=col |Math V2
 +
|-
 +
! scope=row |Context
 +
|<math>V < 2</math>
 +
|<math>V \geqslant 2</math>
 +
|-
 +
! scope=row |Machine
 +
|<math>V < 4</math>
 +
|<math>V \geqslant 4</math>
 +
|-
 +
! scope=row |Proof
 +
|<math>V < 1</math>
 +
|<math>V \geqslant 1</math>
 +
|}
  
==== AST requirements ====
 
  
The following hard-coded concepts need to be reified for supporting extensions in the AST. An extension instance will then provide these reified informations to an extended formula class for it to be able to fulfil its API. It is the expression of the missing information for a ''ExtendedExpression'' (resp. ''ExtendedPredicate'') to behave as a ''Expression'' (resp. ''Predicate''). It can also be viewed as the parametrization of the AST.
+
In those XML files, the formulas are written inside various attributes. Existing version conversion tools (in {{package|org.rodinp.core}}) allowed making changes to the structure of the files (adding an element, renaming an attribute, …), through XSL transformations, targetting a single element to apply the conversion to.<br />
 +
But upgrading formulas entails dealing with the contents of attributes in various elements dispatched all over the file. So, extending the conversion tools, a new conversion operation is implemented ({{class|ModifyAttribute}}) that changes the contents of an attribute, together with a new kind of file conversion: 'multiple' conversion, in contrast to 'simple' conversion. This new one allows targetting several elements through unrestricted XPath expressions.<br />
  
===== Notation =====
+
These new core tools extend the existing 'conversions' extension point. It now allows writing conversion extensions in {{package|org.eventb.core}} client package that target every assignment, expression or predicate in any context, machine or proof file and upgrade their contents to the new language version.<br />
  
The notation defines how the formula will be printed. In this document, we use the following convention:
+
The modifying code is delegated to the client that must provide, in the conversion extension, a class that implements {{class|IAttributeModifier}}. It builds the new string value of the attribute from the value read in the file. In our case, it builds a V2 formula that is the upgraded equivalent to the given V1 formula.
* <math>e_i</math> is the i-th child expression of the extended formula
 
* <math>p_i</math> is the i-th child predicate of the extended formula
 
  
''Example'': infix operator "<math>\lozenge</math>"
+
=== Formula upgrade ===
<math>e_1 \lozenge e_2 \lozenge \ldots \lozenge e_n</math>
 
  
We define the following notation framework:
+
==== Formatting ====
  
[[Image:notation_uml.png]]
+
Ideally, upgrading a formula would preserve its previous formatting (spaces, tabs, new lines). But this turns out being quite tricky.
 +
Indeed, it prevents from simply rewriting the formula then applying a {{method|toString()}}, as this would totally forget and override any formatting.
 +
Thus, we would have to write directly into the original string. Firstly find the source locations where changes will occur, then patching the formula with the new expression versions at those source locations.
  
On the "<math>\lozenge</math>" infix operator example, the iterable notation would return sucessively:
+
But that leads to more complexity when we come to consider formulas that contain
* a ''IFormulaChild'' with index 1
+
* recursive relational set operators (that recursively shift source locations)
* the ''INotationSymbol'' "<math>\lozenge</math>"
+
* binary operators with higher priorities nearby (that mindlessly absorb a member of the rewritten expression)
* a ''IFormulaChild'' with index 2
 
* the ''INotationSymbol'' "<math>\lozenge</math>"
 
* &hellip;
 
* a ''IFormulaChild'' with index <math>n</math>
 
  
For the iteration not to run forever, the limit <math>n</math> needs to be known: this is the role of the ''mapsTo()'' method, which fixes the number of children, called when this number is known (i.e. for a particular formula instance).
+
Let's look at the following formula:
  
'''Open question''': how to integrate bound identifier lists in the notation ?
+
<math>f \sube id(A \times id(B))</math>
  
We may make a distinction between fixed-size notations (like n-ary operators for a given n) and variable-size notations (like associative infix operators).
+
A straight rewriting would proceed as follows:
While fixed-size notations would have no specific expressivity limitations (besides parser conflicts with other notations), only a finite set of pre-defined variable-size notation patterns will be proposed to the user. For technical reasons related to the parser,
 
  
The following features need to be implemented to support the notation framework:
+
* <math>f \sube id(A \times B \triangleleft id)</math> which already is false (requires parentheses)
* special meta variables that represent children (<math>e_i</math>, <math>p_i</math>)
+
Then, with the appropriate shift of the source location to replace the first <math>id</math> operator:
* a notation parser that extracts a fixed-size notation pattern from a user-input String
+
* <math>f \sube A \times B \triangleleft id \triangleleft id</math> which is even worse
  
===== Well-Definedness =====
+
This suggests putting parentheses around rewritten expressions. But they would sometimes be superfluous, which brings us to distinguish cases where they are needed and cases where they are not, depending on the priority of the operators nearby.
  
WD predicates also are user-provided data.
+
Considering this is going quite far, a simpler upgrade method is applied:
 +
* if the formula does not use deprecated expressions, nor lacks parentheses around relational set operators, it is left unchanged, preserving the formatting
 +
* it it does, it is rewritten and any previous formatting is overridden
  
''Example'':
+
==== Detecting missing parentheses ====
<math>\mathit{D}(e_1) \land (\forall i \cdot i \in 1\mathit{..}(n-1) \Rightarrow (\mathit{D}(e_i) \Rightarrow \mathit{D}(e_{i+1})))</math>
 
  
In order to process WD predicates, we need to add the following features to the AST:
+
Missing parentheses can be quite easily detected, given a parsed formula and its string.
* the <math>\mathit{D}</math> operator
 
* expression variables (predicate variables already exist)
 
* special expression variables and predicate variables that denote a particular formula child (we need to refer to <math>e_1</math> and <math>e_i</math> in the above example)
 
* a ''parse()'' method that accepts these special meta variables and the <math>\mathit{D}</math> operator and returns a ''Predicate'' (a WD Predicate Pattern)
 
* a ''makeWDPredicate''(aWDPredicatePattern, aPatternInstantiation) method that makes an actual WD predicate
 
  
===== Type Check =====
+
Using a visitor, we check each relational set expression.
  
An extension shall give a type rule, which consists in:
+
Starting from its source location, we search the string for parentheses towards both left and right directions around the expression, skipping space characters.
* type check predicates (addressed in this very section)
+
If the first non space character is not the expected parenthesis, then one is missing and the formula needs upgrading to V2.
* a resulting type expression (only for expressions)
 
  
''Example'':
+
The fussy part is about the definition of space characters.
<math>(\forall i \cdot \mathit{type}(e_i) = \pow(\alpha)) \land (\forall i,j \cdot \mathit{type}(e_i)=\mathit{type}(e_j))</math>
 
  
Type checking can be reified provided the following new AST features:
+
c is an Event-B white space character if one of the following holds:
* the <math>\mathit{type}</math> operator
+
* java.lang.Character.isSpaceChar(c)
* type variables (<math>\alpha</math>)
+
* '\u0009' <= c && c <= '\u000d'
* the above-mentioned expression variables and predicate variables
+
* '\u001c' <= c && c <= '\u001f'
* a ''parse()'' method that accepts these special meta variables and the <math>\mathit{type}</math> operator and returns a ''Predicate'' (a Type Predicate Pattern)
 
* a ''makeTypePredicate''(aTypePredicatePattern, aPatternInstantiation) method that makes an actual Type predicate
 
  
===== Type Solve =====
+
A new {{method|isEventBWhiteSpace(char)}} method has been provided in {{class|FormulaFactory}} to prevent developers from worrying about the details of this matter.
  
This section addresses type synthesizing for extended expressions (the resulting type part of a type rule). It is given as a type expression pattern, so that the actual type can be computed from the children.
+
==== Typing generic operators ====
  
''Example'':
+
The problem of typing generic operators arises when upgrading a fomula like <math>\operatorname{id(x)}</math>.
  <math>\pow(\mathit{type}(e_1))</math>
+
We know that, in the general case, the upgraded version is <math>x \domres \id</math>.
 +
But which type should the generic <math>\id</math> bear ?
  
In addition to the requirements for Type Check, the following features are needed:
+
An answer is:
* a ''parse()'' method that accepts special meta variables and the <math>\mathit{type}</math> operator and returns an ''Expression'' (a Type Expression Pattern)
+
If the V1 operator is typed, then the V2 generic one should bear the same type.
* a ''makeTypeExpression''(aTypeExpressionPattern, aPatternInstantiation) method that makes an actual Type expression
 
  
==== Static Checker requirements ====
+
Given the convention that <math>\operatorname{o}_T</math> means that <math>\operatorname{o}</math> has type <math>T</math>, this entails rewriting <math>\id_{T}(x)</math> into <math>x \domres \id_{T}</math>.
{{TODO}}
 
==== Proof Obligation Generator requirements ====
 
{{TODO}}
 
==== Provers requirements ====
 
{{TODO}}
 
  
=== Extension compatibility issues ===
+
But then, what happens if <math>x</math> is a type expression,like <math>\mathbb{Z}</math> ?
{{TODO}}
 
  
== User Input Summarization ==
+
We would come to rewriting <math>\id_{\mathbb{P}(\mathbb{Z}\times\mathbb{Z})}(\mathbb{Z})</math> into <math>\mathbb{Z} \domres \id_{\mathbb{P}(\mathbb{Z}\times\mathbb{Z})}</math>.
  
Identified required data entail the following user input:
+
In this case, the domain restriction is clearly superfluous. This brings in a new rule:
 +
If the V1 unary operator has a type expression for argument, then the V2 generic atomic one needs no domain restriction
  
{{TODO}}
+
Putting things together, the following table summarizes the upgrade of generic operators, based on the <math>\id</math> operator example. Similar rules apply to <math>\prjone</math> and <math>\prjtwo</math>.
 +
{| cellspacing="10" cellpadding="10" style="text-align:center;"
 +
! scope=col |
 +
! scope=col |Math V1
 +
! scope=col |Math V2
 +
|-
 +
! scope=row |Type expression
 +
|<math>\id_{\mathbb{P}(\mathbb{Z}\times\mathbb{Z})}(\mathbb{Z})</math>
 +
|<math>\id_{\mathbb{P}(\mathbb{Z}\times\mathbb{Z})}</math>
 +
|-
 +
! scope=row |Typed
 +
|<math>\id_{T}(x)</math>
 +
|<math>x \domres \id_{T}</math>
 +
|-
 +
! scope=row |Not typed
 +
|<math>\id(x)</math>
 +
|<math>x \domres \id</math>
 +
|}
  
== Impact on other tools ==
+
==== Not upgradable cases ====
  
Impacted plug-ins (use a factory to build formulæ):
+
{{TODO}}
* <tt>org.eventb.core</tt>
 
: In particular, the static checker and proof obligation generator are impacted.
 
* <tt>org.eventb.core.seqprover</tt>
 
* <tt>org.eventb.pp</tt>
 
* <tt>org.eventb.pptrans</tt>
 
* <tt>org.eventb.ui</tt>
 
  
== Identified Problems ==
+
== Sequent Prover ==
The parser shall enforce verifications to detect the following situations:
+
{{TODO}}
* Two mathematical extensions are not compatible (the extensions define symbols with the same name but with a different semantics).
 
* A mathematical extension is added to a model and there is a conflict between a symbol and an identifier.
 
* An identifier which conflicts with a symbol of a visible mathematical extension is added to a model.
 
 
 
Beyond that, the following situations are problematic:
 
* A formula has been written with a given parser configuration and is read with another parser configuration.
 
: As a consequence, it appears as necessary to remember the parser configuration.
 
: The static checker will then have a way to invalid the sources of conflicts (e.g., priority of operators, etc).
 
:: ''The static checker will then have a way to invalid the formula upon detecting a conflict (name clash, associativity change, semantic change...) [[User:Mathieu|mathieu]]
 
 
 
* A proof may free a quantified expression which is in conflict with a mathematical extension.
 
: SOLUTION #1: Renaming the conflicting identifiers in proofs?
 
 
 
== Open Questions ==
 
 
 
=== New types ===
 
Which option should we prefer for new types?
 
* OPTION #1: Transparent mode.
 
:In transparent mode, it is always referred to the base type. As a consequence, the type conversion is implicitly supported (weak typing).
 
:For example, it is possible to define the <tt>DISTANCE</tt> and <tt>SPEED</tt> types, which are both derived from the <math>\intg</math> base type, and to multiply a value of the former type with a value of the latter type.
 
 
 
* OPTION #2: Opaque mode.
 
:In opaque mode, it is never referred to the base type. As a consequence, values of one type cannot be converted to another type (strong typing).
 
:Thus, the above multiplication is not allowed.
 
:This approach has at least two advantages:
 
:* Stronger type checking.
 
:* Better prover performances.
 
:It also has some disadvantages:
 
:* need of ''extractors'' to convert back to base types.
 
:* need of extra circuitry to allow things like <math>x:=d*2</math> where <math>x, d</math> are of type <tt>DISTANCE</tt>
 
 
 
* OPTION #3: Mixed mode.
 
:In mixed mode, the transparent mode is applied to scalar types and the opaque mode is applied to other types.
 
 
 
=== Scope of the mathematical extensions ===
 
* OPTION #1: Project scope.
 
:The mathematical extensions are implicitly visible to all components of the project that has imported them.
 
* OPTION #2: Component scope.
 
:The mathematical extensions are only visible to the components that have explicitly imported them. However, note that this visibility is propagated through the hierarchy of contexts and machines (<tt>EXTENDS</tt>, <tt>SEES</tt> and <tt>REFINES</tt> clauses).
 
:An issue has been identified. Suppose that <tt>ext1</tt> extension is visible to component <tt>C1</tt> and <tt>ext2</tt> is visible to component <tt>C2</tt>, and there is no compatibility issue between <tt>ext1</tt> and <tt>ext2</tt>. It is not excluded that an identifier declared in <tt>C1</tt> conflict with a symbol in <tt>ext2</tt>. As a consequence, a global verification is required when adding a new mathematical extension.
 
  
== Bibliography ==
 
* J.R. Abrial, M.Butler, M.Schmalz, S.Hallerstede, L.Voisin, [http://deploy-eprints.ecs.soton.ac.uk/80 ''Proposals for Mathematical Extensions for Event-B''], 2009.
 
:This proposal consists in considering three kinds of extension:
 
# Extensions of set-theoretic expressions or predicates: example extensions of this kind consist in adding the transitive closure of relations or various ordered relations.
 
# Extensions of the library of theorems for predicates and operators.
 
# Extensions of the Set Theory itself through the definition of algebraic types such as  lists or ordered trees using new set constructors.
 
  
[[Category:Design proposal]]
+
[[Category:Event-B]]
 +
[[Category:Design]]

Revision as of 10:10, 20 April 2009

Introduction

This document describes the modifications that occur to the Event-B mathematical language from the developer point of view. The old language will still be recognized, in order to provide backward compatibility.

These changes are described from a user point of view in Changes to the Mathematical Language of Event-B.

Abstract Syntax Tree

Deprecated nodes (becoming atomic expressions):

  • KPRJ1
  • KPRJ2
  • KID

New nodes:

  • KPARTITION that represents the new partition predicate (multiple predicate)
  • KPRJ1_GEN that represents the generic atomic version of the first projection
  • KPRJ2_GEN that represents the generic atomic version of the second projection
  • KID_GEN that represents the generic atomic version of the identity

Parser Changes

The new parser has a version attribute. Depending on the version, formulas are parsed differently. Thus, the KPARTITION node is a regular identifier if version is V1. It is accepted as a keyword only from version V2. Concerning identity and projections, they are recognized as non-generic with V1 and generic with V2. The following table summarizes how tokens are processed depending on the language version:


Token Recognition
Language V1 Language V2
"partition" IDENT KPARTITION
"id" KID KID_GEN
"prj1" KPRJ1 KPRJ1_GEN
"prj2" KPRJ2 KPRJ2_GEN

Parsing

FormulaFactory has new parse() methods with a language version argument. Existing methods default to the first version (LanguageVersion.V1), while becoming deprecated. Thus, one now has to know, before parsing a formula, the version of the language it has been written with. It also provides methods to upgrade to the new language version.

Multiple Predicates / Partition

A new predicate class is introduced: MultiplePredicate, in which the partition predicate is implemented. A MultiplePredicate is a predicate that applies to a list of one or more expressions. In a partition, all expressions must have the same type, that is the type of the partitioned set (first child).

Associativity

Newly non associative relational set operators now require parentheses around their operands for parsing to succeed. This is managed directly by the parser. Unparsing also requires those parentheses, so the BinaryExpression.toString() method has been changed accordingly.

Visitors

Partition, identity and projections also entail new visitor methods, as follows:

IVisitor has 6 more methods:

  • enterKPARTITION(MultiplePredicate)
  • continueKPARTITION(MultiplePredicate)
  • exitKPARTITION(MultiplePredicate)
  • visitKID_GEN(AtomicExpression)
  • visitKPRJ1_GEN(AtomicExpression)
  • visitKPRJ2_GEN(AtomicExpression)


ISimpleVisitor has 1 more method:

  • visitMultiplePredicate(MultiplePredicate)

Thus, implementors of those interfaces are required to implemented new methods, while subclassers of DefaultVisitor and DefaultSimpleVisitor should consider whether or not to override them.


Upgrade

Formulas written with version V1 will need upgrading to version V2. However, most of the V1 formulas will be exactly the same with V2. A formula needs ugrading if:

  • it contains one of {KPRJ1, KPRJ2, KID}
  • it contains a non parenthesised sequence of relational set operators

In these cases an upgrade procedure will be applied to the formula using a IFormulaRewriter, to rewrite KPRJ1, KPRJ2, KID expressions into their generic equivalent (see Changes to the Mathematical Language of Event-B#Generic Identity and Projections).

Upgraded formulas can be obtained from a formula string by calling one of the new FormulaFactory.upgrade() methods.

File upgrade

Existing rodin files with V1 formulas inside need converting to V2 formulas. Concerned files are:

  • Contexts
  • Machines
  • Proofs

This is managed through the version tools of the rodin core package: a new version number for each of these files represents the passing from V1 to V2 mathematical language. The following table summarizes the correspondence between file version number and mathematical version number:


File / Math versions correspondence
Math V1 Math V2
Context V < 2 V \geqslant 2
Machine V < 4 V \geqslant 4
Proof V < 1 V \geqslant 1


In those XML files, the formulas are written inside various attributes. Existing version conversion tools (in

org.rodinp.core

) allowed making changes to the structure of the files (adding an element, renaming an attribute, …), through XSL transformations, targetting a single element to apply the conversion to.

But upgrading formulas entails dealing with the contents of attributes in various elements dispatched all over the file. So, extending the conversion tools, a new conversion operation is implemented (ModifyAttribute) that changes the contents of an attribute, together with a new kind of file conversion: 'multiple' conversion, in contrast to 'simple' conversion. This new one allows targetting several elements through unrestricted XPath expressions.

These new core tools extend the existing 'conversions' extension point. It now allows writing conversion extensions in

org.eventb.core

client package that target every assignment, expression or predicate in any context, machine or proof file and upgrade their contents to the new language version.

The modifying code is delegated to the client that must provide, in the conversion extension, a class that implements IAttributeModifier. It builds the new string value of the attribute from the value read in the file. In our case, it builds a V2 formula that is the upgraded equivalent to the given V1 formula.

Formula upgrade

Formatting

Ideally, upgrading a formula would preserve its previous formatting (spaces, tabs, new lines). But this turns out being quite tricky. Indeed, it prevents from simply rewriting the formula then applying a toString(), as this would totally forget and override any formatting. Thus, we would have to write directly into the original string. Firstly find the source locations where changes will occur, then patching the formula with the new expression versions at those source locations.

But that leads to more complexity when we come to consider formulas that contain

  • recursive relational set operators (that recursively shift source locations)
  • binary operators with higher priorities nearby (that mindlessly absorb a member of the rewritten expression)

Let's look at the following formula:

f \sube id(A \times id(B))

A straight rewriting would proceed as follows:

  • f \sube id(A \times B \triangleleft id) which already is false (requires parentheses)

Then, with the appropriate shift of the source location to replace the first id operator:

  • f \sube A \times B \triangleleft id \triangleleft id which is even worse

This suggests putting parentheses around rewritten expressions. But they would sometimes be superfluous, which brings us to distinguish cases where they are needed and cases where they are not, depending on the priority of the operators nearby.

Considering this is going quite far, a simpler upgrade method is applied:

  • if the formula does not use deprecated expressions, nor lacks parentheses around relational set operators, it is left unchanged, preserving the formatting
  • it it does, it is rewritten and any previous formatting is overridden

Detecting missing parentheses

Missing parentheses can be quite easily detected, given a parsed formula and its string.

Using a visitor, we check each relational set expression.

Starting from its source location, we search the string for parentheses towards both left and right directions around the expression, skipping space characters. If the first non space character is not the expected parenthesis, then one is missing and the formula needs upgrading to V2.

The fussy part is about the definition of space characters.

c is an Event-B white space character if one of the following holds:
* java.lang.Character.isSpaceChar(c)
* '\u0009' <= c && c <= '\u000d'
* '\u001c' <= c && c <= '\u001f'

A new isEventBWhiteSpace(char) method has been provided in FormulaFactory to prevent developers from worrying about the details of this matter.

Typing generic operators

The problem of typing generic operators arises when upgrading a fomula like \operatorname{id(x)}. We know that, in the general case, the upgraded version is x \domres \id. But which type should the generic \id bear ?

An answer is:

If the V1 operator is typed, then the V2 generic one should bear the same type.

Given the convention that \operatorname{o}_T means that \operatorname{o} has type T, this entails rewriting \id_{T}(x) into x \domres \id_{T}.

But then, what happens if x is a type expression,like \mathbb{Z} ?

We would come to rewriting \id_{\mathbb{P}(\mathbb{Z}\times\mathbb{Z})}(\mathbb{Z}) into \mathbb{Z} \domres \id_{\mathbb{P}(\mathbb{Z}\times\mathbb{Z})}.

In this case, the domain restriction is clearly superfluous. This brings in a new rule:

If the V1 unary operator has a type expression for argument, then the V2 generic atomic one needs no domain restriction

Putting things together, the following table summarizes the upgrade of generic operators, based on the \id operator example. Similar rules apply to \prjone and \prjtwo.

Math V1 Math V2
Type expression \id_{\mathbb{P}(\mathbb{Z}\times\mathbb{Z})}(\mathbb{Z}) \id_{\mathbb{P}(\mathbb{Z}\times\mathbb{Z})}
Typed \id_{T}(x) x \domres \id_{T}
Not typed \id(x) x \domres \id

Not upgradable cases

TODO

Sequent Prover

TODO