Difference between pages "D45 Scalability" and "Mathematical Extensions"

From Event-B
(Difference between pages)
Jump to navigationJump to search
imported>Tommy
m
 
imported>Nicolas
 
Line 1: Line 1:
= Overview =
+
Currently the operators and basic predicates of the Event-B mathematical language supported by Rodin are fixed.
* '''Improved Performance''' According to the refocus mentioned in the [[D45_General_Platform_Maintenance#Motivations| General Platform Maintenance]] chapter, much of the reworking efforts performed on the core platform basically aimed to overcome Rodin scalability weaknesses, and the continuous need of seamless proving experience. The whole performance was enhanced by core implementation and user interface refactorings. Improvements made to the proving experience will be detailed in a separate chapter.
+
We propose to extend Rodin to define basic predicates, new operators or new algebraic types.
* '''Design pattern management / Generic Instantiation''' {{TODO}} An overview of the contribution about the design pattern management / Generic Instantiation (Thai Son Hoang)
 
* '''Edition''' It appears along DEPLOY - the models defined by pilots becoming industrial sized due to the level of complexity reached - that the edition became a central concern.
 
:*The legacy structured editor based on a form edition specific architecture reached its limits in editing such complex models. Industrial partners found this issue significant regarding the adoption of the Rodin platform in industry and providing a new structured editor correcting this issue became a important task during this last year of the DEPLOY project.
 
:*Camille textual editor had to tackle challenges related to resources mapping and management for projects of industrial size mentioned above or even the design of extension capabilities, that became a major concern since the grammar could be extended with theories.
 
  
= Motivations =
+
== Requirements ==
  
== Improved Performance ==
+
=== User Requirements ===
Some of the several major performance issues faced were related to the core code of the Rodin platform, causing crashes, loss of data, corruption in models. Some other were related to the UI causing platform hanging, and sometimes leading to its freezing which required sometimes to kill the Rodin process, thus also leading to potential loss of data and corruption in models. It was necessary to solve such issues before the end of DEPLOY.
+
* Binary operators (prefix form, infix form or suffix form).
 +
* Operators on boolean expressions.
 +
* Unary operators, such as absolute values.
 +
: Note that the pipe, which is already used for set comprehension, cannot be used to enter absolute values (''in fact, as in the new design the pipe used in set comprehension is only a syntaxic sugar, the same symbol may be used for absolute value. To be confirmed with the prototyped parser. It may however be not allowed in a first time until backtracking is implemented, due to use of lookahead making assumptions on how the pipe symbol is used. -Mathieu ).
 +
* Basic predicates (e.g., the symmetry of relations <math>sym(R) \defi R=R^{-1}</math>).
 +
: Having a way to enter such predicates may be considered as syntactic sugar, because it is already possible to use sets (e.g., <math>R \in sym</math>, where <math>sym \defi \{R \mid R=R ^{-1}\}</math>) or functions (e.g., <math>sym(R) = \True</math>, where <math>sym \defi (\lambda R \qdot R \in A \rel B \mid \bool(R = R^{-1}))</math>).
 +
* Quantified expressions (e.g., <math>\sum x \qdot P \mid y</math>, <math>\prod x \qdot P \mid y</math>, <math>~\min(S)</math>, <math>~\max(S)</math>).
 +
* Types.
 +
** Enumerated types.
 +
** Scalar types.
  
== Design Pattern Management / Generic Instantiation  ==
+
=== User Input ===
{{TODO}} ''To be completed by Thai Son Hoang''
+
The end-user shall provide the following information:
== Edition ==
+
* keyboard input
Two major kinds of failures were faced when developing industrial sized models:
+
* Lexicon and Syntax. <br/>More precisely, it includes the symbols, the form (prefix, infix, postfix), the grammar, associativity (left-associative or right associative), commutativity, priority, the mode (flattened or not), ...
:* Failures occurring on windows platforms, the mostly used system, when reaching the operating system limit number of graphical elements in memory allowed to be displayed at once. This bug used to crash the Rodin platform, and it appeared necessary to control the number of graphical elements needed to display the model elements. Architectures sparing these graphical elements should be thus favoured.
+
* Pretty-print. <br/>Alternatively, the rendering may be determined from the notation parameters passed to the parser.
:* Failures due to the high consumption of memory by the heavy graphical elements used by the forms of legacy editor.
+
* Typing rules.
:Moreover, it appeared important to visualize some elements that are not part of the current level of modelling, but are linked to it, for example, the actions in an abstract event, or its guards. Such elements are called the "inherited" elements, or "implicit children". The legacy structured editor being directly interfaced with the underlying Event-B models in database, it was difficult and tricky to modify it in order to display inherited elements. This was one more argument to go for a new editor based on a intermediary representation of the models.
+
* Well-definedness.
:Another motivation to go for another editor was to get a more modest and ergonomic way to visualize and edit the models. The models being presented through styled text pretty printing and the edition tightly derived from a textual edition.
 
  
* The motivations about Camille evolution {{TODO}} Ingo Weigelt.
+
=== Development Requirements ===
 +
* Scalability.
  
= Choices / Decisions =
+
== Towards a generic AST ==
== Improved Performance ==
 
SYSTEREL lead a two phase investigation to have a better idea of the work to be done. Each phase being followed by some refactoring of the code. Out of the early investigation, a cause and effect relationship has been found between performance loss and the various reported bugs, such as "platform hanging" bugs. Indeed, it appeared that solving the performance issues sometimes solved induced bugs as well which made the scalability improvement tasks encompass the maintenance goals.<br>
 
Later, a deeper investigation was performed where profiling and code review were the two techniques used to tackle the issues left. The profiling strategy allowed to get a better localisation of the performance loss in both UI and core code while the code review helped to understand the intrinsic misuses or drawbacks of particular components and/or architectures.
 
  
== Design Pattern Management / Generic Instantiation  ==
+
The following AST parts are to become generic, or at least parameterised:
{{TODO}} ''To be completed by Thai Son Hoang''
+
* [[Constrained_Dynamic_Lexer | Lexer]]
== Edition ==
+
* [[Constrained Dynamic Parser | Parser]]
* Rodin Editor
+
* Nodes ( Formula class hierarchy ): parameters needed for:
The legacy editor drawbacks mainly come from its greedy components and structure but a structured way to edit the models (based on a database storage) still seems particularly appropriate. This mainly explains why a new structured editor, the Rodin Editor, has been built on the basis of the improvements made to the Proving UI. In fact, the textual component used, the SWT StyledText, allowed to describe every model element as text, as would a pretty print do. The edition is made possible through the use of a unique additional graphical component that overlays the presentation of the element to edit.  It lightened the presentation, but also side stepped the use of a significant amount of greedy graphical because at most two of them are needed: the model viewer and its overlay editor!<br>
+
** Type Solve (type rule needed to synthesize the type)
The first public version (0.5.0) of the Rodin Editor as been released on the 13/07/2011 as a plug-in of the Rodin platform. This decision has been made in order to let the plug-in incubation for the time it is being tested and stated that no regression is introduced. The Rodin Editor since its version xx is part of the Rodin Core Platform.
+
** Type Check (type rule needed to verify constraints on children types)
* Camille
+
** WD (WD predicate)
{{TODO}} Ingo Weigelt
+
** PrettyPrint (tag image + notation (prefix, infix, postfix) + needs parentheses)
 +
** Visit Formula (getting children + visitor callback mechanism)
 +
** Rewrite Formula (associative formulæ have a specific flattening treatment)
 +
* Types (Type class hierarchy): parameters needed for:
 +
** Building the type expression (type rule needed)
 +
** PrettyPrint (set operator image)
 +
** getting Base / Source / Target type (type rule needed)
 +
* Verification of preconditions (see for example <tt>AssociativeExpression.checkPreconditions</tt>)
  
= Available Documentation =
+
=== Vocabulary ===
* {{TODO}} Links for Improved Performance
 
* {{TODO}} Links for Design Pattern Management / Generic Instantiation
 
* Links for Edition
 
:Two documentation pages have been created for the Rodin Editor:
 
:* A general page describing the editor and how it works <ref name="RodinEditorPage">http://wiki.event-b.org/index.php/Rodin_Editor</ref>,
 
:* A user guide page <ref name="RodinEditorUserManual">http://wiki.event-b.org/index.php/Rodin_Editor_User_Guide</ref>,
 
: Moreover, a special category has been created on both SourceForge feature request<ref name="featTracker">https://sourceforge.net/tracker/?group_id=108850&atid=651672</ref> and bug<ref name="bugTracker">https://sourceforge.net/tracker/?group_id=108850&atid=651669</ref> trackers.
 
  
:{{TODO}} '''Ingo''' Camille
+
An '''extension''' is to be understood as a single additional operator definition.
  
= Status =
+
=== Tags ===
== Improved Performance ==
 
The refactoring made on both core code and UI code allowed to gain up to 25 times speed-up on the UI, and almost a 2 times speed-up in the core code, making the platform usable in an industrial sized context.<br>
 
  
== Design Pattern Management / Generic Instantiation  ==
+
Every extension is associated with an integer tag, just like existing operators. Thus, questions arise about how to allocate new tags and how to deal with existing tags.<br />
{{TODO}} ''To be completed by Thai Son Hoang''
+
The solution proposed here consists in keeping existing tags 'as is'. They are already defined and hard coded in the ''Formula'' class, so this choice is made with backward compatibility in mind.
== Edition ==
 
The Rodin Editor is part of the Rodin plateform since. As the time of writing this document, the current version is xx.
 
  
= References =
+
Now concerning extension tags, we will first introduce a few hypotheses:
<references/>
+
* Tags_Hyp1: tags are never persisted across sessions on a given platform
 +
* Tags_Hyp2: tags are never referenced for sharing purposes across various platforms
 +
In other words, cross-platform/session formula references are always made through their ''String'' representation. These assumptions, which were already made and verified for Rodin before extensions, lead us to restrict further considerations to the scope of a single session on a single platform.
  
[[Category:D45 Deliverable]]
+
The following definitions hold at a given instant <math>t</math> and for the whole platform.<br />
 +
Let <math>\mathit{EXTENSION}_t</math> be the set of extensions supported by the platform at instant <math>t</math>;<br /> let <math>\mathit{tag}_t</math> denote the affectation of tags to a extensions at instant <math>t</math> (<math>\mathit{tag}_t \in \mathit{EXTENSION}_t \pfun \intg</math>);<br /> let <math>\mathit{COMMON}</math> be the set of existing tags defined by the ''Formula'' class (<math>\mathit{COMMON} \sub \intg</math>).<br /> The following requirements emerge:
 +
* Tags_Req1: <math>\forall t \qdot \mathit{tag}_t \in \mathit{EXTENSION}_t \tinj \intg</math>
 +
* Tags_Req2: <math>\forall e, t_1,t_2 \qdot \mathit{tag}_{t_1}(e)=\mathit{tag}_{t_2}(e)</math> where <math>t_1, t_2</math> are two instants during a given session
 +
* Tags_Req3: <math>\forall t \qdot \ran(\mathit{tag}_t) \cap \mathit{COMMON} = \empty</math>
 +
 
 +
The above-mentioned scope-restricting hypothesis can be reformulated into: <math>\mathit{tag}</math> needs not be stable across sessions nor across platforms.
 +
 
 +
=== Formula Factory ===
 +
 
 +
The requirements about tags give rise to a need for centralising the <math>\mathit{tag}</math> relation in order to enforce tag uniqueness.
 +
The Formula Factory appears to be a convenient and logical candidate for playing this role. Each time an extension is used to make a formula, the factory is called and it can check whether its associated tag exists, create it if needed, then return the new extended formula while maintaining tag consistency.
 +
 
 +
The factory can also provide API for requests about tags and extensions: getting the tag from an extension and conversely.
 +
 
 +
We also need additional methods to create extended formulæ. A first problem to address is: which type should these methods return ?
 +
We could define as many extended types as the common AST API does, namely ''ExtendedUnaryPredicate'', ''ExtendedAssociativeExpression'', and so on, but this would lead to a large number of new types to deal with (in visitors, filters, …), together with a constraint about which types extensions would be forced to fit into. It is thus preferable to have as few extended types as possible, but with as much parameterisation as can be. Considering that the two basic types ''Expression'' and ''Predicate'' have to be extensible, we come to add two extended types ''ExtendedExpression'' and ''ExtendedPredicate''.
 +
 
 +
ExtendedExpression makeExtendedExpression( ? )
 +
ExtendedPredicate makeExtendedPredicate( ? )
 +
 
 +
Second problem to address: which arguments should these methods take ?
 +
Other factory methods take the tag, a collection of children where applicable, and a source location. In order to discuss what can be passed as argument to make extended formulæ, we have to recall that the make… factory methods have various kinds of clients, namely:
 +
* parser
 +
* POG
 +
* provers
 +
(other factory clients use the parsing or identifier utility methods: SC modules, indexers, …)
 +
 
 +
Thus, the arguments should be convenient for clients, depending on which information they have at hand.
 +
The source location does not obviously seem to require any adaptation and can be taken as argument the same way. Concerning the tag, it depends on whether clients have a tag or an extension at hand. Both are intended to be easily retrieved from the factory. As a preliminary choice, we can go for the tag and adjust this decision when we know more about client convenience.
 +
 
 +
As for children, the problem is more about their types. We want to be able to handle as many children as needed, and of all possible types. Looking to existing formula children configurations, we can find:
 +
* expressions with predicate children: <math>\mathit{bool}(P)</math>
 +
* expressions with expression children: <math>E_1 + E_2</math>
 +
* predicates with predicate children: <math>P_1 \limp P_2</math>
 +
* predicates with expression children: <math>\mathit{partition}(S, E_1, E_2)</math>
 +
* mixed operators: <math>\{x \qdot P(x) \mid E(x)\}</math>, but it is worth noting that the possibility of introducing bound variables in extended formulæ is not established yet.
 +
Thus, for the sake of generality, children of both types should be supported for both extended predicates and extended expressions.
 +
 
 +
ExtendedExpression makeExtendedExpression(int tag, Expression[] expressions, Predicate[] predicates, SourceLocation location)
 +
ExtendedPredicate makeExtendedPredicate(int tag, Expression[] expressions, Predicate[] predicates, SourceLocation location)
 +
 
 +
 
 +
=== Defining Extensions ===
 +
 
 +
An extension is meant to contain every information and behaviour required by:
 +
* Keyboard
 +
* (Extensible Fonts ?)
 +
* Lexer
 +
* Parser
 +
* AST
 +
* Static Checker
 +
* Proof Obligation Generator
 +
* Provers
 +
 
 +
==== Keyboard requirements ====
 +
 
 +
'''Kbd_req1''': an extension must provide an association combo/translation for every uncommon symbol involved in the notation.
 +
 
 +
==== Lexer requirements ====
 +
 
 +
'''Lex_req1''': an extension must provide an association lexeme/token for every uncommon symbol involved in the notation.
 +
 
 +
==== Parser requirements ====
 +
 
 +
According to [[Constrained_Dynamic_Parser| the Parser page]], the following informations are required by the parser in order to be able to parse a formula containing a given extension.
 +
 
 +
* symbol compatibility
 +
* group compatibility
 +
* symbol precedence
 +
* group precedence
 +
{{TODO|more}}
 +
 
 +
==== AST requirements ====
 +
 
 +
The following hard-coded concepts need to be reified for supporting extensions in the AST. An extension instance will then provide these reified informations to an extended formula class for it to be able to fulfil its API. It is the expression of the missing information for a ''ExtendedExpression'' (resp. ''ExtendedPredicate'') to behave as a ''Expression'' (resp. ''Predicate''). It can also be viewed as the parametrization of the AST.
 +
 
 +
===== Notation =====
 +
 
 +
The notation defines how the formula will be printed. In this document, we use the following convention:
 +
* <math>e_i</math> is the i-th child expression of the extended formula
 +
* <math>p_i</math> is the i-th child predicate of the extended formula
 +
 
 +
''Example'': infix operator "<math>\lozenge</math>"
 +
<math>e_1 \lozenge e_2 \lozenge \ldots \lozenge e_n</math>
 +
 
 +
We define the following notation framework:
 +
 
 +
[[Image:notation_uml.png]]
 +
 
 +
On the "<math>\lozenge</math>" infix operator example, the iterable notation would return sucessively:
 +
* a ''IFormulaChild'' with index 1
 +
* the ''INotationSymbol'' "<math>\lozenge</math>"
 +
* a ''IFormulaChild'' with index 2
 +
* the ''INotationSymbol'' "<math>\lozenge</math>"
 +
* &hellip;
 +
* a ''IFormulaChild'' with index <math>n</math>
 +
 
 +
For the iteration not to run forever, the limit <math>n</math> needs to be known: this is the role of the ''mapsTo()'' method, which fixes the number of children, called when this number is known (i.e. for a particular formula instance).
 +
 
 +
Open question: how to integrate bound identifier lists in the notation ?
 +
 
 +
===== Well-Definedness =====
 +
 
 +
WD predicates also are user-provided data.
 +
 
 +
''Example'':
 +
<math>\mathit{D}(e_1) \land (\forall i \cdot i \in 1\mathit{..}(n-1) \Rightarrow (\mathit{D}(e_i) \Rightarrow \mathit{D}(e_{i+1})))</math>
 +
 
 +
In order to process WD predicates, we need to add the following features to the AST:
 +
* the <math>\mathit{D}</math> operator
 +
* expression variables (predicate variables already exist)
 +
* special expression variables and predicate variables that denote a particular formula child (we need to refer to <math>e_1</math> and <math>e_i</math> in the above example)
 +
* a ''parse()'' method that accepts these special meta variables and the <math>\mathit{D}</math> operator and returns a ''Predicate'' (a WD Predicate Pattern)
 +
* a ''makeWDPredicate''(aWDPredicatePattern, aPatternInstantiation) method that makes an actual WD predicate
 +
 
 +
===== Type Check =====
 +
 
 +
An extension shall give a type rule, which consists in:
 +
* type check predicates (addressed in this very section)
 +
* a resulting type expression (only for expressions)
 +
 
 +
''Example'':
 +
<math>(\forall i \cdot \mathit{type}(e_i) = \pow(\alpha)) \land (\forall i,j \cdot \mathit{type}(e_i)=\mathit{type}(e_j))</math>
 +
 
 +
Type checking can be reified provided the following new AST features:
 +
* the <math>\mathit{type}</math> operator
 +
* type variables (<math>\alpha</math>)
 +
* the above-mentioned expression variables and predicate variables
 +
* a ''parse()'' method that accepts these special meta variables and the <math>\mathit{type}</math> operator and returns a ''Predicate'' (a Type Predicate Pattern)
 +
* a ''makeTypePredicate''(aTypePredicatePattern, aPatternInstantiation) method that makes an actual Type predicate
 +
 
 +
===== Type Solve =====
 +
 
 +
This section addresses type synthesizing for extended expressions (the resulting type part of a type rule). It is given as a type expression pattern, so that the actual type can be computed from the children.
 +
 
 +
''Example'':
 +
  <math>\pow(\mathit{type}(e_1))</math>
 +
 
 +
In addition to the requirements for Type Check, the following features are needed:
 +
* a ''parse()'' method that accepts special meta variables and the <math>\mathit{type}</math> operator and returns an ''Expression'' (a Type Expression Pattern)
 +
* a ''makeTypeExpression''(aTypeExpressionPattern, aPatternInstantiation) method that makes an actual Type expression
 +
 
 +
==== Static Checker requirements ====
 +
{{TODO}}
 +
==== Proof Obligation Generator requirements ====
 +
{{TODO}}
 +
==== Provers requirements ====
 +
{{TODO}}
 +
 
 +
=== Extension compatibility issues ===
 +
{{TODO}}
 +
 
 +
== User Input Summarization ==
 +
 
 +
Identified required data entail the following user input:
 +
 
 +
{{TODO}}
 +
 
 +
== Impact on other tools ==
 +
 
 +
Impacted plug-ins (use a factory to build formulæ):
 +
* <tt>org.eventb.core</tt>
 +
: In particular, the static checker and proof obligation generator are impacted.
 +
* <tt>org.eventb.core.seqprover</tt>
 +
* <tt>org.eventb.pp</tt>
 +
* <tt>org.eventb.pptrans</tt>
 +
* <tt>org.eventb.ui</tt>
 +
 
 +
== Identified Problems ==
 +
The parser shall enforce verifications to detect the following situations:
 +
* Two mathematical extensions are not compatible (the extensions define symbols with the same name but with a different semantics).
 +
* A mathematical extension is added to a model and there is a conflict between a symbol and an identifier.
 +
* An identifier which conflicts with a symbol of a visible mathematical extension is added to a model.
 +
 
 +
Beyond that, the following situations are problematic:
 +
* A formula has been written with a given parser configuration and is read with another parser configuration.
 +
: As a consequence, it appears as necessary to remember the parser configuration.
 +
: The static checker will then have a way to invalid the sources of conflicts (e.g., priority of operators, etc).
 +
:: ''The static checker will then have a way to invalid the formula upon detecting a conflict (name clash, associativity change, semantic change...) [[User:Mathieu|mathieu]]
 +
 
 +
* A proof may free a quantified expression which is in conflict with a mathematical extension.
 +
: SOLUTION #1: Renaming the conflicting identifiers in proofs?
 +
 
 +
== Open Questions ==
 +
 
 +
=== New types ===
 +
Which option should we prefer for new types?
 +
* OPTION #1: Transparent mode.
 +
:In transparent mode, it is always referred to the base type. As a consequence, the type conversion is implicitly supported (weak typing).
 +
:For example, it is possible to define the <tt>DISTANCE</tt> and <tt>SPEED</tt> types, which are both derived from the <math>\intg</math> base type, and to multiply a value of the former type with a value of the latter type.
 +
 
 +
* OPTION #2: Opaque mode.
 +
:In opaque mode, it is never referred to the base type. As a consequence, values of one type cannot be converted to another type (strong typing).
 +
:Thus, the above multiplication is not allowed.
 +
:This approach has at least two advantages:
 +
:* Stronger type checking.
 +
:* Better prover performances.
 +
:It also has some disadvantages:
 +
:* need of ''extractors'' to convert back to base types.
 +
:* need of extra circuitry to allow things like <math>x:=d*2</math> where <math>x, d</math> are of type <tt>DISTANCE</tt>
 +
 
 +
* OPTION #3: Mixed mode.
 +
:In mixed mode, the transparent mode is applied to scalar types and the opaque mode is applied to other types.
 +
 
 +
=== Scope of the mathematical extensions ===
 +
* OPTION #1: Project scope.
 +
:The mathematical extensions are implicitly visible to all components of the project that has imported them.
 +
* OPTION #2: Component scope.
 +
:The mathematical extensions are only visible to the components that have explicitly imported them. However, note that this visibility is propagated through the hierarchy of contexts and machines (<tt>EXTENDS</tt>, <tt>SEES</tt> and <tt>REFINES</tt> clauses).
 +
:An issue has been identified. Suppose that <tt>ext1</tt> extension is visible to component <tt>C1</tt> and <tt>ext2</tt> is visible to component <tt>C2</tt>, and there is no compatibility issue between <tt>ext1</tt> and <tt>ext2</tt>. It is not excluded that an identifier declared in <tt>C1</tt> conflict with a symbol in <tt>ext2</tt>. As a consequence, a global verification is required when adding a new mathematical extension.
 +
 
 +
== Bibliography ==
 +
* J.R. Abrial, M.Butler, M.Schmalz, S.Hallerstede, L.Voisin, [http://deploy-eprints.ecs.soton.ac.uk/80 ''Proposals for Mathematical Extensions for Event-B''], 2009.
 +
:This proposal consists in considering three kinds of extension:
 +
# Extensions of set-theoretic expressions or predicates: example extensions of this kind consist in adding the transitive closure of relations or various ordered relations.
 +
# Extensions of the library of theorems for predicates and operators.
 +
# Extensions of the Set Theory itself through the definition of algebraic types such as  lists or ordered trees using new set constructors.
 +
 
 +
[[Category:Design proposal]]

Revision as of 13:52, 25 March 2010

Currently the operators and basic predicates of the Event-B mathematical language supported by Rodin are fixed. We propose to extend Rodin to define basic predicates, new operators or new algebraic types.

Requirements

User Requirements

  • Binary operators (prefix form, infix form or suffix form).
  • Operators on boolean expressions.
  • Unary operators, such as absolute values.
Note that the pipe, which is already used for set comprehension, cannot be used to enter absolute values (in fact, as in the new design the pipe used in set comprehension is only a syntaxic sugar, the same symbol may be used for absolute value. To be confirmed with the prototyped parser. It may however be not allowed in a first time until backtracking is implemented, due to use of lookahead making assumptions on how the pipe symbol is used. -Mathieu ).
  • Basic predicates (e.g., the symmetry of relations sym(R) \defi R=R^{-1}).
Having a way to enter such predicates may be considered as syntactic sugar, because it is already possible to use sets (e.g., R \in sym, where sym \defi \{R \mid R=R ^{-1}\}) or functions (e.g., sym(R) = \True, where sym \defi (\lambda R \qdot R \in A \rel B \mid \bool(R = R^{-1}))).
  • Quantified expressions (e.g., \sum x \qdot P \mid y, \prod x \qdot P \mid y, ~\min(S), ~\max(S)).
  • Types.
    • Enumerated types.
    • Scalar types.

User Input

The end-user shall provide the following information:

  • keyboard input
  • Lexicon and Syntax.
    More precisely, it includes the symbols, the form (prefix, infix, postfix), the grammar, associativity (left-associative or right associative), commutativity, priority, the mode (flattened or not), ...
  • Pretty-print.
    Alternatively, the rendering may be determined from the notation parameters passed to the parser.
  • Typing rules.
  • Well-definedness.

Development Requirements

  • Scalability.

Towards a generic AST

The following AST parts are to become generic, or at least parameterised:

  • Lexer
  • Parser
  • Nodes ( Formula class hierarchy ): parameters needed for:
    • Type Solve (type rule needed to synthesize the type)
    • Type Check (type rule needed to verify constraints on children types)
    • WD (WD predicate)
    • PrettyPrint (tag image + notation (prefix, infix, postfix) + needs parentheses)
    • Visit Formula (getting children + visitor callback mechanism)
    • Rewrite Formula (associative formulæ have a specific flattening treatment)
  • Types (Type class hierarchy): parameters needed for:
    • Building the type expression (type rule needed)
    • PrettyPrint (set operator image)
    • getting Base / Source / Target type (type rule needed)
  • Verification of preconditions (see for example AssociativeExpression.checkPreconditions)

Vocabulary

An extension is to be understood as a single additional operator definition.

Tags

Every extension is associated with an integer tag, just like existing operators. Thus, questions arise about how to allocate new tags and how to deal with existing tags.
The solution proposed here consists in keeping existing tags 'as is'. They are already defined and hard coded in the Formula class, so this choice is made with backward compatibility in mind.

Now concerning extension tags, we will first introduce a few hypotheses:

  • Tags_Hyp1: tags are never persisted across sessions on a given platform
  • Tags_Hyp2: tags are never referenced for sharing purposes across various platforms

In other words, cross-platform/session formula references are always made through their String representation. These assumptions, which were already made and verified for Rodin before extensions, lead us to restrict further considerations to the scope of a single session on a single platform.

The following definitions hold at a given instant t and for the whole platform.
Let \mathit{EXTENSION}_t be the set of extensions supported by the platform at instant t;
let \mathit{tag}_t denote the affectation of tags to a extensions at instant t (\mathit{tag}_t \in \mathit{EXTENSION}_t \pfun \intg);
let \mathit{COMMON} be the set of existing tags defined by the Formula class (\mathit{COMMON} \sub \intg).
The following requirements emerge:

  • Tags_Req1: \forall t \qdot \mathit{tag}_t \in \mathit{EXTENSION}_t \tinj \intg
  • Tags_Req2: \forall e, t_1,t_2 \qdot \mathit{tag}_{t_1}(e)=\mathit{tag}_{t_2}(e) where t_1, t_2 are two instants during a given session
  • Tags_Req3: \forall t \qdot \ran(\mathit{tag}_t) \cap \mathit{COMMON} = \empty

The above-mentioned scope-restricting hypothesis can be reformulated into: \mathit{tag} needs not be stable across sessions nor across platforms.

Formula Factory

The requirements about tags give rise to a need for centralising the \mathit{tag} relation in order to enforce tag uniqueness. The Formula Factory appears to be a convenient and logical candidate for playing this role. Each time an extension is used to make a formula, the factory is called and it can check whether its associated tag exists, create it if needed, then return the new extended formula while maintaining tag consistency.

The factory can also provide API for requests about tags and extensions: getting the tag from an extension and conversely.

We also need additional methods to create extended formulæ. A first problem to address is: which type should these methods return ? We could define as many extended types as the common AST API does, namely ExtendedUnaryPredicate, ExtendedAssociativeExpression, and so on, but this would lead to a large number of new types to deal with (in visitors, filters, …), together with a constraint about which types extensions would be forced to fit into. It is thus preferable to have as few extended types as possible, but with as much parameterisation as can be. Considering that the two basic types Expression and Predicate have to be extensible, we come to add two extended types ExtendedExpression and ExtendedPredicate.

ExtendedExpression makeExtendedExpression( ? )
ExtendedPredicate makeExtendedPredicate( ? )

Second problem to address: which arguments should these methods take ? Other factory methods take the tag, a collection of children where applicable, and a source location. In order to discuss what can be passed as argument to make extended formulæ, we have to recall that the make… factory methods have various kinds of clients, namely:

  • parser
  • POG
  • provers

(other factory clients use the parsing or identifier utility methods: SC modules, indexers, …)

Thus, the arguments should be convenient for clients, depending on which information they have at hand. The source location does not obviously seem to require any adaptation and can be taken as argument the same way. Concerning the tag, it depends on whether clients have a tag or an extension at hand. Both are intended to be easily retrieved from the factory. As a preliminary choice, we can go for the tag and adjust this decision when we know more about client convenience.

As for children, the problem is more about their types. We want to be able to handle as many children as needed, and of all possible types. Looking to existing formula children configurations, we can find:

  • expressions with predicate children: \mathit{bool}(P)
  • expressions with expression children: E_1 + E_2
  • predicates with predicate children: P_1 \limp P_2
  • predicates with expression children: \mathit{partition}(S, E_1, E_2)
  • mixed operators: \{x \qdot P(x) \mid E(x)\}, but it is worth noting that the possibility of introducing bound variables in extended formulæ is not established yet.

Thus, for the sake of generality, children of both types should be supported for both extended predicates and extended expressions.

ExtendedExpression makeExtendedExpression(int tag, Expression[] expressions, Predicate[] predicates, SourceLocation location)
ExtendedPredicate makeExtendedPredicate(int tag, Expression[] expressions, Predicate[] predicates, SourceLocation location)


Defining Extensions

An extension is meant to contain every information and behaviour required by:

  • Keyboard
  • (Extensible Fonts ?)
  • Lexer
  • Parser
  • AST
  • Static Checker
  • Proof Obligation Generator
  • Provers

Keyboard requirements

Kbd_req1: an extension must provide an association combo/translation for every uncommon symbol involved in the notation.

Lexer requirements

Lex_req1: an extension must provide an association lexeme/token for every uncommon symbol involved in the notation.

Parser requirements

According to the Parser page, the following informations are required by the parser in order to be able to parse a formula containing a given extension.

  • symbol compatibility
  • group compatibility
  • symbol precedence
  • group precedence

TODO: more

AST requirements

The following hard-coded concepts need to be reified for supporting extensions in the AST. An extension instance will then provide these reified informations to an extended formula class for it to be able to fulfil its API. It is the expression of the missing information for a ExtendedExpression (resp. ExtendedPredicate) to behave as a Expression (resp. Predicate). It can also be viewed as the parametrization of the AST.

Notation

The notation defines how the formula will be printed. In this document, we use the following convention:

  • e_i is the i-th child expression of the extended formula
  • p_i is the i-th child predicate of the extended formula

Example: infix operator "\lozenge"

e_1 \lozenge e_2 \lozenge \ldots \lozenge e_n

We define the following notation framework:

Notation uml.png

On the "\lozenge" infix operator example, the iterable notation would return sucessively:

  • a IFormulaChild with index 1
  • the INotationSymbol "\lozenge"
  • a IFormulaChild with index 2
  • the INotationSymbol "\lozenge"
  • a IFormulaChild with index n

For the iteration not to run forever, the limit n needs to be known: this is the role of the mapsTo() method, which fixes the number of children, called when this number is known (i.e. for a particular formula instance).

Open question: how to integrate bound identifier lists in the notation ?

Well-Definedness

WD predicates also are user-provided data.

Example:

\mathit{D}(e_1) \land (\forall i \cdot i \in 1\mathit{..}(n-1) \Rightarrow (\mathit{D}(e_i) \Rightarrow \mathit{D}(e_{i+1})))

In order to process WD predicates, we need to add the following features to the AST:

  • the \mathit{D} operator
  • expression variables (predicate variables already exist)
  • special expression variables and predicate variables that denote a particular formula child (we need to refer to e_1 and e_i in the above example)
  • a parse() method that accepts these special meta variables and the \mathit{D} operator and returns a Predicate (a WD Predicate Pattern)
  • a makeWDPredicate(aWDPredicatePattern, aPatternInstantiation) method that makes an actual WD predicate
Type Check

An extension shall give a type rule, which consists in:

  • type check predicates (addressed in this very section)
  • a resulting type expression (only for expressions)

Example:

(\forall i \cdot \mathit{type}(e_i) = \pow(\alpha)) \land (\forall i,j \cdot \mathit{type}(e_i)=\mathit{type}(e_j))

Type checking can be reified provided the following new AST features:

  • the \mathit{type} operator
  • type variables (\alpha)
  • the above-mentioned expression variables and predicate variables
  • a parse() method that accepts these special meta variables and the \mathit{type} operator and returns a Predicate (a Type Predicate Pattern)
  • a makeTypePredicate(aTypePredicatePattern, aPatternInstantiation) method that makes an actual Type predicate
Type Solve

This section addresses type synthesizing for extended expressions (the resulting type part of a type rule). It is given as a type expression pattern, so that the actual type can be computed from the children.

Example:

 \pow(\mathit{type}(e_1))

In addition to the requirements for Type Check, the following features are needed:

  • a parse() method that accepts special meta variables and the \mathit{type} operator and returns an Expression (a Type Expression Pattern)
  • a makeTypeExpression(aTypeExpressionPattern, aPatternInstantiation) method that makes an actual Type expression

Static Checker requirements

TODO

Proof Obligation Generator requirements

TODO

Provers requirements

TODO

Extension compatibility issues

TODO

User Input Summarization

Identified required data entail the following user input:

TODO

Impact on other tools

Impacted plug-ins (use a factory to build formulæ):

  • org.eventb.core
In particular, the static checker and proof obligation generator are impacted.
  • org.eventb.core.seqprover
  • org.eventb.pp
  • org.eventb.pptrans
  • org.eventb.ui

Identified Problems

The parser shall enforce verifications to detect the following situations:

  • Two mathematical extensions are not compatible (the extensions define symbols with the same name but with a different semantics).
  • A mathematical extension is added to a model and there is a conflict between a symbol and an identifier.
  • An identifier which conflicts with a symbol of a visible mathematical extension is added to a model.

Beyond that, the following situations are problematic:

  • A formula has been written with a given parser configuration and is read with another parser configuration.
As a consequence, it appears as necessary to remember the parser configuration.
The static checker will then have a way to invalid the sources of conflicts (e.g., priority of operators, etc).
The static checker will then have a way to invalid the formula upon detecting a conflict (name clash, associativity change, semantic change...) mathieu
  • A proof may free a quantified expression which is in conflict with a mathematical extension.
SOLUTION #1: Renaming the conflicting identifiers in proofs?

Open Questions

New types

Which option should we prefer for new types?

  • OPTION #1: Transparent mode.
In transparent mode, it is always referred to the base type. As a consequence, the type conversion is implicitly supported (weak typing).
For example, it is possible to define the DISTANCE and SPEED types, which are both derived from the \intg base type, and to multiply a value of the former type with a value of the latter type.
  • OPTION #2: Opaque mode.
In opaque mode, it is never referred to the base type. As a consequence, values of one type cannot be converted to another type (strong typing).
Thus, the above multiplication is not allowed.
This approach has at least two advantages:
  • Stronger type checking.
  • Better prover performances.
It also has some disadvantages:
  • need of extractors to convert back to base types.
  • need of extra circuitry to allow things like x:=d*2 where x, d are of type DISTANCE
  • OPTION #3: Mixed mode.
In mixed mode, the transparent mode is applied to scalar types and the opaque mode is applied to other types.

Scope of the mathematical extensions

  • OPTION #1: Project scope.
The mathematical extensions are implicitly visible to all components of the project that has imported them.
  • OPTION #2: Component scope.
The mathematical extensions are only visible to the components that have explicitly imported them. However, note that this visibility is propagated through the hierarchy of contexts and machines (EXTENDS, SEES and REFINES clauses).
An issue has been identified. Suppose that ext1 extension is visible to component C1 and ext2 is visible to component C2, and there is no compatibility issue between ext1 and ext2. It is not excluded that an identifier declared in C1 conflict with a symbol in ext2. As a consequence, a global verification is required when adding a new mathematical extension.

Bibliography

This proposal consists in considering three kinds of extension:
  1. Extensions of set-theoretic expressions or predicates: example extensions of this kind consist in adding the transitive closure of relations or various ordered relations.
  2. Extensions of the library of theorems for predicates and operators.
  3. Extensions of the Set Theory itself through the definition of algebraic types such as lists or ordered trees using new set constructors.