Rodin Keyboard: Difference between revisions

From Event-B
Jump to navigationJump to search
imported>Nicolas
imported>Nicolas
Line 37: Line 37:
#: A combination input is a valid text token if it matches <pre>[a-zA-Z0-9_]+</pre> (this excludes 'λ', '$', 'ℙ', … from identifiers)
#: A combination input is a valid text token if it matches <pre>[a-zA-Z0-9_]+</pre> (this excludes 'λ', '$', 'ℙ', … from identifiers)
#: A combination input is a valid math token if it matches either:
#: A combination input is a valid math token if it matches either:
#:* <pre>[^a-zA-Z0-9_\s][^a-zA-Z0-9\s]*</pre> ('_' allowed after start character)
#:* <pre>[^a-zA-Z0-9_\s][^a-zA-Z0-9\s]*</pre> (regular math token '_' allowed after start character)
#:* <pre>[^a-zA-Z0-9_\s][a-zA-Z0-9_]*</pre> (LaTeX style)
#:* <pre>[^a-zA-Z0-9_\s][a-zA-Z0-9_]*</pre> (LaTeX style token)
# The translator must check that combination output tokens are not a substring of any input token; this enforces one-pass termination of the translation
# The translator must check that combination output tokens are not a substring of any input token; this enforces one-pass termination of the translation
# The translator must not process a text token that has the caret over it
# The translator must not process a text token that has the caret over it
# The translator must process a math token (w or w/o the caret over it) that is the longest possible input combination
# The translator must process a math token (w or w/o the caret over it) that is the longest possible input combination
# Once a symbol has been translated, it can never be translated back into its original text form (even virtually during a translation)
# Once a symbol has been translated, it can never be translated back into its original text form (even virtually during a translation)
# The translator must not process a language token, like ':∈' or '$POW' (but we may concede an exception for predicate variables)
# The translator must not process a language token, like ':∈' or '$POW' (but we may concede an exception for predicate variables for simplicity)


== Algorithm ==
Given text 't' to translate and 'c' the caret position:
# match and translate LaTeX style math tokens (doing it before text tokens, as it may contain translatable text substrings, and before regular math tokens because of '\')
# match and translate regular math tokens
# match and translate text tokens
In order to match, use the above regular expressions for each category.
Only search for first match, translate, then search for other matches in translated text repeatedly.
=== Math Translation ===
Translate substrings of matched parts that equal an input combination.
For instance, given '::=', translate into ':∈=', rather than ':≔'.
Other examples:
* '\lambdax' should translate into 'λx'.
* '\oftypeNAT' should translate into 'NAT' (in a math translation step, but NAT will be processed by text translation)
If there existed a translation for '/' into '÷' and a translation of '=/=' into '≠' (but no translation for '/=' or any '/*')
* '=/' with caret should stay unchanged because '=/' potentially starts a combination input, although '/' by itself is the longest combination input starting with '/'
* '=/' without caret translates to '=÷'
=== Text Translations ===
Translate whole matched parts that equal an input combination.
Do not translate within an identifier, for instance 'NATURAL' must remain unchanged by translation, rather than 'ℕURAL'.


[[Category:Design]]
[[Category:Design]]

Revision as of 16:38, 20 May 2014

The Rodin Keyboard is an extensible keyboard for inputing mathematical formula (in Unicode). The Rodin keyboard provides the following facilities:

  • A ModifyListener (RodinModifyListener) that can be attached to a SWT widget. When the content of the widget is modified, the keyboard reacts and translate the content accordingly. Currently, RODIN Keyboard supports Text and StyledText widget.
  • An utility class Text2MathTranslator with a static method translate(String) for manually translating any string (or sub-part of a string) into mathematical formula.
  • An Eclipse View called Rodin Keyboard View which provides an text input area which will translate the input text into mathematical formula. This View can be found under category RODIN.

The Rodin Keyboard however does not contain any pre-defined translation rules for any mathematical symbols. Instead, this task is left for the developers who want to declare different "keyboards" corresponding to the mathematical language that they want to use. Moreover, different combinations can be used to enter the same mathematical symbols.

Currently, there are two keyboards available:

  • Standard keyboard for Event-B.
  • LaTeX-style keyboard for Event-B.

Specification

Definitions

  • Translator: function that transforms a (source chain, source caret position) into a (target chain, target caret position)
  • Token: character sequence in a chain that is of one lexical kind, as long as possible
  • Transformation Rule: specification of the modification of a token sequence into another token sequence
  • Combination: specification of a transformation from a single input token into another single output token
  • There are 3 kinds of lexical entities in a source chain:
    • spacings (space, tabulation, end of line, ...)
    • text symbols (identifier-like)
    • math symbols (all other symbols)

Functional Needs

  1. Translate a whole chain all at once
  2. Translate on the fly, characters being typed in continuously
  3. Maintain a coherent caret position
  4. Keep spacing unchanged
  5. Support statically contributed combinations (add)
  6. Support dynamically contributed combinations (check validity, add and remove, will be useful for theories)
    + maybe handle a scope ? could help when working with 2 projects that define the same combination input for different outputs
  7. Avoid non-termination cases (combinations that produce input of combinations)

Requirements

  1. The translator must check that combination input tokens are either text or math symbols
    A combination input is a valid text token if it matches
    [a-zA-Z0-9_]+
    (this excludes 'λ', '$', 'ℙ', … from identifiers)
    A combination input is a valid math token if it matches either:
    • [^a-zA-Z0-9_\s][^a-zA-Z0-9\s]*
      (regular math token '_' allowed after start character)
    • [^a-zA-Z0-9_\s][a-zA-Z0-9_]*
      (LaTeX style token)
  2. The translator must check that combination output tokens are not a substring of any input token; this enforces one-pass termination of the translation
  3. The translator must not process a text token that has the caret over it
  4. The translator must process a math token (w or w/o the caret over it) that is the longest possible input combination
  5. Once a symbol has been translated, it can never be translated back into its original text form (even virtually during a translation)
  6. The translator must not process a language token, like ':∈' or '$POW' (but we may concede an exception for predicate variables for simplicity)


Algorithm

Given text 't' to translate and 'c' the caret position:

  1. match and translate LaTeX style math tokens (doing it before text tokens, as it may contain translatable text substrings, and before regular math tokens because of '\')
  2. match and translate regular math tokens
  3. match and translate text tokens

In order to match, use the above regular expressions for each category. Only search for first match, translate, then search for other matches in translated text repeatedly.

Math Translation

Translate substrings of matched parts that equal an input combination. For instance, given '::=', translate into ':∈=', rather than ':≔'. Other examples:

  • '\lambdax' should translate into 'λx'.
  • '\oftypeNAT' should translate into 'NAT' (in a math translation step, but NAT will be processed by text translation)

If there existed a translation for '/' into '÷' and a translation of '=/=' into '≠' (but no translation for '/=' or any '/*')

  • '=/' with caret should stay unchanged because '=/' potentially starts a combination input, although '/' by itself is the longest combination input starting with '/'
  • '=/' without caret translates to '=÷'


Text Translations

Translate whole matched parts that equal an input combination. Do not translate within an identifier, for instance 'NATURAL' must remain unchanged by translation, rather than 'ℕURAL'.