Rodin Keyboard: Difference between revisions

From Event-B
Jump to navigationJump to search
imported>Nicolas
imported>Nicolas
 
(10 intermediate revisions by the same user not shown)
Line 11: Line 11:


= Specification =
= Specification =
{{TODO}} REWORK: rethink the specification, avoid regex, new design based on a [http://en.wikipedia.org/wiki/Trie Trie] data structure.
Make a unique algorithm for all cases (Math + LaTeX + Text).
New ideas are:
* no translation if the caret is over a translatable combo (be it math/latex/text, even if it is the longest combo)
* a node is '''terminal''' if it holds a complete combo
* a text node is '''cuttable''' if it is not followed by an identifier part in the text to translate (see <code>org.eventb.internal.core.lexer.LexicalClass.IDENTIFIER</code>)
* a math node is always '''cuttable'''
* a node is '''translatable''' if it is both '''terminal''' AND '''cuttable'''
* the translation consists in finding the longest translatable node in a text, starting from a given index, based on a trie that encodes the available combos; when the traversal exits the trie, we consider the latest translatable combo found: the translation is triggered if and only if the caret is not over it; then, the start index comes just after the corresponding translatable node and the translation goes on.


<!--
== Definitions ==
== Definitions ==
* '''Translator''': function that transforms a ('''source chain''', '''source caret position''') into a ('''target chain''', '''target caret position''')
* '''Translator''': function that transforms a ('''source chain''', '''source caret position''') into a ('''target chain''', '''target caret position''')
Line 29: Line 39:
# Support statically contributed combinations (add)
# Support statically contributed combinations (add)
# Support dynamically contributed combinations (check validity, add and remove, will be useful for theories)
# Support dynamically contributed combinations (check validity, add and remove, will be useful for theories)
#:+ maybe handle a scope ? could help when working with 2 projects that define the same combination input for different outputs
#:But no notion of scope; when proposed an already existing combination input for a different output, the newer one must be refused (invalid), the user shall close the project with the offending combination in order to replace it.
# Avoid non-termination cases (combinations that produce input of combinations)
# Avoid non-termination cases (combinations that produce input of combinations)


Line 37: Line 47:
#: A combination input is a valid text token if it matches <pre>[a-zA-Z0-9_]+</pre> (this excludes 'λ', '$', 'ℙ', … from identifiers)
#: A combination input is a valid text token if it matches <pre>[a-zA-Z0-9_]+</pre> (this excludes 'λ', '$', 'ℙ', … from identifiers)
#: A combination input is a valid math token if it matches either:
#: A combination input is a valid math token if it matches either:
#:* <pre>[^a-zA-Z0-9_\s][^a-zA-Z0-9\s]*</pre> (regular math token '_' allowed after start character)
#:* <pre>[^a-zA-Z0-9_\s]+</pre>
#:* <pre>[^a-zA-Z0-9_\s][a-zA-Z0-9_]*</pre> (LaTeX style token)
#:* <pre>\[a-zA-Z0-9_]+</pre> (LaTeX style token)
# The translator must check that combination output tokens are not a substring of any input token; this enforces one-pass termination of the translation
# The translator must check that combination output tokens are not a substring of any input token; this enforces one-pass termination of the translation
# The translator must not process a text token that has the caret over it
# The translator must not process a text token that has the caret over it
# The translator must process a math token (w or w/o the caret over it) that is the longest possible input combination
# The caret is over a substring if it is next to any character of the substring (left, right or both): there are 3 caret positions over "ab"
# Once a symbol has been translated, it can never be translated back into its original text form (even virtually during a translation)
# Once a symbol has been translated, the translation itself it can never be translated into anything else
# The translator must not process a language token, like ':∈' or '$POW' (but we may concede an exception for predicate variables for simplicity)
# The translator must not process a language token, like ':∈' or '$POW' (but we may concede an exception for predicate variables for simplicity)


== Algorithm ==
== Algorithm ==
Line 56: Line 65:
Only search for first match, translate, then search for other matches in translated text repeatedly.
Only search for first match, translate, then search for other matches in translated text repeatedly.


=== Math Translation (regular or LaTeX style) ===
=== LaTeX Style Translation ===


Translate first longest substrings of matched part that equals a math input combination.
Translate first longest substrings of matched part that equals a math input combination.


Given text 'matched' to translate and caret position 'caret' (translated to fit inside 'matched', if needed):
Given text 'matched' to translate and caret position 'caret' (translated to 'matched', (== original caret position in whole text minus index of 'matched')):
 
<pre>
 
caretShift = 0 // will be added to the caret position after the translation, to determine new caret position
pending_translation = false
 
attempt = matched
while(attempt.length() > 0):
  attempt_has_caret = 0 <= caret <= attempt.length()
  possible_combinations = find_latex_combinations_with_prefix(attempt)
 
  foreach combo in possible_combinations by decreasing length:
    if combo.length() > attempt.length():
      if attempt_has_caret:
        return (translation = matched, pending_translation = true, caretShift = 0)
      else:
        // forget this possibility, the user is not typing it
        possible_combinations.remove(combo)
        continue
    if combo.length() == attempt.length(): // actually they are equal then
      translated_combo = translate(combo)
      if caret >= combo.length():
        caretShift += translated_combo.length() - combo.length()
      return (translation = translated_combo, pending_translation = true, caretShift = caretShift)
     
  attempt = remove_last_character(attempt)
 
return (translation = matched, pending_translation = false, caretShift = 0)
 
</pre>


=== Math Translation ===
Translate first longest substrings of matched parts that equal an input combination.
For instance, given '::=', translate into ':∈=', rather than ':≔'.
Other examples:
* '\lambdax' should translate into 'λx'.
* '\oftypeNAT' should translate into 'NAT' (in a math translation step, but NAT will be processed by text translation)
If there existed a translation for '/' into '÷' and a translation of '=/=' into '≠' (but no translation for '/=' or any '/*')
* '=/' with caret should stay unchanged because '=/' potentially starts a combination input, although '/' by itself is the longest combination input starting with '/'
* '=/' without caret translates to '=÷'
Given text 'text' to translate and caret position 'caret':


<pre>
<pre>
Line 75: Line 126:


   if i equals length of matched:
   if i equals length of matched:
     return translation
     return (translation, pending_translation, caretShift)


   // i is now at start of a combination input
   // i is now at start of a combination input
Line 113: Line 164:
      
      
</pre>
</pre>
=== Math Translation ===
Translate first longest substrings of matched parts that equal an input combination.
For instance, given '::=', translate into ':∈=', rather than ':≔'.
Other examples:
* '\lambdax' should translate into 'λx'.
* '\oftypeNAT' should translate into 'NAT' (in a math translation step, but NAT will be processed by text translation)
If there existed a translation for '/' into '÷' and a translation of '=/=' into '≠' (but no translation for '/=' or any '/*')
* '=/' with caret should stay unchanged because '=/' potentially starts a combination input, although '/' by itself is the longest combination input starting with '/'
* '=/' without caret translates to '=÷'
Given text 'text' to translate and caret position 'caret':


=== Text Translations ===
=== Text Translations ===
Line 132: Line 169:
* do not translate within an identifier, for instance 'NATURAL' must remain unchanged by translation, rather than 'ℕURAL'
* do not translate within an identifier, for instance 'NATURAL' must remain unchanged by translation, rather than 'ℕURAL'
* do not translate a match that has the caret, so as to enable inputting identifiers that contain a combination input
* do not translate a match that has the caret, so as to enable inputting identifiers that contain a combination input
-->
[[Category:Design]]
[[Category:Design]]

Latest revision as of 11:59, 23 June 2014

The Rodin Keyboard is an extensible keyboard for inputing mathematical formula (in Unicode). The Rodin keyboard provides the following facilities:

  • A ModifyListener (RodinModifyListener) that can be attached to a SWT widget. When the content of the widget is modified, the keyboard reacts and translate the content accordingly. Currently, RODIN Keyboard supports Text and StyledText widget.
  • An utility class Text2MathTranslator with a static method translate(String) for manually translating any string (or sub-part of a string) into mathematical formula.
  • An Eclipse View called Rodin Keyboard View which provides an text input area which will translate the input text into mathematical formula. This View can be found under category RODIN.

The Rodin Keyboard however does not contain any pre-defined translation rules for any mathematical symbols. Instead, this task is left for the developers who want to declare different "keyboards" corresponding to the mathematical language that they want to use. Moreover, different combinations can be used to enter the same mathematical symbols.

Currently, there are two keyboards available:

  • Standard keyboard for Event-B.
  • LaTeX-style keyboard for Event-B.

Specification

TODO REWORK: rethink the specification, avoid regex, new design based on a Trie data structure. Make a unique algorithm for all cases (Math + LaTeX + Text). New ideas are:

  • no translation if the caret is over a translatable combo (be it math/latex/text, even if it is the longest combo)
  • a node is terminal if it holds a complete combo
  • a text node is cuttable if it is not followed by an identifier part in the text to translate (see org.eventb.internal.core.lexer.LexicalClass.IDENTIFIER)
  • a math node is always cuttable
  • a node is translatable if it is both terminal AND cuttable
  • the translation consists in finding the longest translatable node in a text, starting from a given index, based on a trie that encodes the available combos; when the traversal exits the trie, we consider the latest translatable combo found: the translation is triggered if and only if the caret is not over it; then, the start index comes just after the corresponding translatable node and the translation goes on.