Indexing System

From Event-B
Revision as of 18:24, 8 December 2008 by imported>Nicolas
Jump to navigationJump to search

TODO This document aims at helping developers getting into the code of the index plug-in. Please refer to Rodin Index Design for more general considerations about the purpose and design.

General structure of the package

The indexer plug-in is made of the following packages:

  • org.rodinp.core.index: public interfaces and Plugin class
  • org.rodinp.internal.core.index: implementation of public interfaces and main classes (including IndexManager)
  • org.rodinp.internal.core.index.tables: index tables (memory storage)
  • org.rodinp.internal.core.index.sort: generical sorting support (used to manage file dependencies)
  • org.rodinp.internal.core.index.persistence: index saving main classes (disk storage)
  • org.rodinp.internal.core.index.persistence.xml: index saving in xml (Strategy pattern)


The Index Manager

IndexManager is the master class of the plug-in. There resides the start() method called in a background job at plug-in startup. Firstly, it restores saved index data from previous session (see #Persistence). Then it loops, waiting for database deltas to be enqueued and launching the indexing of changed files via a user-cancellable job (see #Concurrency).

Indexing is done in two steps, project per project:

Finally, when the session closes, index data is saved (see #Persistence) and the background job is cancelled.

The IndexManager is also the entry point for index requesting.


Project Index Manager

A ProjectIndexManager (PIM) is responsible for maintaining #Index Tables of one project. The IndexManager has a PerProjectPIM, that is simply an association between projects and their respective PIMs.

A PIM is called by the IndexManager at two moments:

  • when a file has just changed (fileChanged()), to update file #Dependencies
  • when the indexing job is scheduled (doIndexing()), to index files and update its #Index Tables

It delegates indexer calling to the #File Indexing Manager.

Indexer Registry

The IndexerRegistry (Singleton) records registered indexers. It maps file types on lists of indexers, so that when a given file has to be indexed, it can be requested to give the corresponding registered indexers, in the order they have to be called.

File Indexing Manager

Actual indexer calls are achieved by the FileIndexingManager (FIM), which implements Singleton pattern. Being given a file, it gets appropriate indexers from the #Indexer Registry, asks them either to get #Dependencies of the file or to index it, then returns the result to the calling PIM.

Index Tables

Those tables store indexing result data.

Rodin Index

RodinIndex is the main index table. It stores the entities together with their occurrences. More precisely, it maps each indexed IInternalElement to a Descriptor.

A Descriptor is a Declaration (handle + name) of an element plus a set of Occurrences of this element.

An Occurrence has an OccurrenceKind (defined by indexer plug-in implementers) and a RodinLocation (a location of the element within another in some file).

Export Table

The ExportTable records declarations of elements exported by each indexed file.

File Table

The FileTable gives, for each file, the elements indexed in it. It contains no more information than the RodinIndex (can be constructed entirely from it), but is intended to give faster access to indexed elements of a given file, rather than iterating through the whole index.

Name Table

The NameTable follows the same optimization idea: for each indexed user-defined name, it gives a set of elements declared with this name.

Dependencies

File dependencies are treated generically in the org.rodinp.internal.core.index.sort package. The final goal is to maintain an iterable Total Order on the files of each project. PIMs will then follow this order when indexing their project, using type IRodinFile for generic parameter T.

Node

The Node is to be understood as a node in the dependence graph. It has a label of type T and lists of predecessors and successors.

Telling that N1 is a predecessor of N2 means that N2 depends on N1, so N1 must be indexed before N2.

Nodes also have an order position (orderPos field) that is the position of this node in the total order. Its value can only be known after nodes have been sorted.

Not all nodes need to be processed each time. This is the reason why nodes have a mark: when iterating over nodes, marked ones are processed (indexed) whereas others are simply ignored.

Total Order

TotalOrder is the main class. It implements Iterator, providing next(), hasNext() and remove() methods, along with other specific facilities.

It delegates graph maintenance to Graph and node sorting and iterating to SortedNodes.

When the graph changes, nodes are sorted again, but only if iterating is requested. This avoids sorting at each change.

Graph change events are received through a listener. This might seem strange at first: as the TotalOrder class calls for graph modification itself, it should know when it changes. The problem is that, for instance, predecessors can be set to exactly the same list of nodes as previously. In this case, the graph remains unchanged but TotalOrder cannot guess it and will unnecessarily sort the whole graph again (well, actually it could guess, but we prefer delegating change check to Graph). Same problem when creating a node that already exists.

Graph

The Graph stores nodes, applies changes to their links and maintains their coherence: if N1 is a predecessor of N2 then N2 is a successor of N1.

As explained above, it fires change events when a method call actually modifies the graph.

Sorted Nodes

SortedNodes maintains a list of iterable sorted nodes. It delegates sorting to a Sorter.

An important feature is that, if the graph is modified during iteration, it restarts from the first node that differs between previous order and new order. To that end, iterated nodes are recorded.

Indexing Toolkit

Concurrency

Persistence