18.3 Implementing CYBERMAP for the Web

We are currently developing the web version of CYBERMAP. The main design goals of this implementation are distribution and scalability. Figure I.95 illustrates the system architecture of the CYBERMAP web version.


Figure I.95 CYBERMAP web implementation architecture

We initially based the CYBERMAP clustering mechanism on the WAIS indexer, but we soon found it too complicated. In addition, the WAIS documentation is poor and sometimes inconsistent with the source code. SWISH, on the other hand, is relatively simple and well documented (see chapter 7 for a brief description of SWISH). SWISH is HTML-aware and can therefore weight keywords according to the HTML tags in which they appear.
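To make the idea of tag-dependent keyword weighting concrete, here is a minimal sketch in Java. It is not SWISH code and does not reproduce SWISH's configuration: the class `TagWeightedTerms`, the `Fragment` helper, and the boost factors for `title`, `h1`, and `b` are all hypothetical, and the input is assumed to be already split into text runs labeled with their enclosing tag.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch only; this is not SWISH code. */
class TagWeightedTerms {

    /** Hypothetical boost factors per HTML element (assumed values, not SWISH defaults). */
    static final Map<String, Double> TAG_WEIGHT = Map.of("title", 5.0, "h1", 3.0, "b", 2.0);

    /** A run of text plus the name of the HTML element that encloses it. */
    static final class Fragment {
        final String tag;
        final String text;

        Fragment(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }
    }

    /** Sums, for every term, the weight of each tag it occurs in (1.0 for unlisted tags). */
    static Map<String, Double> weigh(List<Fragment> fragments) {
        Map<String, Double> weights = new HashMap<>();
        for (Fragment f : fragments) {
            double w = TAG_WEIGHT.getOrDefault(f.tag, 1.0);
            for (String term : f.text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    weights.merge(term, w, Double::sum);
                }
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = weigh(List.of(
            new Fragment("title", "Hypertext Maps"),
            new Fragment("p", "maps of hypertext documents")));
        System.out.println(weights.get("maps") + " " + weights.get("documents"));
    }
}
```

With weights of this kind, a term that appears in a title contributes more to a document's profile than the same term in running text, which is the property the clustering step can then exploit.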

The SWISH index serves as input to a clustering algorithm based on Scatter/Gather [Cut92, Cut93], which clusters much faster than the document clustering described in section 17.6. The original CYBERMAP clustering algorithm exhibits quadratic run-time behavior because all pairwise similarities must be considered in each run, whereas Scatter/Gather offers near-linear performance. We use a variant of the algorithm from [Cut92].

To keep large clusters manageable, we apply this procedure recursively to them until every cluster has reached the desired size.
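The recursive refinement can be sketched as follows. This is a minimal illustration under stated assumptions, not the variant used in CYBERMAP: documents are assumed to be dense term-weight vectors of equal length, the seeds are simply `k` evenly spaced documents rather than the seeding procedures of [Cut92], and the names `RecursiveClustering`, `partition`, `cluster`, `k`, and `maxSize` are hypothetical. The sketch shows why such a scheme avoids quadratic cost: each document is compared with `k` seeds only, never with every other document.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the CYBERMAP clustering engine. Assumes k >= 2. */
class RecursiveClustering {

    /** Cosine similarity of two term-weight vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Assigns every document to its most similar seed: k comparisons per document. */
    static List<List<double[]>> partition(List<double[]> docs, int k) {
        List<double[]> seeds = new ArrayList<>();
        List<List<double[]>> groups = new ArrayList<>();
        for (int c = 0; c < k; c++) {
            seeds.add(docs.get(c * docs.size() / k)); // simplified seeding (assumption)
            groups.add(new ArrayList<>());
        }
        for (double[] d : docs) {
            int best = 0;
            double bestSim = cosine(d, seeds.get(0));
            for (int c = 1; c < k; c++) {
                double sim = cosine(d, seeds.get(c));
                if (sim > bestSim) {
                    best = c;
                    bestSim = sim;
                }
            }
            groups.get(best).add(d);
        }
        return groups;
    }

    /** Splits docs until each cluster holds at most maxSize documents, where possible. */
    static List<List<double[]>> cluster(List<double[]> docs, int k, int maxSize) {
        List<List<double[]>> result = new ArrayList<>();
        if (docs.size() <= maxSize || docs.size() < k) {
            result.add(docs);
            return result;
        }
        for (List<double[]> group : partition(docs, k)) {
            if (group.size() == docs.size()) {
                result.add(group); // no split happened; stop to avoid endless recursion
            } else if (!group.isEmpty()) {
                result.addAll(cluster(group, k, maxSize));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> docs = List.of(
            new double[] {1, 0}, new double[] {0.9, 0.1}, new double[] {1, 0.2},
            new double[] {0, 1}, new double[] {0.1, 0.9}, new double[] {0.2, 1});
        for (List<double[]> c : cluster(docs, 2, 3)) {
            System.out.println("cluster of size " + c.size());
        }
    }
}
```

A production version would compare documents against cluster centroids rather than single seed documents and would choose the seeds more carefully; the guard against an unsplittable cluster is needed in either case, since identical documents can never be separated by similarity alone.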

Clustering is implemented in Java. We first considered implementing it in C, but then opted for Java so that clustering can run portably on a client's machine without recompiling the clustering engine for every new CPU or operating system version. This also permits distributed clustering, so that new nodes can be flexibly added to clusters at the client's site.


Figure I.96 Java GUI of CYBERMAP Web version

The Web CYBERMAP GUI has been implemented using Java and the AWT (Abstract Window Toolkit) [Van96], a portable API (application programming interface) and Java class library that implements user interface functionality on all Java platforms.

The next section reviews the design stages of the CYBERMAP GUI over the last five years, up to the current implementation in Java and AWT.