18.3 Implementing CYBERMAP for the Web
We are currently developing the web version of CYBERMAP. The main design goal of this implementation is to achieve distribution and scalability. Figure I.95 illustrates the system architecture of the CYBERMAP web version.
Figure I.95 CYBERMAP web implementation architecture
We initially based the CYBERMAP clustering mechanism on the WAIS indexer, but soon found it too complicated. In addition, the WAIS documentation is poor and sometimes inconsistent with the source code. SWISH, on the other hand, is relatively simple and well documented (see chapter 7 for a brief description of SWISH). SWISH is aware of HTML and therefore allows keywords to be weighted according to the HTML tags that enclose them.
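The following Java sketch illustrates tag-dependent keyword weighting. It is not SWISH's actual weighting scheme. It assumes the indexer has already split a page into (tag, text) segments, and the weights in the hypothetical `TAG_WEIGHT` table (title 3; h1, h2 and b 2; everything else 1) are purely illustrative. Each word is counted with the weight of its enclosing tag, which yields a term vector that a clustering step could consume.

```java
import java.util.*;

/** Sketch: builds a term vector in which words inside certain HTML tags
 *  count more. The tag weights are hypothetical, not SWISH's own. */
class TagWeighting {
    // Hypothetical weight per enclosing HTML tag; unlisted tags weigh 1.
    static final Map<String, Integer> TAG_WEIGHT =
        Map.of("title", 3, "h1", 2, "h2", 2, "b", 2);

    /** Each segment is {tag, text}; returns term -> weighted frequency. */
    static Map<String, Integer> termVector(List<String[]> segments) {
        Map<String, Integer> vec = new TreeMap<>();
        for (String[] seg : segments) {
            int w = TAG_WEIGHT.getOrDefault(seg[0].toLowerCase(Locale.ROOT), 1);
            // Lower-case and split on anything that is not a letter or digit.
            for (String tok : seg[1].toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (!tok.isEmpty()) vec.merge(tok, w, Integer::sum);
            }
        }
        return vec;
    }

    public static void main(String[] args) {
        List<String[]> doc = List.of(
            new String[]{"title", "Hypermedia maps"},
            new String[]{"p", "Maps help navigation in hypermedia."});
        // prints {help=1, hypermedia=4, in=1, maps=4, navigation=1}
        System.out.println(termVector(doc));
    }
}
```

Words in the title count three times here, so "maps" and "hypermedia" dominate the vector even though each appears only once more in the body.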
The SWISH index is used as input for a Scatter/Gather-based clustering algorithm. The Scatter/Gather algorithm [Cut92, Cut93] clusters much faster than the document clustering described in section 17.6. The original CYBERMAP clustering algorithm exhibits quadratic run-time behavior because the similarities of all document pairs must be considered in each run; Scatter/Gather, by contrast, offers near-linear performance. We use a variant of the algorithm [Cut92] that works as follows:
- Find k centers (centroids) by applying the initial quadratic algorithm described in section 17.6 to the document set.
- In a single pass, assign each document (node) to its most similar center.
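The two steps above can be sketched in Java as follows. The sketch makes two assumptions: documents are dense term vectors compared by cosine similarity, and a naive agglomerative merge (repeatedly fuse the two most similar centroids until k remain) stands in for the quadratic algorithm of section 17.6. The list applies the quadratic step to the document set; the sketch instead follows the Buckshot algorithm of [Cut92] and runs that step on a random sample of about √(kn) documents, so the expensive phase sees only a small subset. Calling the hypothetical `findCenters` on the full document set, then `assign`, reproduces the procedure exactly as listed.

```java
import java.util.*;

/** Sketch of two-phase (Buckshot-style) clustering: find k centers on a
 *  random sample, then assign every document in one pass. */
class TwoPhaseClustering {

    /** Cosine similarity of two dense term vectors (0 if either is all zeros). */
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Step 1 (stand-in for section 17.6): merge the two most similar
     *  centroids until k remain. Each merge rescans all pairs; not optimized. */
    static List<double[]> findCenters(List<double[]> sample, int k) {
        List<double[]> cents = new ArrayList<>();
        List<Integer> sizes = new ArrayList<>();
        for (double[] d : sample) { cents.add(d.clone()); sizes.add(1); }
        while (cents.size() > Math.max(k, 1)) {
            int bi = 0, bj = 1;
            double best = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < cents.size(); i++)
                for (int j = i + 1; j < cents.size(); j++) {
                    double s = cosine(cents.get(i), cents.get(j));
                    if (s > best) { best = s; bi = i; bj = j; }
                }
            // Fold cluster bj into bi as a size-weighted mean; bj > bi, so bi stays valid.
            double[] a = cents.get(bi), b = cents.get(bj);
            int na = sizes.get(bi), nb = sizes.get(bj);
            for (int t = 0; t < a.length; t++) a[t] = (a[t] * na + b[t] * nb) / (na + nb);
            sizes.set(bi, na + nb);
            cents.remove(bj);
            sizes.remove(bj); // remove(int): removes by index, not by value
        }
        return cents;
    }

    /** Step 2: one pass; each document gets the index of its most similar center. */
    static int[] assign(List<double[]> docs, List<double[]> centers) {
        int[] label = new int[docs.size()];
        for (int d = 0; d < docs.size(); d++) {
            double best = Double.NEGATIVE_INFINITY;
            for (int c = 0; c < centers.size(); c++) {
                double s = cosine(docs.get(d), centers.get(c));
                if (s > best) { best = s; label[d] = c; }
            }
        }
        return label;
    }

    /** Buckshot-style driver: sample about sqrt(k*n) documents for step 1. */
    static int[] cluster(List<double[]> docs, int k, Random rnd) {
        int s = Math.min(docs.size(), (int) Math.ceil(Math.sqrt((double) k * docs.size())));
        List<double[]> shuffled = new ArrayList<>(docs);
        Collections.shuffle(shuffled, rnd);
        return assign(docs, findCenters(shuffled.subList(0, s), k));
    }

    public static void main(String[] args) {
        List<double[]> docs = List.of(
            new double[]{1, 0}, new double[]{0.9, 0.1},
            new double[]{0, 1}, new double[]{0.1, 0.9});
        System.out.println(Arrays.toString(cluster(docs, 2, new Random(42))));
    }
}
```

For the four toy vectors in `main`, documents 0 and 1 receive one label and documents 2 and 3 the other; which number each pair gets depends on the random sample.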
To handle large clusters, we apply the two-step procedure recursively to every cluster that is still too large, until all clusters have the desired number of nodes.
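The recursive refinement can be sketched independently of the similarity measure. In the hypothetical `refine` method below, the `split` parameter stands in for one run of the two-phase Scatter/Gather step, and `maxSize` is an assumed name for the desired cluster size. A guard ends the recursion when a split fails to produce strictly smaller parts, which a real clustering step can do, for example, when many documents are identical. The toy splitter in `main` simply sorts integers and cuts the list in half.

```java
import java.util.*;
import java.util.function.Function;

/** Sketch: recursively re-cluster any cluster larger than maxSize.
 *  The splitter stands in for the two-phase Scatter/Gather step. */
class RecursiveRefinement {

    static <T> List<List<T>> refine(List<T> cluster, int maxSize,
                                    Function<List<T>, List<List<T>>> split) {
        List<List<T>> result = new ArrayList<>();
        if (cluster.size() <= maxSize) { result.add(cluster); return result; }
        List<List<T>> parts = split.apply(cluster);
        // Guard: without strictly smaller parts, recursion would never end.
        boolean progress = !parts.isEmpty()
            && parts.stream().allMatch(p -> p.size() < cluster.size());
        if (!progress) { result.add(cluster); return result; }
        for (List<T> part : parts) {
            if (!part.isEmpty()) result.addAll(refine(part, maxSize, split));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> docs = List.of(5, 1, 9, 3, 7, 2, 8, 6, 4, 10);
        // Toy splitter: sort and cut in half (a stand-in for real clustering).
        Function<List<Integer>, List<List<Integer>>> halve = c -> {
            List<Integer> s = new ArrayList<>(c);
            Collections.sort(s);
            int mid = s.size() / 2;
            return List.of(s.subList(0, mid), s.subList(mid, s.size()));
        };
        // prints [[1, 2], [3, 4, 5], [6, 7], [8, 9, 10]]
        System.out.println(refine(docs, 3, halve));
    }
}
```

Keeping the splitter as a parameter separates the recursion policy (when to split again) from the clustering itself, so the same driver would work with any center-finding method.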
Clustering is implemented in Java. We first considered implementing it in C, but opted for Java because it allows clustering to run portably on a client's machine without recompiling the clustering engine for every new CPU or operating system version. Java also permits distributed clustering, so that new nodes can be flexibly added to clusters at the client's site.
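One way new nodes could be added at the client's site without re-clustering is sketched below. It assumes, beyond what the text states, that each cluster is summarized by its centroid and its size. The hypothetical `IncrementalClusters.add` method places a new document vector in its most similar cluster (by cosine similarity) and updates that centroid as a running mean, m ← m + (x - m)/n, where n is the cluster size after the addition.

```java
import java.util.*;

/** Sketch: adds documents to an existing clustering on the client without
 *  re-clustering. Each cluster keeps a centroid (running mean) and a size. */
class IncrementalClusters {
    private final List<double[]> centroids = new ArrayList<>();
    private final List<Integer> sizes = new ArrayList<>();

    /** Registers an existing cluster, e.g. one received from a server. */
    void addCluster(double[] centroid, int size) {
        centroids.add(centroid.clone());
        sizes.add(size);
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Puts doc into its most similar cluster, updates that centroid,
     *  and returns the cluster's index. */
    int add(double[] doc) {
        int best = -1;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < centroids.size(); c++) {
            double s = cosine(doc, centroids.get(c));
            if (s > bestSim) { bestSim = s; best = c; }
        }
        if (best < 0) throw new IllegalStateException("no clusters yet");
        double[] m = centroids.get(best);
        int n = sizes.get(best) + 1;
        for (int t = 0; t < m.length; t++) m[t] += (doc[t] - m[t]) / n; // running mean
        sizes.set(best, n);
        return best;
    }

    double[] centroid(int c) { return centroids.get(c).clone(); }
    int size(int c) { return sizes.get(c); }

    public static void main(String[] args) {
        IncrementalClusters ic = new IncrementalClusters();
        ic.addCluster(new double[]{1, 0}, 1);
        ic.addCluster(new double[]{0, 1}, 1);
        int c = ic.add(new double[]{0.8, 0.2});
        System.out.println(c + " " + Arrays.toString(ic.centroid(c)));
    }
}
```

Each addition costs one similarity computation per cluster, so a client can absorb new nodes cheaply and defer a full re-clustering until the centroids have drifted.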
Figure I.96 Java GUI of CYBERMAP Web version
The Web CYBERMAP GUI has been implemented using Java and the AWT (Abstract Window Toolkit) [Van96], a portable API (application programming interface) and Java class library that implements user interface functionality on all Java platforms.
The next section reviews the design stages of the CYBERMAP GUI over the last five years, up to the current implementation in Java and AWT.