20.1 Cybertree Algorithms
The algorithms described in this chapter are based on similarities between nodes that have been computed as described in section "Computing the Similarity Between Nodes". The computation of a global tree structure occurs in two steps. In step one, algorithm CREATE-LOCAL-TREES computes a collection of local trees: it divides and structures all nodes into multiple disjoint trees. As a side effect, this partitioning also works as a computationally cheap clustering algorithm[22]. In the optional second step, CREATE-GLOBAL-TREE merges the local trees into one global tree.
CREATE-LOCAL-TREES first sorts all possible links between the nodes by similarity. It then iterates through the sorted similarity list. In each iteration it either starts a new tree from the two linked nodes or attaches a not yet placed node to the tree that contains its most similar already placed node, which becomes its parent.
CREATE-LOCAL-TREES
sort all link triples <similarity, node1, node2> descending by similarity;
treelist <- NIL;
for i <- 1 to number of links
    if node1_i NOT IN treelist AND node2_i NOT IN treelist
        create new tree with root node1_i and first child node2_i;
        add tree to treelist;
    if node1_i IN treelist AND node2_i NOT IN treelist
        add node2_i to tree with parent node1_i;
    if node2_i IN treelist AND node1_i NOT IN treelist
        add node1_i to tree with parent node2_i;
    if node1_i IN treelist AND node2_i IN treelist
        ignore this link triple;
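The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the original implementation: the representation of the trees as a parent map, the names `create_local_trees`, `parent`, `tree_of`, and `roots`, and the example triples are all assumptions made for this sketch (the triples are hypothetical and are not the numbers from figure I.92).

```python
def create_local_trees(triples):
    """Build disjoint local trees from (similarity, node1, node2) triples.

    Returns (parent, tree_of, roots):
      parent  -- maps each node to its parent (None for a root)
      tree_of -- maps each node to the index of its local tree
      roots   -- root node of each local tree, in order of creation
    """
    parent, tree_of, roots = {}, {}, []
    # Walk through the links from the most similar to the least similar.
    for _sim, a, b in sorted(triples, key=lambda t: t[0], reverse=True):
        a_placed, b_placed = a in tree_of, b in tree_of
        if not a_placed and not b_placed:
            # Neither node is placed yet: start a new tree rooted at node1.
            roots.append(a)
            parent[a] = None
            parent[b] = a
            tree_of[a] = tree_of[b] = len(roots) - 1
        elif a_placed and not b_placed:
            parent[b] = a
            tree_of[b] = tree_of[a]
        elif b_placed and not a_placed:
            parent[a] = b
            tree_of[a] = tree_of[b]
        # Both nodes already placed: the triple is ignored.
    return parent, tree_of, roots


# Hypothetical similarity list, used only to exercise the sketch.
example_triples = [
    (0.9, "a", "b"), (0.8, "b", "c"), (0.7, "d", "e"),
    (0.5, "e", "f"), (0.3, "c", "d"), (0.2, "a", "f"),
]
parent, tree_of, roots = create_local_trees(example_triples)
print(roots)  # ['a', 'd']
```

With this input the first two triples build the tree a-b-c, the next two build d-e-f, and the last two triples are ignored because both of their nodes have already been placed.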
Figure I.117 contains a practical example of the stepwise execution of CREATE-LOCAL-TREES using the numbers from figure I.92.
Figure I.117 Example of creating local trees
In figure I.117, treelist(1) represents the state of the local trees after the first iteration through the similarity list. The second line of the similarity list adds node 1 to the first tree. Line three causes the generation of a second tree in treelist(3). The subsequent iteration steps add nodes 4 and 5 to the second tree. The last three lines of the similarity list are ignored for the creation of the local trees because all nodes have already been placed.
CREATE-GLOBAL-TREE merges the local trees into one global tree. Two trees can only be merged if there is a similarity > 0 between at least one node of the first tree and one node of the second. If there is no such similarity, i.e., no node in one tree shares a keyword with any node in the other, the trees are completely unrelated with respect to our similarity measure and therefore cannot be merged.
CREATE-GLOBAL-TREE starts with the tree that has been created last because this tree has the weakest similarities and is therefore a prime candidate to be merged with earlier created trees. The algorithm takes the similarity triple with the highest weight where one node of the triple is the root of the tree to be merged and the other node of the triple is not in the tree to be merged.
CREATE-GLOBAL-TREE
treelist <- CREATE-LOCAL-TREES;
for i <- number of trees in treelist down to 1
    take triple <similarity, root_i, node_k> or <similarity, node_k, root_i> of similarity list with highest similarity where node_k NOT IN tree_i;
    merge tree_i with tree_k using node_k as parent of tree_i;
    delete tree_i in treelist;
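The merge step can be sketched in Python as well. The sketch assumes that the local trees are already available as a parent map, a node-to-tree index, and a list of roots in order of creation; these structures, the function name `create_global_tree`, and the example data are hypothetical and chosen only for illustration. A tree whose root has no link with similarity > 0 to a node outside the tree is left unmerged.

```python
def create_global_tree(triples, parent, tree_of, roots):
    """Merge local trees by attaching each root to its most similar outside node.

    triples -- (similarity, node1, node2) links
    parent  -- node -> parent (None for a root) of the local trees
    tree_of -- node -> index of the local tree containing it
    roots   -- root of each local tree, in order of creation
    Returns a new parent map describing the merged tree(s).
    """
    parent, tree_of = dict(parent), dict(tree_of)  # work on copies
    for i in range(len(roots) - 1, -1, -1):        # last-created tree first
        root = roots[i]
        best = None                                # (similarity, node_k)
        for sim, a, b in triples:
            if a == root:
                other = b
            elif b == root:
                other = a
            else:
                continue
            # node_k must lie outside tree i and be related to the root.
            if sim > 0 and tree_of[other] != i and (best is None or sim > best[0]):
                best = (sim, other)
        if best is None:
            continue                               # unrelated or last remaining tree
        node_k = best[1]
        parent[root] = node_k                      # node_k becomes parent of tree i
        target = tree_of[node_k]
        for node in [n for n, t in tree_of.items() if t == i]:
            tree_of[node] = target                 # tree i no longer exists
    return parent


# Hypothetical local trees a-b-c and d-e-f with their similarity list.
example_triples = [
    (0.9, "a", "b"), (0.8, "b", "c"), (0.7, "d", "e"),
    (0.5, "e", "f"), (0.3, "c", "d"), (0.2, "a", "f"),
]
local_parent = {"a": None, "b": "a", "c": "b", "d": None, "e": "d", "f": "e"}
local_tree_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
local_roots = ["a", "d"]

global_parent = create_global_tree(
    example_triples, local_parent, local_tree_of, local_roots)
print(global_parent["d"])  # c
```

In this example the only triple that links root d to a node outside its tree is (0.3, c, d), so d is attached below c and a remains the single root.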
Figure I.118 shows the application of CREATE-GLOBAL-TREE to our previously used data set. Since there are only two subtrees to be merged, line 6 of the similarity list provides the required link, connecting root 6 of tree two to node 3 in tree one.
Figure I.118 Example of creating a global tree
Of course, this algorithm can be refined by selecting the root of the local trees more flexibly. In figure I.118 the second tree would then be re-rooted and node 7 would become the connecting root.
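The text does not spell out how a local tree is re-rooted. One possible way, assuming the tree is stored as a parent map, is to reverse the parent pointers on the path from the new root up to the old root. The function name `reroot` and the example tree are hypothetical and serve only to illustrate the idea.

```python
def reroot(parent, new_root):
    """Return a copy of the parent map in which new_root is the root.

    Only the parent pointers on the path from new_root to the old root
    are reversed; all other nodes keep their parent.
    """
    parent = dict(parent)
    prev, node = None, new_root
    while node is not None:
        nxt = parent[node]     # remember the old parent before overwriting
        parent[node] = prev    # point back towards the new root
        prev, node = node, nxt
    return parent


# Hypothetical local tree: d is the root, e and g are children of d, f is a child of e.
tree = {"d": None, "e": "d", "f": "e", "g": "d"}
rerooted = reroot(tree, "f")
print(rerooted)  # {'d': 'e', 'e': 'f', 'f': None, 'g': 'd'}
```

After re-rooting, the new root can be attached to its most similar node in another tree in the same way as the original root would have been.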