13.4 Conversion to Hypertext
This section reports the experience we gathered while converting the extensive university-level textbook Introduction to Algorithms by Tom Cormen, Charles Leiserson, and Ron Rivest [Cor90] to hypertext. The textbook is particularly well suited to conversion, since the authors organized the text hierarchically into short, largely self-contained sections. Nevertheless, chunking the linear textbook into hypertext nodes required extensive manual post-processing by subject-matter experts (students who had taken the course given by one of the book's authors) to maintain our chunking principles:
- A node should be a single chunk of information. To be effective, a node must be logically coherent, able to stand alone and convey its information content without constant reference to other nodes. Conversions of serial text that construct "nodes" by simply putting each successive 25 lines of text onto a separate screen fail to build nodes that meet these criteria.
- A node should be short enough to fit within a single screen, but long enough to contain a substantial thought or concept. The single-screen constraint is valuable because all the information in a self-contained node is visible without scrolling. This enables the user to relate the concepts in the node without having to "flip" between separate chunks of text. The hypertext concept also tries to decompose larger concepts into smaller ideas. As human short-term memory is not able to store more than four to nine items, even a small Macintosh screen is of appropriate size to display one idea composed of four to nine sentences. We decided that if a node had to be larger than an individual screen, its text would be contained in a scrolling field. Although scrolling fields are aesthetically unappealing and not the most effective method of displaying large amounts of text, the alternative would have been to spread parts of the node over separate screens, which would have made viewing separate parts of one node similar to viewing different nodes. We felt that preserving the sense of a node as a unit was more important than the slight disadvantage caused by scrolling.
The original textbook was produced using the LaTeX macro package for TeX [Lam85]. The sophisticated referencing facilities available in LaTeX, which were used extensively in the text source, offered major advantages in converting the text. As used, LaTeX enabled the abstraction of citations, index entries, definitions, figures, and proofs. These methods associate one region of text with other regions of text, figures, index entries, or other elements. LaTeX also offers great control over the hierarchical structure of the document, which allowed a mapping of the text into detail levels. The smallest hierarchical structures used in the LaTeX source were subsections or subheadings, which were often of the right size and content to constitute a node.
In converting the LaTeX source text to nodes, the text making up each node was written to individual files, and references to that particular node were noted in lists according to the reference type. From these files, a utility program created nodes in the hyperdocument. Intrinsic links between nodes were generated automatically using relationships between nodes as defined in the LaTeX reference lists. In this sense, LaTeX enabled us to automate the generation of structural links in the same manner that SGML, HTML or HyTime [New91] might.
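The pipeline just described can be illustrated with a minimal Python sketch. It is not the utility program used in the project, whose implementation is not given here. The sample source text, the label names, the placeholder citation key, and the function names `split_into_nodes`, `collect_references`, and `build_links` are all assumptions made for illustration. The sketch cuts the source at section and subsection headings, records references in lists keyed by reference type, and derives links by resolving each `\ref` against the node that holds the matching `\label`.

```python
import re
from collections import defaultdict

# Hypothetical miniature LaTeX source; not taken from the textbook.
SOURCE = r"""
\section{Sorting}
\label{sec:sorting}
Sorting is introduced here. See Section~\ref{sec:heaps}.
\subsection{Insertion sort}
\label{sec:insertion}
Insertion sort is discussed in \cite{Abc99}.
\subsection{Heaps}
\label{sec:heaps}
Heaps build on Section~\ref{sec:insertion}. \index{heap}
"""

HEADING = re.compile(r"\\(section|subsection)\{([^}]*)\}")
LABEL = re.compile(r"\\label\{([^}]*)\}")
REFERENCE = re.compile(r"\\(ref|cite|index)\{([^}]*)\}")


def split_into_nodes(source):
    """Cut the source at every heading; each piece becomes one node."""
    headings = list(HEADING.finditer(source))
    nodes = []
    for i, match in enumerate(headings):
        end = headings[i + 1].start() if i + 1 < len(headings) else len(source)
        nodes.append({
            "id": i,
            # Detail level follows the heading hierarchy.
            "level": 1 if match.group(1) == "section" else 2,
            "title": match.group(2),
            "body": source[match.end():end].strip(),
        })
    return nodes


def collect_references(nodes):
    """Map each label to its node and list references by reference type."""
    labels = {}
    refs = defaultdict(list)
    for node in nodes:
        for label in LABEL.findall(node["body"]):
            labels[label] = node["id"]
        for kind, key in REFERENCE.findall(node["body"]):
            refs[kind].append((node["id"], key))
    return labels, refs


def build_links(labels, refs):
    """Turn every resolvable cross-reference into a (source, target) link."""
    return sorted({(src, labels[key]) for src, key in refs["ref"] if key in labels})


nodes = split_into_nodes(SOURCE)
labels, refs = collect_references(nodes)
links = build_links(labels, refs)
```

In this toy input, the node for "Sorting" links to the node for "Heaps", and "Heaps" links back to "Insertion sort". Citations and index entries end up in their own lists, from which further link types could be generated in the same way.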
One of the more difficult aspects of the conversion arose from the mathematical nature of the source text. Rather than laboriously recreate each mathematical expression, we wrote an interpreter in HyperTalk that translated most of LaTeX's in-line math and text formatting commands. As the Macintosh and HyperCard do not support the wide range of symbols found in the source text, we had to develop two separate fonts to provide all the necessary math symbols. In addition, we had to convert 1230 formulas to bitmaps and edit them manually, because they could not be displayed on the screen using our custom fonts. We also scanned in and edited 327 figures.
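The division of labor between automatic translation and manual bitmap conversion can be sketched as follows. The original interpreter was written in HyperTalk and is not reproduced here; this Python sketch only illustrates the idea. The symbol table, the `[formula]` placeholder, and the function names `translate_formula` and `translate_text` are assumptions, and the rule that subscripts, superscripts, and braces force manual handling is a simplification chosen for the example.

```python
import re

# Assumed symbol table: a small illustrative subset, not the project's fonts.
SYMBOLS = {
    r"\leq": "≤",
    r"\geq": "≥",
    r"\Theta": "Θ",
    r"\lg": "lg",
    r"\infty": "∞",
    r"\cdot": "·",
}

COMMAND = re.compile(r"\\[A-Za-z]+")
INLINE_MATH = re.compile(r"\$([^$]+)\$")


def translate_formula(formula):
    """Return (text, ok); ok is False when the formula needs manual work."""
    unknown = []

    def replace(match):
        command = match.group(0)
        if command in SYMBOLS:
            return SYMBOLS[command]
        unknown.append(command)
        return command

    text = COMMAND.sub(replace, formula)
    # Simplifying assumption: two-dimensional layout cannot be shown as text.
    needs_layout = any(ch in text for ch in "_^{}")
    return text, not unknown and not needs_layout


def translate_text(text):
    """Translate each $...$ span; collect the formulas left for manual conversion."""
    manual = []

    def replace(match):
        translated, ok = translate_formula(match.group(1))
        if ok:
            return translated
        manual.append(match.group(1))
        return "[formula]"

    return INLINE_MATH.sub(replace, text), manual


sample = r"Merge sort runs in $\Theta(n \lg n)$ time, but $\sum_{i=1}^{n} i$ needs a picture."
converted, manual = translate_text(sample)
```

Here the first formula is rendered directly as text, while the summation is replaced by a placeholder and queued in `manual`, mirroring the formulas that had to be converted by hand.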
After this hands-on description of the text-to-hypertext conversion process for hierarchical hypertext, the next chapter introduces the mechanisms of the Gloor/Dynes hypertext engine that specifically support navigation in hierarchically structured documents.