11.5 Mapping the Web
The multidimensional structure of the web is ideally suited for visualization by mapping. Various approaches have been suggested at recent web conferences, and there is now even a commercial company selling tools for "web cartography". This section selectively reviews some of these systems.
SOUR Tools
The SOUR system [12] clusters HTML files by content, analyzing their titles, headings, and body text. Further analysis of the different types of HTML links, such as:
- Hyperlink references -- <A HREF="URL"> ... </A>
- Image references -- <IMG SRC="URL">
- Embedded references -- <EMBED SRC="URL">
allows the system to automatically create a local map for any HTML page, displaying the page in its local context as depicted in figure I.63.
Figure I.63 Web browsing using SOUR's Result Manager
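To make the link analysis concrete, the following sketch (not SOUR's actual implementation; the page content and class name are illustrative) extracts the three link types from an HTML page using Python's standard html.parser, which is the raw material such a local map would be built from:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects hyperlink, image, and embed references from one page."""
    def __init__(self):
        super().__init__()
        self.links = []  # (link_type, url) pairs in document order

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(("hyperlink", attrs["href"]))
        elif tag == "img" and "src" in attrs:
            self.links.append(("image", attrs["src"]))
        elif tag == "embed" and "src" in attrs:
            self.links.append(("embedded", attrs["src"]))

page = '<A HREF="intro.html">Intro</A> <IMG SRC="logo.gif">'
parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # [('hyperlink', 'intro.html'), ('image', 'logo.gif')]
```

Running one such extractor per page and drawing the collected references as edges yields exactly the kind of local context map shown in figure I.63.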
http://www.w3.org/pub/Conferences/WWW4/Papers/portugal
This local map can be used to browse the web. The database of URLs built up by the SOUR tool can also be queried directly, for example to find all pages that contain links to a certain page. While the SOUR system is part of a software engineering project, the next system tackles the problem from a different side using AI concepts, but comes to similar results.
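The back-link query mentioned above amounts to inverting a forward link table. A minimal sketch, with hypothetical data and a function name of our own choosing (this is not SOUR's database interface):

```python
# page -> pages it links to (illustrative data, not SOUR's format)
links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
}

def pages_linking_to(target, link_table):
    """Invert the forward table: who points at `target`?"""
    return sorted(page for page, outgoing in link_table.items()
                  if target in outgoing)

print(pages_linking_to("c.html", links))  # ['a.html', 'b.html']
```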
Maps of Hyperspace
http://agora.leeds.ac.uk/spacenet/ghedini.html
Ghedini Ralha and Anthony Cohn suggest an artificial intelligence-based approach to building maps of hyperspace [Ghe95]. They try to build spatial cognitive maps of the hyperspace, influenced by the human cognitive mapping process.
Figures I.64 and I.65 display a small subset of a sample hyperspace in Venn diagram and node graph notation, respectively. They were drawn manually, based on automatic clustering of the contents of the HTML files.
Figure I.65 Node graph of sample hyperspace
http://agora.leeds.ac.uk/spacenet/ghedini.html
Ghedini Ralha and Cohn's system is based on the same similarity measure as described in the previous chapter about similarity; the goal is not only to display actual links, but to combine information about the linking structure with a semantic analysis of the contents of the HTML pages. The next system extends the XEROX PARC Information Visualizer Cone Tree structure into 3D hyperbolic space.
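One simple way to combine link structure with content analysis, purely as an illustration (this is not Ghedini Ralha and Cohn's actual measure, and the weighting parameter alpha is our own assumption), is a weighted mix of cosine similarity over term counts and the overlap of outgoing link sets:

```python
import math

def cosine(u, v):
    """Cosine similarity of two term-count dictionaries."""
    dot = sum(u.get(t, 0) * v.get(t, 0) for t in set(u) | set(v))
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def combined_similarity(terms_a, terms_b, links_a, links_b, alpha=0.5):
    """Blend content similarity with link-set overlap (Jaccard)."""
    content = cosine(terms_a, terms_b)
    union = links_a | links_b
    structure = len(links_a & links_b) / len(union) if union else 0.0
    return alpha * content + (1 - alpha) * structure

terms_a, links_a = {"map": 2, "space": 1}, {"maps.html", "ai.html"}
terms_b, links_b = {"map": 1, "agent": 1}, {"maps.html"}
print(round(combined_similarity(terms_a, terms_b, links_a, links_b), 3))
```

Pages scoring above some threshold under such a measure would land in the same cluster of the map, even when no direct link connects them.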
3D Hyperbolic Space
http://www.geom.umn.edu/docs/research/webviz/
Tamara Munzner and Paul Burchard suggest the use of 3D hyperbolic space for the visualization of the web structure. They use a 3D graphical representation, available both in their own format and in VRML, to allow for interactive browsing in hyperbolic space.
Figure I.66 Screen shot of an interactive flight in 3D hyperbolic webspace
http://www.geom.umn.edu/docs/research/webviz/node2.html
Their layout is a variant of the XEROX PARC Cone Tree discussed above in this chapter. In contrast to the original cone trees, hyperbolic cone trees are much less cluttered and therefore offer the big picture and interesting details at the same time. Using an interactive 3D browser, the user can navigate in the hyperbolic tree, which is drawn in the interior of a ball in figure I.66. By selecting a single node, the user can then jump from the hyperbolic browser to the selected node in an ordinary web browser such as Netscape.
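The reason hyperbolic layouts stay uncluttered can be seen in a one-line radial projection. The sketch below is a simplification of the general idea, not Munzner and Burchard's actual algorithm: nodes at hyperbolic distance r from the focus are mapped into the interior of a unit ball, so nearby levels of the tree receive most of the display radius while arbitrarily deep levels still fit inside the boundary:

```python
import math

def project_to_ball(hyperbolic_radius):
    """Radial compression: hyperbolic distances map into [0, 1)."""
    return math.tanh(hyperbolic_radius / 2)

for depth in range(1, 6):
    r = project_to_ball(float(depth))
    print(f"tree depth {depth}: projected radius {r:.3f}")
```

The projected radii grow quickly at first and then crowd toward 1.0, which is exactly the "big picture plus interesting details" effect described above.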
Navigational View Builder
http://www.cc.gatech.edu/gvu/people/Phd/sougata/Nvb.html
Sougata Mukherjea and James Foley suggest an interactive approach to building visual structures of the web. Their Navigational View Builder [Muk95] uses a combination of structural and content analysis to compute different types of visualizations.
As a motivation, Mukherjea and Foley display an unfiltered, very small subset of the Web pages of the Graphics, Visualization and Usability Center at the Georgia Institute of Technology (figure I.67). Its complexity renders this visualization almost useless.
Figure I.67 Small subset of the web displaying all links ("Spiderweb view")
http://www.cc.gatech.edu/gvu/people/Phd/sougata/bind.gif
The advantage of such a spider web representation is that it can be computed automatically. Mukherjea and Foley apply various filtering mechanisms to this spider web representation. For instance, the system can filter by file type, as in the web of figure I.68, which only displays HTML nodes that are connected to images (gif files) and movies (mpg files).
Figure I.68 Example of structure-based filtering
http://www.cc.gatech.edu/gvu/people/Phd/sougata/filter.gif
The Navigational View Builder can also cluster by content, such as by author or topic, and by link structure, such as "distance from current" or "number of children". To further reduce screen clutter, filtering criteria can be applied recursively to form clusters of similar nodes.
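A file-type filter of the kind just described can be sketched in a few lines. This is a hedged illustration in the spirit of the Navigational View Builder, not its actual code; the data and function name are invented for the example:

```python
def filter_media_parents(link_table):
    """Keep only pages that link to images (.gif) or movies (.mpg),
    and within each kept page, keep only those media links."""
    media = (".gif", ".mpg")
    return {page: [t for t in targets if t.endswith(media)]
            for page, targets in link_table.items()
            if any(t.endswith(media) for t in targets)}

web = {
    "index.html": ["about.html", "logo.gif"],
    "about.html": ["index.html"],
    "demo.html": ["clip.mpg"],
}
print(filter_media_parents(web))
# {'index.html': ['logo.gif'], 'demo.html': ['clip.mpg']}
```

Applying such predicates repeatedly, each time to the surviving subgraph, is what turns the unreadable spider web into a navigable view.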
The system incorporates the concepts of the XEROX Information Visualizer such as Cone Trees (figure I.69) and Perspective Walls, as well as Tree Maps (all described elsewhere in this book).
Figure I.69 3D tree view of default hierarchy of GVU Research pages
http://www.cc.gatech.edu/gvu/people/Phd/sougata/hier.gif
After these research systems, which illustrate interesting concepts but are hard to obtain for non-academic users, we now present a commercial system for visualizing the web structure that should be on the market by the time this book is published.
NetCarta
http://www.netcarta.com
NetCarta calls itself the leader in the World Wide Web cartography software market. NetCarta CyberPilot Pro and WebMapper are commercial products that allow users to graphically modify the properties and linking structure of the HTML pages on a particular web site. NetCarta positions its products for web management, site analysis, structure-based search and retrieval, and concept-based publishing. The simpler CyberPilot Pro creates and reads WebMaps, which provide the user with a visual representation of a web site. It generates a hierarchical structure of the site and permits filtering the map down to the pages a user is most interested in. Users can also compare different versions of WebMaps of the same site to quickly discover changes on a site.
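The version-comparison feature reduces to a set difference over two snapshots of the site's page inventory. A minimal sketch under that assumption (we have no access to NetCarta's WebMap file format; data and names are illustrative):

```python
def compare_maps(old_map, new_map):
    """Report pages added to and removed from a site between snapshots."""
    old_pages, new_pages = set(old_map), set(new_map)
    return {"added": sorted(new_pages - old_pages),
            "removed": sorted(old_pages - new_pages)}

last_week = ["index.html", "news.html", "old.html"]
today = ["index.html", "news.html", "products.html"]
print(compare_maps(last_week, today))
# {'added': ['products.html'], 'removed': ['old.html']}
```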
Figure I.61 NetCarta CyberPilot Pro (from NetCarta data sheet)
NetCarta offers an enhanced version of CyberPilot called WebMapper as a tool for web administrators ("webmasters"). In combination with Lycos, WebMapper allows tracking links that point from other locations to the local site, thus enabling web authors at other sites to fix stale or "dangling" links.
The mapping chapter has introduced two types of systems:
- Novel representations for (hierarchical and non-hierarchical) information, such as fish eye views, Cone Trees, Perspective Walls, and 3D hyperbolic space.
- Systems for graphically representing links as some form of directed acyclic graph (DAG), possibly using the techniques above, such as Intermedia's Web View, the SOUR tools, maps of hyperspace, the Navigational View Builder, CyberPilot, and WebMapper.
What all of these systems have in common is an emphasis on the existing link structure. While we agree that this is the most distinguishing feature of a hypertext document, we are nevertheless convinced that it is only part of the picture. The systems introduced in the chapter about similarity, as well as our own CYBERMAP system presented later, address the complementary aspect of generating information structure from contents.