7.2 SWISH

http://www.eit.com/goodies/software/swish/swish.html

SWISH is another publicly available indexing system for web pages. We introduce it briefly here because our own indexing mechanism, described later in the CYBERMAP chapter, is based on SWISH. SWISH stands for Simple Web Indexing System for Humans. It lets users index directories of files and search the generated indexes.

SWISH was developed explicitly for Web use: it recognizes the common HTML tags and can either ignore data in tags or give higher relevance to information in header and title tags. Titles are extracted from HTML files and appear in the search results. SWISH also allows searching for words that occur in HTML titles, comments, and emphasized tags. The SWISH index consists of a single file that is about 1 to 5% of the size of the original HTML data.
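The idea of giving higher relevance to words in title and header tags can be illustrated with a minimal Python sketch. This is not SWISH's implementation or index format: the weight table, the class WeightedWordExtractor, and the functions build_index and search are hypothetical names and values chosen for illustration, and the sketch ignores HTML comments for brevity.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re

# Hypothetical weights, chosen for illustration only (not SWISH's values).
TAG_WEIGHTS = {"title": 5, "h1": 4, "h2": 3, "h3": 2, "em": 2, "strong": 2, "b": 2}


class WeightedWordExtractor(HTMLParser):
    """Collects words from one HTML page, weighted by the enclosing tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []              # currently open weighted tags
        self.title_parts = []            # text seen inside <title>
        self.scores = defaultdict(int)   # word -> accumulated weight

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened occurrence of this tag, if any.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break

    def handle_data(self, data):
        if "title" in self.open_tags:
            self.title_parts.append(data)
        # Plain body text gets weight 1; text in weighted tags gets more.
        weight = max((TAG_WEIGHTS[t] for t in self.open_tags), default=1)
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            self.scores[word] += weight


def build_index(pages):
    """Build word -> {path: score} and path -> title from {path: html}."""
    index = defaultdict(dict)
    titles = {}
    for path, html in pages.items():
        extractor = WeightedWordExtractor()
        extractor.feed(html)
        extractor.close()
        titles[path] = "".join(extractor.title_parts).strip()
        for word, score in extractor.scores.items():
            index[word][path] = score
    return index, titles


def search(index, titles, word):
    """Return (path, title, score) tuples, highest score first."""
    hits = index.get(word.lower(), {})
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return [(path, titles[path], score) for path, score in ranked]
```

Calling build_index on a mapping from file paths to HTML text and then search on a single word yields the matching files with their extracted titles, ranked so that a word appearing in a title outweighs the same word appearing only in body text.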

The obvious disadvantage of SWISH's simplicity is that it does not support advanced functions such as stemming (matching different forms of a word) or synonyms.
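To make concrete what stemming would add, the following sketch reduces a few word forms to a common stem by stripping suffixes. It is a deliberately naive illustration, not a real stemming algorithm such as Porter's and not part of SWISH; the suffix list and the function name naive_stem are assumptions made for this example.

```python
# Hypothetical suffix list, checked in order; illustration only.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word
```

With such a function applied both at indexing time and at query time, a search for "indexing" would also match documents that contain only "indexed" or "indexes". Without stemming, as in SWISH, each form has to be searched for separately.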

Enterprise Integration Technologies Corporation (EIT), the company that created SWISH, has not placed it in the public domain, but distributes it free of royalties for personal, academic, research, and internal commercial use.