1. Short Introduction to Information Retrieval

Gerard Salton defines an information retrieval system in "Introduction to Modern Information Retrieval" [Sal83] as "...a system used to store items of information that need to be processed, searched, retrieved, and disseminated to various user populations".

In theory, there is no constraint on the type or structure of the information items that an information retrieval (IR) system stores and retrieves. In practice, though, most large-scale IR systems still process mainly textual information. Particularly well-structured information is instead stored and accessed with database management systems, where the data is normally organized as networks (network databases), hierarchies (hierarchical databases), tables (relational databases), or objects (object-oriented databases). In contrast to databases, classical information retrieval systems are concerned with storing and retrieving unstructured, or narrative, information. To be searchable, information has to be stored in a machine-readable format. Until recently, this meant that information retrieval systems were limited to searching textual information.
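The contrast between database access and classical IR can be made concrete with a minimal sketch. The records, document texts, and the function names select and keyword_search below are illustrative assumptions, not part of any particular system: the first function performs an exact match on a named field, as a database would, while the second searches free text for query words.

```python
import re

# Structured data: every record has the same named fields (toy, made-up rows).
records = [
    {"id": 1, "title": "Report A", "year": 1983},
    {"id": 2, "title": "Report B", "year": 1990},
]

# Unstructured data: free narrative text with no predefined fields (toy examples).
documents = {
    "d1": "Information retrieval systems store and search narrative text.",
    "d2": "Relational databases organise well structured data in tables.",
}


def select(rows, field, value):
    """Database-style lookup: exact match on a named field."""
    return [row["id"] for row in rows if row[field] == value]


def keyword_search(docs, query):
    """IR-style lookup: ids of documents that contain every query word."""
    terms = set(re.findall(r"[a-z]+", query.lower()))
    hits = []
    for doc_id, text in docs.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        if terms <= words:  # all query terms occur in the document
            hits.append(doc_id)
    return sorted(hits)


print(select(records, "year", 1983))                 # exact field match
print(keyword_search(documents, "narrative text"))   # words anywhere in the text
```

The database-style query only works because the field "year" is known in advance; the keyword search makes no assumption about structure, which is why it is the natural fit for narrative text.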

The advent of large multimedia digital libraries has focused attention on retrieving documents that combine multiple media types: textual sources, the traditional focus, and increasingly media with spatial and temporal properties (e.g., sound, maps, graphics, images, video). Vibrant research in language processing, speech processing, image/video processing, and spatial/temporal reasoning addresses these problems. Content-based retrieval uses features of multimedia objects in their native form for indexing, storage, and retrieval. Identifying the discriminating features is a non-trivial research challenge, since different applications may need different sets of features from the same multimedia object.
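To illustrate why the choice of features depends on the application, the following sketch computes two different features from the same toy image. The two feature functions and the 2x2 grayscale images are assumptions chosen for illustration only; real content-based retrieval systems use far richer features.

```python
def intensity_histogram(image, bins=2, max_value=255):
    """Global feature: count pixels per brightness bin, ignoring position."""
    counts = [0] * bins
    for row in image:
        for pixel in row:
            index = min(pixel * bins // (max_value + 1), bins - 1)
            counts[index] += 1
    return counts


def column_means(image):
    """Spatial feature: mean brightness of each column, left to right."""
    height = len(image)
    width = len(image[0])
    return [sum(row[c] for row in image) / height for c in range(width)]


# Two toy 2x2 grayscale images: the same pixels, mirrored left to right.
bright_left = [[255, 0], [255, 0]]
bright_right = [[0, 255], [0, 255]]

# The histogram cannot tell the two images apart; the column means can.
print(intensity_histogram(bright_left), intensity_histogram(bright_right))
print(column_means(bright_left), column_means(bright_right))
```

An application that only cares about overall brightness would treat the two images as identical, whereas one that cares about layout needs the spatial feature to distinguish them.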

IR terminology is still based on the fundamental techniques introduced in the eighties for text-based IR, and searching on the web is currently still mostly text-based. We therefore limit our discussion here to a very brief introduction to text-based IR; for an in-depth treatment of multimedia information retrieval, the reader is referred to the SIGIR [Sig95] and RIAO [Ria94] proceedings.