10. Similarity

"What they (users) need is infoscopes that base search operations on similarity rather than matching. Similarity is domain dependent and subjective."

--Ramesh Jain [Jai95]

The tools and methods described in this chapter let one browse, edit, and retrieve information based on similarities between parts of a document. A central problem with this approach is that a measure of similarity must first be defined. In the literature, similarity is also called proximity, alikeness, affinity, or association [Jai88]. Mathematically, a similarity between two objects i and j, denoted sim(i,j), must satisfy the following three properties:

Similarity properties