1.4 The Vector Space Model

Unlike the basic Boolean query model, the vector space model makes it possible to find the documents that are most similar to a query without requiring an exact (100 percent) match. In the vector space model, both queries and documents are represented as term vectors of the form Di = (di1, di2, ..., dit) and Q = (q1, q2, ..., qt), where t is the number of distinct terms and dij is the weight of term j in document Di. A document collection is then represented as a term-document matrix A whose entries are these weights dij.
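As a minimal sketch, the following Python builds such a matrix for a toy collection. It assumes raw term counts as the weights dij and a naive whitespace tokenizer. The function name `term_document_matrix` is hypothetical. Each row of the result is one document vector Di over a fixed, sorted term order (one row per document is a layout chosen for this sketch).

```python
from collections import Counter


def term_document_matrix(docs):
    """Return (vocabulary, rows) where rows[i][j] is the weight d_ij of
    term j in document i. Raw term counts are used as weights, an
    assumption made only for illustration."""
    # Hypothetical tokenizer: lowercase the text and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    # One fixed term order t_1..t_t shared by every document vector.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # A Counter returns 0 for terms absent from this document.
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


docs = ["vector space model", "boolean model", "vector query"]
vocab, A = term_document_matrix(docs)
print(vocab)  # ['boolean', 'model', 'query', 'space', 'vector']
print(A)      # [[0, 1, 0, 1, 1], [1, 1, 0, 0, 0], [0, 0, 1, 0, 1]]
```

Most entries of a real term-document matrix are zero, so large collections usually store it in a sparse format rather than as dense lists.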

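The similarity between a query vector Q and a document vector Di is then expressed as a numeric similarity coefficient computed from the term weights of the two vectors, such as their inner product or the cosine of the angle between them.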
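One widely used similarity coefficient, assumed here for illustration, is the cosine measure. It divides the inner product of Q and Di by the product of their lengths. The notation matches the vectors defined above, with k running over the t terms:

```latex
\mathrm{sim}(Q, D_i) = \cos(Q, D_i)
  = \frac{\sum_{k=1}^{t} q_k \, d_{ik}}
         {\sqrt{\sum_{k=1}^{t} q_k^{2}} \; \sqrt{\sum_{k=1}^{t} d_{ik}^{2}}}
```

The numerator alone is the inner product. Normalizing by the vector lengths keeps long documents from scoring highly merely because they contain many terms.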

Computing a similarity coefficient between the query and every document is particularly advantageous because it produces a ranking: all documents can be sorted in decreasing order of similarity to a given query. The size of the retrieved set can then be adapted to the user's needs, for example by returning only the top-ranked documents or only those whose similarity exceeds a threshold.
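The ranking described above can be sketched in Python as follows. The sketch assumes the cosine coefficient and raw-count vectors. The parameters `top_k` and `threshold` of the hypothetical `rank` function show two ways to adapt the size of the retrieved set. The toy matrix `A` has one row per document over the term order ['boolean', 'model', 'query', 'space', 'vector'].

```python
import math


def cosine(q, d):
    """Cosine coefficient between query vector q and document vector d."""
    dot = sum(qk * dk for qk, dk in zip(q, d))
    norm_q = math.sqrt(sum(qk * qk for qk in q))
    norm_d = math.sqrt(sum(dk * dk for dk in d))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # an all-zero vector shares no terms with anything
    return dot / (norm_q * norm_d)


def rank(query, docs, top_k=None, threshold=None):
    """Return (doc_index, score) pairs sorted by decreasing similarity.

    top_k and threshold are hypothetical knobs for sizing the result set;
    with neither set, every document is returned in ranked order."""
    scored = [(i, cosine(query, d)) for i, d in enumerate(docs)]
    if threshold is not None:
        # Keep documents scoring above the cutoff; no exact match needed.
        scored = [(i, s) for i, s in scored if s > threshold]
    # Python's sort is stable, so tied documents keep collection order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


# Toy collection: "vector space model", "boolean model", "vector query".
A = [[0, 1, 0, 1, 1], [1, 1, 0, 0, 0], [0, 0, 1, 0, 1]]
query = [0, 1, 0, 0, 1]  # the query "vector model"

print([i for i, _ in rank(query, A)])                 # [0, 1, 2]
print([i for i, _ in rank(query, A, top_k=1)])        # [0]
print([i for i, _ in rank(query, A, threshold=0.6)])  # [0]
```

The first document matches both query terms and scores about 0.82. The other two each match one term and score 0.5. None of them is an exact match, yet all three are ranked.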