# Indexing by Latent Semantic Analysis

**Latent semantic analysis** (LSA) is a mathematical method for computer modeling and simulation of the meaning of words and passages by analysis of representative corpora of natural text. LSA closely approximates many aspects of human language learning and understanding. It supports a variety of applications in information retrieval, educational technology and other pattern recognition problems where complex wholes can be treated as additive functions of component parts.

### Overview of purpose and method

Latent Semantic Analysis (also called LSI, for Latent Semantic Indexing) models the contribution to natural language attributable to the combination of words into coherent passages. It uses a long-known matrix-algebra method, Singular Value Decomposition (SVD), which became practical for application to such complex phenomena only after the advent of powerful digital computing machines and algorithms to exploit them in the late 1980s. To construct a *semantic space* for a language, LSA first casts a large representative text corpus into a rectangular matrix of words by coherent passages, each cell containing a transform of the number of times that a given word appears in a given passage. The matrix is then decomposed in such a way that every passage is represented as a vector whose value is the sum of the vectors standing for its component words. Similarities between words and words, passages and words, and passages and passages are then computed as dot products, cosines or other vector-algebraic metrics. (For *word* and *passage* in the above, any objects that can be considered parts that add to form a larger object may be substituted.)
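The pipeline described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the five-passage corpus is invented, the cell transform is taken to be `log(1 + count)` (the text says only "a transform"), and the decomposition is truncated to the `k` largest singular values, a standard LSA step that the paragraph does not spell out. The helper names `word_by_passage_matrix`, `semantic_space` and `cosine` are hypothetical.

```python
import numpy as np

# Hypothetical toy corpus: each string stands in for one coherent passage.
passages = [
    "doctor nurse hospital patient",
    "nurse hospital patient surgery",
    "doctor surgery patient",
    "guitar drums concert band",
    "band concert guitar singer",
]

def word_by_passage_matrix(passages):
    """Return (vocab, matrix) with one row per word and one column per passage."""
    vocab = sorted({w for p in passages for w in p.split()})
    row = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(passages)))
    for j, passage in enumerate(passages):
        for w in passage.split():
            counts[row[w], j] += 1
    # Assumed cell transform: log(1 + count).
    return vocab, np.log1p(counts)

def semantic_space(matrix, k):
    """Decompose with SVD and keep the k largest singular values (assumed step)."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    word_vecs = U[:, :k]  # one k-dimensional vector per word
    # Each passage vector is the sum of its words' vectors, weighted by the
    # transformed counts in that passage's column of the matrix.
    passage_vecs = matrix.T @ word_vecs
    return word_vecs, passage_vecs

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vocab, matrix = word_by_passage_matrix(passages)
word_vecs, passage_vecs = semantic_space(matrix, k=2)
row = {w: i for i, w in enumerate(vocab)}

# Word-to-word, passage-to-passage and passage-to-word comparisons.
word_word = cosine(word_vecs[row["doctor"]], word_vecs[row["nurse"]])
word_word_far = cosine(word_vecs[row["doctor"]], word_vecs[row["guitar"]])
passage_passage = cosine(passage_vecs[0], passage_vecs[1])
passage_word = cosine(passage_vecs[3], word_vecs[row["guitar"]])
print(word_word, word_word_far, passage_passage, passage_word)
```

Because the two topics in this toy corpus share no vocabulary, the two retained dimensions separate them cleanly: related pairs have cosines close to 1 and the cross-topic pair has a cosine close to 0. Real corpora are far less tidy, and both the transform and the choice of `k` would need tuning.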

