Cosine similarity
Cosine similarity is a simple way to measure how alike two non-zero vectors are by looking at the angle between them. It is calculated as the dot product of the vectors divided by the product of their lengths. It depends on direction rather than size, and it ranges from -1 (opposite directions) to +1 (the same direction); 0 means the vectors are at right angles.
If all the vector components are nonnegative, the range is 0 to 1. This makes cosine similarity especially useful when comparing objects like text documents, where each document is represented by a vector of word counts or word frequencies. It tells you how similar two documents are in topic, regardless of their length.
Key ideas
- How it’s computed: cos(theta) = (A · B) / (||A|| ||B||), where A and B are vectors and ||A|| is the vector length; a short code sketch follows this list.
- Why it’s useful: it focuses on the orientation of vectors, not their magnitudes. This is great for comparing frequency-based data (like word counts) where longer documents shouldn’t automatically look more similar.
- In text work: documents are often represented by term frequency or TF-IDF vectors, and cosine similarity measures how close their content is.
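A minimal sketch of the computation in plain Python (standard library only; the function name cosine_similarity and the toy word-count vectors are illustrative, not taken from any particular library):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors: (A · B) / (||A|| ||B||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Toy word-count vectors for three short documents (illustrative data)
doc_a = [2, 1, 0, 3]
doc_b = [4, 2, 0, 6]   # same proportions as doc_a, just a "longer" document
doc_c = [0, 1, 3, 0]

print(cosine_similarity(doc_a, doc_b))  # 1.0: same direction despite different lengths
print(cosine_similarity(doc_a, doc_c))  # ≈ 0.08: little topical overlap
```

The second comparison shows the point made above: doubling a document's length does not change its direction, so it does not change the similarity.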
Related concepts
- Centered cosine similarity: subtracting the mean from each vector turns cosine similarity into the Pearson correlation, which measures linear relationship rather than direction alone.
- Distance view: if you want a distance instead of a similarity, you can use distance = sqrt(2(1 − SC(A, B))). This is the Euclidean distance between the two vectors after each is normalized to unit length, and it is a true metric.
- Angular distance: the angle between the vectors, theta = arccos(SC(A, B)), measured in radians. The angle itself is a proper distance metric (it satisfies the triangle inequality).
- Cosine distance as a non-metric: the commonly used cosine distance, 1 − SC(A, B), is not a true distance metric because it can violate the triangle inequality. Using the angular distance, or the Euclidean distance after unit normalization, fixes this.
- Euclidean connection: when both vectors are normalized to unit length, the squared Euclidean distance between them is tied directly to their cosine similarity: ||A − B||^2 = 2(1 − SC(A, B)). These relationships are illustrated in the sketch after this list.
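A small numerical sketch of these relationships (plain Python; the helper names center and to_unit and the sample vectors are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def center(v):
    """Subtract the mean of the components (for the Pearson connection)."""
    m = sum(v) / len(v)
    return [x - m for x in v]

def to_unit(v):
    """Normalize to unit length (for the Euclidean connection)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

a = [1.0, 3.0, 5.0, 2.0]
b = [2.0, 1.0, 4.0, 4.0]

sc = cosine_similarity(a, b)

# Centered cosine: equals the Pearson correlation coefficient of a and b
pearson = cosine_similarity(center(a), center(b))
print(pearson)

# Angular distance: the angle itself, a proper metric on directions
theta = math.acos(sc)
print(theta)

# Euclidean connection for unit vectors: ||A - B||^2 = 2 (1 - SC(A, B))
ua, ub = to_unit(a), to_unit(b)
sq_dist = sum((x - y) ** 2 for x, y in zip(ua, ub))
print(math.isclose(sq_dist, 2 * (1 - sc)))  # True: the identity holds
```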
Practical notes
- Positive data: with nonnegative components (common in text), cosine similarity behaves nicely and stays within 0 to 1.
- Soft cosine: a generalization that accounts for similarity between the features themselves, rather than treating them as independent. It uses a feature-similarity matrix to weight how related features are, and it reduces to ordinary cosine similarity when all off-diagonal similarities are zero. It is more computationally intensive (quadratic in the number of features) but can capture more nuanced similarities, such as between related words or phrases; see the sketch below.
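A minimal sketch of soft cosine similarity, assuming a symmetric feature-similarity matrix S with ones on the diagonal (the matrix values and toy vectors below are invented for illustration):

```python
import math

def soft_cosine(a, b, S):
    """Soft cosine similarity of a and b under feature-similarity matrix S (S[i][i] == 1)."""
    def bilinear(u, v):
        # sum over i, j of S[i][j] * u[i] * v[j] -- quadratic in the number of features
        return sum(S[i][j] * u[i] * v[j]
                   for i in range(len(u)) for j in range(len(v)))
    return bilinear(a, b) / math.sqrt(bilinear(a, a) * bilinear(b, b))

# Toy setup: features 0 and 1 are related words (similarity 0.6); feature 2 is unrelated
S = [[1.0, 0.6, 0.0],
     [0.6, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

a = [1.0, 0.0, 0.0]   # document using only the first word
b = [0.0, 1.0, 0.0]   # document using only the related second word

print(soft_cosine(a, b, S))           # 0.6: related features earn partial credit
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
print(soft_cosine(a, b, identity))    # 0.0: with no off-diagonal similarity it is ordinary cosine
```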
Other names and connections
- Also called Orchini similarity and Tucker coefficient of congruence.
- For binary data, cosine similarity matches the Otsuka–Ochiai coefficient (a formula used in biology and information retrieval); a small sketch of the set form follows.
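A short sketch of the set form, |A ∩ B| / sqrt(|A| × |B|), which equals the cosine similarity of the corresponding 0/1 indicator vectors (the sample sets are illustrative):

```python
import math

def otsuka_ochiai(set_a, set_b):
    """|A ∩ B| / sqrt(|A| * |B|): cosine similarity of the sets' 0/1 indicator vectors."""
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / math.sqrt(len(set_a) * len(set_b))

# Illustrative example: species observed at two sampling sites
site_1 = {"oak", "birch", "pine"}
site_2 = {"oak", "pine", "maple", "fir"}

print(otsuka_ochiai(site_1, site_2))  # 2 / sqrt(3 * 4) ≈ 0.577
```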
In short, cosine similarity is a fast, robust way to compare how similar two vectors are by their direction, making it especially popular for text analysis and other sparse, high-dimensional data.