GLOSSARY

TF-IDF

TF-IDF balances how often a word appears in a document with how rare it is across all documents. It highlights the terms that best describe a document's content.

What is TF-IDF?

TF-IDF (Term Frequency–Inverse Document Frequency) is a statistical measure of how important a word is in a document relative to a collection. It balances term frequency (TF) and inverse document frequency (IDF), boosting distinctive terms and discounting common ones.

How It Works (quick)

  • TF: count of the term in the document (normalized by document length).
  • IDF: log(N / df), where N = total docs, df = number of docs containing the term.
  • Weight: TF × IDF = relevance score (see the sketch after this list).
  • Effect: high when a term is frequent in one doc but rare across the corpus.
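A minimal Python sketch of the weighting above, using a toy whitespace-tokenized corpus (the product titles are invented for illustration):

  import math

  def tf_idf(term, doc_tokens, corpus):
      # TF: term count in this document, normalized by document length
      tf = doc_tokens.count(term) / len(doc_tokens)
      # IDF: log(N / df); df = 0 means the term appears nowhere
      df = sum(1 for doc in corpus if term in doc)
      idf = math.log(len(corpus) / df) if df else 0.0
      return tf * idf

  # Toy corpus: three product titles, whitespace-tokenized for simplicity
  corpus = [
      "waterproof gore-tex trail shoe".split(),
      "leather running shoe".split(),
      "canvas casual shoe".split(),
  ]
  print(tf_idf("gore-tex", corpus[0], corpus))  # ~0.27: rare across the corpus
  print(tf_idf("shoe", corpus[0], corpus))      # 0.0: in every doc, log(3/3) = 0

Production libraries usually smooth the IDF (for example, adding one inside the logarithm) so that terms appearing in every document are merely discounted rather than zeroed out.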

Why It Matters in E-commerce

  • Search ranking: Distinguishes “GORE-TEX” from generic terms like “shoe”.
  • SEO: Helps identify unique, descriptive keywords for content.
  • Catalogs: Improves matching between queries and product detail pages (PDPs); see the sketch after this list.
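As a concrete illustration of query-to-catalog matching, here is a small sketch using scikit-learn's TfidfVectorizer and cosine similarity; the catalog titles and the query are made up:

  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.metrics.pairwise import cosine_similarity

  # Hypothetical mini-catalog of product titles
  titles = [
      "GORE-TEX waterproof hiking shoe",
      "Leather trail running shoe",
      "Canvas casual shoe",
  ]

  vectorizer = TfidfVectorizer()
  doc_vectors = vectorizer.fit_transform(titles)          # one TF-IDF row per title
  query_vector = vectorizer.transform(["gore-tex shoe"])  # score the query in the same space

  # Rank titles by cosine similarity to the query
  scores = cosine_similarity(query_vector, doc_vectors).ravel()
  for title, score in sorted(zip(titles, scores), key=lambda pair: -pair[1]):
      print(f"{score:.3f}  {title}")

The GORE-TEX title ranks first because "gore-tex" is rare in this catalog, while "shoe" appears in every title and so contributes little to the ranking.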

Best Practices

  • Use TF-IDF as part of a hybrid retrieval stack, alongside BM25 and semantic (embedding-based) retrieval.
  • Monitor common-term drift in product titles: terms that spread across the catalog lose their discriminative power.
  • Apply field boosts (title > description > reviews), as in the sketch after this list.
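A rough sketch of the field-boost idea; the boost weights and the per-field scores below are placeholders, not recommended values or any specific engine's API:

  # Illustrative field boosts: matches in the title count for more than
  # matches in the description or reviews. The values are assumptions.
  FIELD_BOOSTS = {"title": 3.0, "description": 1.5, "reviews": 0.5}

  def boosted_score(field_scores: dict[str, float]) -> float:
      # field_scores maps field name -> TF-IDF score of the query against that field
      return sum(FIELD_BOOSTS.get(field, 1.0) * score
                 for field, score in field_scores.items())

  # A product whose query terms match mostly in the title scores highest
  print(boosted_score({"title": 0.42, "description": 0.10, "reviews": 0.03}))  # 1.425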

Summary

TF-IDF highlights terms that are both frequent and distinctive. It remains a foundation for modern search and SEO keyword analysis.