GLOSSARY

TF-IDF

TF-IDF balances how often a word appears in a document with how rare it is across all documents. It highlights the terms that best describe a document's content.

What is TF-IDF?

TF-IDF (Term Frequency–Inverse Document Frequency) is a statistical measure of how important a word is in a document relative to a collection. It balances term frequency (TF) and inverse document frequency (IDF), boosting distinctive terms and discounting common ones.

How It Works (quick)

  • TF: count of the term in the document (normalized by document length).
  • IDF: log(N / df), where N = total docs, df = number of docs containing the term.
  • Weight: TF × IDF = relevance score (see the sketch after this list).
  • Effect: high when a term is frequent in one doc but rare across the corpus.
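A minimal Python sketch of the weighting above, using a toy whitespace-tokenized corpus (the product titles are invented for illustration):

  import math

  def tf_idf(term, doc_tokens, corpus):
      # TF: term count in this document, normalized by document length
      tf = doc_tokens.count(term) / len(doc_tokens)
      # IDF: log(N / df); df = 0 means the term appears nowhere
      df = sum(1 for doc in corpus if term in doc)
      idf = math.log(len(corpus) / df) if df else 0.0
      return tf * idf

  # Toy corpus: three product titles, whitespace-tokenized for simplicity
  corpus = [
      "waterproof gore-tex trail shoe".split(),
      "leather running shoe".split(),
      "canvas casual shoe".split(),
  ]
  print(tf_idf("gore-tex", corpus[0], corpus))  # ~0.27: rare across the corpus
  print(tf_idf("shoe", corpus[0], corpus))      # 0.0: in every doc, log(3/3) = 0

Production libraries usually smooth the IDF (for example, adding one inside the logarithm) so that terms appearing in every document are merely discounted rather than zeroed out.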

Why It Matters in E-commerce

  • Search ranking: Distinguishes “GORE-TEX” from generic terms like “shoe”.
  • SEO: Helps identify unique, descriptive keywords for content.
  • Catalogs: Improves matching between queries and product detail pages (PDPs); see the sketch after this list.
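As a concrete illustration of query-to-catalog matching, here is a small sketch using scikit-learn's TfidfVectorizer and cosine similarity; the catalog titles and the query are made up:

  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.metrics.pairwise import cosine_similarity

  # Hypothetical mini-catalog of product titles
  titles = [
      "GORE-TEX waterproof hiking shoe",
      "Leather trail running shoe",
      "Canvas casual shoe",
  ]

  vectorizer = TfidfVectorizer()
  doc_vectors = vectorizer.fit_transform(titles)          # one TF-IDF row per title
  query_vector = vectorizer.transform(["gore-tex shoe"])  # score the query in the same space

  # Rank titles by cosine similarity to the query
  scores = cosine_similarity(query_vector, doc_vectors).ravel()
  for title, score in sorted(zip(titles, scores), key=lambda pair: -pair[1]):
      print(f"{score:.3f}  {title}")

The GORE-TEX title ranks first because "gore-tex" is rare in this catalog, while "shoe" appears in every title and so contributes little to the ranking.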

Best Practices

  • Use TF-IDF as part of a hybrid retrieval stack, alongside BM25 and semantic (embedding-based) retrieval.
  • Monitor common-term drift in product titles: terms that spread across the catalog lose their discriminative power.
  • Apply field boosts (title > description > reviews), as in the sketch after this list.
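A rough sketch of the field-boost idea; the boost weights and the per-field scores below are placeholders, not recommended values or any specific engine's API:

  # Illustrative field boosts: matches in the title count for more than
  # matches in the description or reviews. The values are assumptions.
  FIELD_BOOSTS = {"title": 3.0, "description": 1.5, "reviews": 0.5}

  def boosted_score(field_scores: dict[str, float]) -> float:
      # field_scores maps field name -> TF-IDF score of the query against that field
      return sum(FIELD_BOOSTS.get(field, 1.0) * score
                 for field, score in field_scores.items())

  # A product whose query terms match mostly in the title scores highest
  print(boosted_score({"title": 0.42, "description": 0.10, "reviews": 0.03}))  # 1.425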

Summary

TF-IDF highlights terms that are both frequent and distinctive. It remains a foundation for modern search and SEO keyword analysis.