Inverse Document Frequency down-weights common words and up-weights rare ones. In e-commerce search, IDF helps rank products whose descriptions contain distinctive terms (e.g., “GORE-TEX”, “merino”) above products that match only generic ones (“the”, “shoe”).
Inverse Document Frequency (IDF) measures how informative a term is across the collection. Terms that appear in many documents get a low IDF; rare terms get a high IDF. IDF is a core piece of TF-IDF and BM25 scoring.
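IDF starts from document frequency, df(t): the number of documents that contain a term at least once, not the number of times it occurs overall. The minimal sketch below computes df over a small hypothetical, pre-tokenized product catalog. The catalog and the helper name `document_frequencies` are illustrative assumptions.

```python
from collections import Counter

def document_frequencies(docs: list[list[str]]) -> Counter:
    """Count, for each term, how many documents contain it (not total occurrences)."""
    df = Counter()
    for tokens in docs:
        # set() makes a term that repeats within one document count only once
        df.update(set(tokens))
    return df

# Hypothetical toy catalog: each product is a lowercased list of tokens.
catalog = [
    ["merino", "wool", "running", "shoe"],
    ["trail", "running", "shoe"],
    ["gore-tex", "hiking", "shoe", "shoe"],
]

df = document_frequencies(catalog)
print(df["shoe"], df["merino"], df["running"])
```

In this catalog, "shoe" occurs in every product, so it is the least informative term. "merino" occurs in only one product.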
A typical smoothed formula (the IDF variant used in BM25):
$$\mathrm{idf}(t) = \ln\left(\frac{N - \mathrm{df}(t) + 0.5}{\mathrm{df}(t) + 0.5} + 1\right)$$
where $N$ is the number of documents and $\mathrm{df}(t)$ is the number of documents containing term $t$.
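The sketch below evaluates the smoothed formula directly for a hypothetical collection of 1,000 documents. The document-frequency values for "the", "shoe", and "merino" are invented for illustration. The trailing `+ 1` inside the logarithm keeps the result positive even when a term appears in more than half the documents. Without it, the ratio falls below 1 and the log goes negative.

```python
import math

def smoothed_idf(n_docs: int, df: int) -> float:
    """Smoothed IDF: ln((N - df + 0.5) / (df + 0.5) + 1)."""
    return math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)

N = 1000  # hypothetical collection size
# Hypothetical document frequencies: very common, moderately common, rare.
for term, df in [("the", 990), ("shoe", 400), ("merino", 5)]:
    print(f"{term:>7}: {smoothed_idf(N, df):.3f}")
```

"the" scores close to zero, while "merino" scores far higher. That gap is what lets rare terms dominate a ranking.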
By rewarding informative terms and dampening generic ones, IDF sharpens ranking for intent-rich queries. This is especially valuable in product search.
IDF vs TF? TF measures how often a term appears in one document; IDF measures how rare it is across documents.
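To contrast the two, the minimal sketch below computes raw TF for one document and smoothed IDF over a hypothetical three-product catalog, then multiplies them into a simple TF-IDF weight. The catalog and helper names are illustrative assumptions. Real TF-IDF implementations often use log-scaled or normalized TF instead of raw counts.

```python
import math
from collections import Counter

# Hypothetical toy catalog (pre-tokenized, lowercased).
docs = [
    ["merino", "wool", "running", "shoe"],
    ["trail", "running", "shoe"],
    ["gore-tex", "hiking", "shoe", "shoe"],
]

def tf(term: str, tokens: list[str]) -> int:
    """Raw term frequency: occurrences of term within a single document."""
    return Counter(tokens)[term]

def idf(term: str, docs: list[list[str]]) -> float:
    """Smoothed IDF over the whole collection, using the formula given earlier."""
    n = len(docs)
    df = sum(1 for tokens in docs if term in tokens)
    return math.log((n - df + 0.5) / (df + 0.5) + 1.0)

doc = docs[2]
for term in ("shoe", "gore-tex"):
    weight = tf(term, doc) * idf(term, docs)
    print(term, tf(term, doc), round(idf(term, docs), 3), round(weight, 3))
```

"shoe" has the higher TF in this product (2 vs. 1), but it appears in every product. Its low IDF therefore leaves "gore-tex" with the larger TF-IDF weight.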
IDF vs BM25? BM25 passes TF through a saturating function, normalizes it by document length, and multiplies the result by each query term's IDF.
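The sketch below implements the classic BM25 scoring form on a hypothetical catalog. It shows where saturation, length normalization, and IDF fit together. The parameter values `k1=1.2` and `b=0.75` are assumed, commonly cited defaults, not values from this text. Some engines, such as recent Lucene versions, drop the constant `(k1 + 1)` factor, which does not change the ranking.

```python
import math
from collections import Counter

def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each tokenized doc against the query with classic BM25.

    k1 controls how quickly repeated terms saturate; b controls how strongly
    scores are normalized by document length relative to the average.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency per term
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        # Length normalization: longer-than-average docs get a larger denominator.
        norm = k1 * (1 - b + b * len(tokens) / avgdl)
        score = 0.0
        for term in query:
            f = counts[term]
            if f == 0:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Saturating TF component multiplied by the term's IDF.
            score += idf * f * (k1 + 1) / (f + norm)
        scores.append(score)
    return scores

# Hypothetical catalog and query.
catalog = [
    ["merino", "wool", "running", "shoe"],
    ["trail", "running", "shoe"],
    ["gore-tex", "hiking", "shoe", "shoe"],
]
print(bm25_scores(["merino", "shoe"], catalog))
```

For the query "merino shoe", the merino product ranks first. The rare term "merino" contributes far more than the ubiquitous "shoe", even though the third product mentions "shoe" twice.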