BM25 is a ranking formula that scores how well a document matches your search words. In online stores, it gives fast, strong first results you can refine with AI or business rules.
BM25 (often called Okapi BM25) is a classic information-retrieval ranking function that scores documents based on term frequency, inverse document frequency, and document length normalization. It’s lexical (keyword-based), fast, explainable, and still a strong, widely used baseline for search.
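For reference, one common way to write the scoring function is shown below. The symbols are not defined in the text above and are introduced here for illustration: f(q_i, D) is the number of times query term q_i occurs in document D, |D| is the document length in tokens, avgdl is the average document length in the index, N is the number of documents, and n(q_i) is the number of documents containing q_i. The IDF shown is the non-negative variant used by Lucene; other implementations use slightly different forms.

```latex
\mathrm{score}(D, Q) = \sum_{q_i \in Q} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(q_i) = \ln\!\left(1 + \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}\right)
```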
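Typical parameters: k1 ≈ 1.2–2.0, b ≈ 0.6–0.9. k1 controls how quickly repeated occurrences of a term stop adding to the score (term saturation); b controls how strongly long documents are penalized (0 disables length normalization, 1 applies it fully). Tune per language and corpus.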
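To make the roles of k1 and b concrete, here is a minimal sketch of a BM25 scorer in plain Python. It assumes documents and queries are already tokenized into lists of lowercase terms, uses the Lucene-style non-negative IDF, and the function name `bm25_scores` and the toy product titles are made up for illustration. A production engine would use an inverted index rather than scanning every document.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per tokenized document for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: > 1 for longer-than-average documents.
        length_norm = 1 - b + b * len(d) / avgdl
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Toy catalog (hypothetical product titles).
docs = [
    "red running shoes".split(),
    "blue running shoes for trail running".split(),
    "red leather wallet".split(),
]
scores = bm25_scores("red shoes".split(), docs)
```

In this toy catalog the first title matches both query terms and scores highest. The third title outranks the second even though each matches one term equally often, because the second is longer than average and b > 0 penalizes it.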
BM25 is a fast, dependable ranking baseline for product search. Use it to retrieve a strong candidate set, then layer filters, business rules, and semantic re-ranking to deliver highly relevant results under tight latency.
BM25 vs TF-IDF?
BM25 improves on TF-IDF with length normalization and term saturation, giving more stable, practical scores.
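A small sketch of what term saturation and length normalization mean in practice: it evaluates only the term-frequency part of BM25 and can be compared with a raw term count, as used in the simplest TF-IDF weighting. The helper name `bm25_tf` and the default lengths are assumptions for illustration.

```python
def bm25_tf(tf, k1=1.2, b=0.75, doc_len=100, avgdl=100):
    """Term-frequency component of BM25 (the IDF factor is left out)."""
    length_norm = 1 - b + b * doc_len / avgdl
    return tf * (k1 + 1) / (tf + k1 * length_norm)


# A raw term count grows without bound as tf increases;
# the BM25 component flattens out and stays below k1 + 1.
for tf in (1, 2, 5, 20, 100):
    print(tf, round(bm25_tf(tf), 3))

# The same term count is worth less in a document three times the average length.
short_doc = bm25_tf(3, doc_len=100)
long_doc = bm25_tf(3, doc_len=300)
```

With k1 = 1.2 the component can never exceed 2.2, so stuffing a keyword into a description a hundred times earns little more than mentioning it a handful of times.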
BM25 vs vector/semantic search?
BM25 is lexical and very fast; vectors catch meaning beyond exact words. The strongest setups use hybrid pipelines.
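The text does not say how a hybrid pipeline should merge the two result lists. One widely used option is reciprocal rank fusion (RRF), sketched below as an assumption rather than a recommendation from the source. It needs only the rank positions from each retriever, so BM25 scores and vector similarities never have to be put on the same scale. The SKU identifiers are invented, and k = 60 is the constant commonly used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    Each document receives 1 / (k + rank) from every list it appears in.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical top results from a lexical and a vector retriever.
bm25_ranking = ["sku-1", "sku-2", "sku-3"]
vector_ranking = ["sku-3", "sku-1", "sku-4"]
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

Documents that both retrievers rank well rise to the top, while items found by only one retriever are kept further down the list instead of being dropped.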
What is BM25F?
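A field-aware variant: each field's term frequency is length-normalized and multiplied by a field weight (e.g., title ×3, attributes ×2, description ×1), the weighted frequencies are summed across fields, and saturation is applied once to the total. Some engines instead score each field with plain BM25 and add the weighted scores, which is simpler but lets a term repeated across fields escape saturation.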
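A minimal sketch of field-weighted scoring, using the simplified BM25F form in which weighted, length-normalized term frequencies are summed across fields before a single saturation step. The field names and weights follow the example above. The function name, the shared b value for all fields, and the toy products are assumptions for illustration.

```python
import math
from collections import Counter

FIELD_WEIGHTS = {"title": 3.0, "attributes": 2.0, "description": 1.0}


def bm25f_scores(query, docs, weights=FIELD_WEIGHTS, k1=1.2, b=0.75):
    """Score documents given as {field: [tokens]} dicts (simplified BM25F)."""
    n_docs = len(docs)
    avg_len = {f: sum(len(d[f]) for d in docs) / n_docs for f in weights}
    # A document counts once per term, regardless of which field matched.
    df = Counter()
    for d in docs:
        df.update({term for f in weights for term in d[f]})

    scores = []
    for d in docs:
        score = 0.0
        for term in query:
            pseudo_tf = 0.0
            for field, weight in weights.items():
                tf = d[field].count(term)
                if tf:
                    norm = 1 - b + b * len(d[field]) / avg_len[field]
                    pseudo_tf += weight * tf / norm
            if pseudo_tf:
                idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
                # Saturation is applied once, to the combined frequency.
                score += idf * pseudo_tf / (k1 + pseudo_tf)
        scores.append(score)
    return scores


# Hypothetical products: "red" is in one title and in the other's attributes.
products = [
    {"title": ["red", "shoes"], "attributes": ["size", "42"],
     "description": ["light", "running", "shoes"]},
    {"title": ["leather", "wallet"], "attributes": ["color", "red"],
     "description": ["slim", "leather", "wallet"]},
]
field_scores = bm25f_scores(["red"], products)
```

The title match outranks the attribute match because the title weight is higher. All other factors are identical for the two products in this example.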
How do I tune k1 and b?
Grid-search a few pairs per locale/index; evaluate with NDCG/MRR and business KPIs (CTR, conversion).
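The offline part of that loop can be sketched as follows: score every judged query for each (k1, b) pair and keep the pair with the best mean NDCG. The grid values are taken from the typical ranges quoted earlier. The scorer, the graded relevance labels, and the toy catalog are assumptions for illustration, and business KPIs such as CTR or conversion still need an online test.

```python
import itertools
import math
from collections import Counter


def bm25_scores(query, docs, k1, b):
    """Compact BM25 scorer over tokenized documents (Lucene-style IDF)."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        norm = 1 - b + b * len(d) / avgdl
        scores.append(sum(
            math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            * tf[t] * (k1 + 1) / (tf[t] + k1 * norm)
            for t in query if t in tf
        ))
    return scores


def ndcg_at_k(ranked_gains, k=10):
    """NDCG@k for relevance gains listed in ranked order."""
    def dcg(gains):
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))
    ideal = dcg(sorted(ranked_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal else 0.0


def grid_search(docs, judged_queries, k1_grid, b_grid, k=10):
    """Return the (k1, b) pair with the highest mean NDCG@k, plus all results."""
    results = {}
    for k1, b in itertools.product(k1_grid, b_grid):
        ndcgs = []
        for query, gains in judged_queries:
            scores = bm25_scores(query, docs, k1, b)
            order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
            ndcgs.append(ndcg_at_k([gains[i] for i in order], k))
        results[(k1, b)] = sum(ndcgs) / len(ndcgs)
    return max(results, key=results.get), results


# Toy catalog and hypothetical graded judgments (one gain per document).
docs = [
    "red running shoes".split(),
    "blue running shoes for trail running".split(),
    "red leather wallet".split(),
]
judged_queries = [
    ("red shoes".split(), [2, 1, 0]),
    ("running shoes".split(), [1, 2, 0]),
]
best, results = grid_search(docs, judged_queries, (1.2, 1.6, 2.0), (0.6, 0.75, 0.9))
```

On real data the judged queries would come from labeled or click-derived relevance sets per locale, and the search would be repeated for each index, since document lengths and vocabulary differ between them.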
How to handle synonyms and typos?
Add query-time synonyms and typo tolerance (edit distance), and tune analyzers for each language.
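A minimal sketch of both ideas at query time: unknown tokens are corrected to the nearest vocabulary term within a small edit distance, and each token is then expanded with its synonyms. The synonym table, the vocabulary, and the function names are hypothetical. Real engines typically do this inside their analyzers and fuzzy-matching features rather than in application code.

```python
def edit_distance(a, b):
    """Levenshtein distance using a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


# Hypothetical synonym table; a real one is curated per language.
SYNONYMS = {"sneakers": ["trainers"], "sofa": ["couch"]}


def expand_query(tokens, vocabulary, synonyms=SYNONYMS, max_edits=1):
    """Correct near-miss tokens against the vocabulary, then add synonyms."""
    expanded = []
    for token in tokens:
        if token not in vocabulary:
            distance, nearest = min(
                ((edit_distance(token, v), v) for v in vocabulary),
                default=(max_edits + 1, token),
            )
            if distance <= max_edits:
                token = nearest
        expanded.append(token)
        expanded.extend(synonyms.get(token, []))
    return expanded


vocabulary = {"sneakers", "trainers", "sofa", "couch", "red"}
expanded = expand_query(["red", "sneekers"], vocabulary)
```

Here the misspelled "sneekers" is one substitution away from "sneakers", so it is corrected and then expanded with "trainers". Keeping `max_edits` small limits false corrections on short words.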