Bigram matching looks for pairs of adjacent words (like “running shoes”) instead of single keywords. In e-commerce, it boosts precision and intent understanding for compound terms, brand+model names, and common phrases.
Bigram matching is a retrieval technique that indexes and matches two-word sequences (word bigrams or “shingles”) from queries and documents. It sits between simple unigram (single-word) matching and full phrase search—capturing word order and short phrases without requiring an exact, quoted match.
For example, “wireless noise cancelling headphones” produces the bigrams “wireless noise”, “noise cancelling”, and “cancelling headphones”.
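A minimal sketch of how such bigrams can be produced with a sliding window over the tokens. The function name `word_bigrams` is hypothetical, and the lowercase-and-split tokenizer is an assumption standing in for a real search analyzer (which would also handle punctuation, stemming, and stop words).

```python
def word_bigrams(text: str) -> list[str]:
    """Return the adjacent word pairs (2-word shingles) in `text`."""
    # Assumption: lowercasing + whitespace split is a stand-in for a real analyzer.
    tokens = text.lower().split()
    # Pair each token with its right-hand neighbour.
    return [f"{left} {right}" for left, right in zip(tokens, tokens[1:])]


print(word_bigrams("Wireless noise cancelling headphones"))
# ['wireless noise', 'noise cancelling', 'cancelling headphones']
```

A single-word input yields an empty list, which is one reason a unigram field is normally kept alongside the bigram field.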
Index separate fields such as title_unigram and title_bigram (and likewise for key attributes); boost bigrams in titles/attributes more than descriptions.
For example, the query “air max 270” matches the bigrams “air max” and “max 270” in titles.
Bigram matching captures short phrases and word order, lifting precision on high-value compound terms while keeping recall via unigrams. In modern product search, it’s a low-latency way to improve relevance before semantic retrieval and re-ranking step in.
Bigram vs phrase search?
Phrase search requires exact adjacency (often quoted). Bigrams reward adjacency but don’t require a strict phrase match.
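The difference can be illustrated with a small sketch. The helper names (`phrase_match`, `bigram_overlap`) and the whitespace tokenizer are assumptions made for illustration, not the API of any particular search engine.

```python
def tokens(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split stands in for a real analyzer.
    return text.lower().split()


def bigrams(toks: list[str]) -> set[tuple[str, str]]:
    return set(zip(toks, toks[1:]))


def phrase_match(query: str, doc: str) -> bool:
    """True only if all query tokens occur contiguously and in order in doc."""
    q, d = tokens(query), tokens(doc)
    n = len(q)
    return any(d[i:i + n] == q for i in range(len(d) - n + 1))


def bigram_overlap(query: str, doc: str) -> int:
    """Number of distinct query bigrams that also occur in doc."""
    return len(bigrams(tokens(query)) & bigrams(tokens(doc)))


query = "air max 270"
doc = "Nike Air Max 90 running shoes"
print(phrase_match(query, doc))    # False: the full phrase is not present
print(bigram_overlap(query, doc))  # 1: only ("air", "max") is shared
```

The phrase check is all-or-nothing, while the bigram overlap gives partial credit for the adjacent pair that does match.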
Do I also need trigrams?
Use trigrams sparingly (for example, for brand+model+number queries). Trigrams add cost; bigrams + unigrams + re-ranking are usually enough.
Will bigrams hurt recall?
Not if you combine with unigrams and keep fallbacks; bigrams sharpen precision while unigrams preserve recall.
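A toy sketch of that combination: every title that shares at least one word with the query still gets a non-zero score (recall), and titles containing the adjacent pair are pushed to the top (precision). The `score` function, the overlap-count scoring, and the boost values 1.0 and 2.0 are illustrative assumptions, not recommended settings.

```python
def tokens(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split stands in for a real analyzer.
    return text.lower().split()


def bigrams(toks: list[str]) -> set[tuple[str, str]]:
    return set(zip(toks, toks[1:]))


def score(query: str, title: str,
          unigram_boost: float = 1.0, bigram_boost: float = 2.0) -> float:
    """Boosted count of shared unigrams plus shared bigrams."""
    q, t = tokens(query), tokens(title)
    unigram_hits = len(set(q) & set(t))          # keeps recall
    bigram_hits = len(bigrams(q) & bigrams(t))   # rewards adjacency
    return unigram_boost * unigram_hits + bigram_boost * bigram_hits


titles = [
    "leather dress shoes",
    "shoes for trail running",
    "running shoes for men",
]
for title in sorted(titles, key=lambda t: score("running shoes", t), reverse=True):
    print(f"{score('running shoes', title):.1f}  {title}")
# 4.0  running shoes for men
# 2.0  shoes for trail running
# 1.0  leather dress shoes
```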
How do I add bigrams to BM25?
Index a separate bigram field with a 2-gram analyzer and include it in BM25F with an appropriate boost.
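As a rough illustration, the sketch below scores a unigram field and a bigram field with plain Okapi BM25 and adds the two with per-field boosts. This is a simplification: BM25F proper combines weighted field term frequencies before saturation, whereas this sums per-field scores. The function names, the parameter values (`K1`, `B`), the boosts, and the whitespace tokenizer are all assumptions for illustration.

```python
import math
from collections import Counter

K1, B = 1.2, 0.75  # common BM25 defaults, used here as assumptions


def unigrams(text: str) -> list[str]:
    return text.lower().split()


def bigrams(text: str) -> list[str]:
    toks = unigrams(text)
    return [f"{a} {b}" for a, b in zip(toks, toks[1:])]


def bm25_field(query_terms: list[str], docs_terms: list[list[str]]) -> list[float]:
    """BM25 score of every document for one field (expects at least one doc)."""
    n = len(docs_terms)
    avg_len = (sum(len(d) for d in docs_terms) / n) or 1.0
    # Document frequency: in how many docs each term appears.
    df = Counter(term for d in docs_terms for term in set(d))
    scores = []
    for d in docs_terms:
        tf = Counter(d)
        total = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + K1 * (1 - B + B * len(d) / avg_len)
            total += idf * tf[term] * (K1 + 1) / norm
        scores.append(total)
    return scores


def search(query: str, titles: list[str],
           unigram_boost: float = 1.0, bigram_boost: float = 2.0):
    """Rank titles by a boosted sum of unigram-field and bigram-field BM25."""
    uni = bm25_field(unigrams(query), [unigrams(t) for t in titles])
    bi = bm25_field(bigrams(query), [bigrams(t) for t in titles])
    combined = [unigram_boost * u + bigram_boost * b for u, b in zip(uni, bi)]
    return sorted(zip(titles, combined), key=lambda pair: pair[1], reverse=True)


results = search("air max 270", ["air fryer", "max air 270 pump", "nike air max 270"])
for title, value in results:
    print(f"{value:.3f}  {title}")
```

In this example the second and third titles contain the same query words, so their unigram scores tie; only the bigram field separates them, which places "nike air max 270" first.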
What about multilingual stores?
Use locale-specific analyzers and maintain bigram synonym maps per language.