GLOSSARY

Keyword Search

Keyword search finds results by matching the words in a query against the words in indexed product data. In online stores, it is fast and precise for titles, brands, and model codes, especially when paired with phrase/bigram fields and exact SKU handling.

What is Keyword Search?

Keyword search (lexical search) retrieves documents by matching tokens from the query to tokens in indexed fields. It relies on analyzers, TF-IDF/BM25 scoring, phrase/bigram logic, and optional fuzzy matching for minor typos.

How It Works (quick)

  • Analyze text: Tokenize, lowercase, fold accents; apply stemming/lemmatization per locale.
  • Index structures: Inverted index for tokens; BM25F to weight fields (title > attributes > description).
  • Query types: Unigram, phrase ("air max 270"), bigram fields, exact keyword fields (SKU/MPN), fuzzy tolerance for typos.
  • Scoring: Combine term frequency (TF), inverse document frequency (IDF), field boosts, and proximity.
  • Fallbacks: If strict matching fails, relax with fuzziness, synonyms, or broader fields.
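
The analyze, index, and score steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production engine: it scores a single text field with plain BM25 (no per-field weighting as in BM25F, no stemming), and the function names, the sample catalog, and the k1/b defaults are assumptions chosen for the example.

```python
import math
import re
import unicodedata
from collections import Counter, defaultdict


def analyze(text):
    """Lowercase, fold accents, and split on anything that is not a letter or digit."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"[a-z0-9]+", folded)


def build_index(docs):
    """Build an inverted index: token -> {doc_id: term frequency}, plus document lengths."""
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = analyze(text)
        lengths[doc_id] = len(tokens)
        for token, tf in Counter(tokens).items():
            index[token][doc_id] = tf
    return index, lengths


def bm25_search(query, index, lengths, k1=1.2, b=0.75):
    """Score documents with BM25; k1 and b are commonly used defaults, assumed here."""
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for token in analyze(query):
        postings = index.get(token, {})
        if not postings:
            continue
        # Rare terms get a higher inverse document frequency.
        idf = math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for doc_id, tf in postings.items():
            # Term frequency saturates and is normalized by document length.
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical three-product catalog.
catalog = {
    "p1": "Nike Air Max 270 running shoe",
    "p2": "Café table with max load 50 kg",
    "p3": "Air fryer",
}
index, lengths = build_index(catalog)
results = bm25_search("air max", index, lengths)
```

Here "p1" ranks first for "air max" because it matches both query tokens, and the accent folding lets a query for "cafe" reach "p2".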

Why It Matters in E-commerce

  • Speed & explainability: Millisecond-level retrieval, with highlights/snippets that show why a result matched.
  • Precision: Great for brand+model, SKUs, and category terms.
  • Foundation for hybrid: Provides reliable candidates before vector/semantic steps and re-ranking.

Best Practices

  • Field design: Separate title, attributes, description; add keyword fields for SKUs.
  • Phrase/bigram support: Capture common two/three-word units (e.g., “air max”).
  • Synonyms (late binding): Apply at query time to avoid index bloat; localize per market.
  • Fuzzy guardrails: Disable fuzzy matching on SKU/brand fields; use length-aware edit-distance limits elsewhere.
  • Stopwords & noise: Maintain per-locale lists; normalize hyphens/diacritics.
  • Analytics: Track zero-results, CTR@k, reformulations; tune by category/locale.
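
The fuzzy guardrails above could look like the following sketch. The length thresholds (no edits for very short terms, one edit up to five characters, two beyond that), the field names, and the helper names are illustrative assumptions, not settings taken from any particular engine.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def max_edits(term):
    """Length-aware edit limit (assumed thresholds)."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


# Fields where a near-miss is a different product, so fuzziness is off (assumed names).
EXACT_FIELDS = {"sku", "brand"}


def term_matches(query_term, indexed_term, field):
    """Exact comparison on guarded fields, bounded fuzzy comparison elsewhere."""
    if field in EXACT_FIELDS:
        return query_term == indexed_term
    return edit_distance(query_term, indexed_term) <= max_edits(query_term)
```

With these limits, "jaket" still finds "jacket" in a title, while "ab-1235" never matches the SKU "ab-1234".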

Challenges

  • Vocabulary gap: Shoppers say “trainers,” catalog says “sneakers.” (Use synonyms.)
  • Typos & spacing: Needs fuzzy logic without harming precision.
  • Multilingual complexity: Different analyzers and morphology per locale.
  • Over-boosting popularity: Can bury truly relevant but newer items (use caps and re-rankers).
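
As a rough illustration of closing the vocabulary gap at query time, the sketch below expands each query token into a set of alternatives from a per-locale synonym map. The map contents, the locale key, and the function names are hypothetical; a real deployment would load curated lists per market.

```python
# Hypothetical per-locale synonym map; applied to the query, never to the index.
SYNONYMS_BY_LOCALE = {
    "en-GB": {"trainers": {"sneakers"}, "sneakers": {"trainers"}},
}


def expand_query(tokens, locale):
    """Return one set of alternatives per query token."""
    synonyms = SYNONYMS_BY_LOCALE.get(locale, {})
    return [{token} | synonyms.get(token, set()) for token in tokens]


def matches(doc_tokens, expanded):
    """A document matches if every group has at least one alternative present."""
    doc = set(doc_tokens)
    return all(group & doc for group in expanded)


expanded = expand_query(["red", "trainers"], "en-GB")
```

Because the expansion happens on the query side, changing a synonym list takes effect immediately and does not require reindexing the catalog.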

Examples

  • "air max 270" → phrase match in title/attributes outranks generic “air” + “max”.
  • sku:"AB-1234" → exact field match.
  • gore tex jacket → normalized to match GORE-TEX titles.
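
The first and third examples can be reproduced with a small sketch: normalization splits on hyphens so that "gore tex" and "GORE-TEX" produce the same tokens, and a phrase check requires the query tokens to appear consecutively and in order. The function names are assumptions for illustration.

```python
import re


def normalize(text):
    """Lowercase and split on hyphens, spaces, and other punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(doc_tokens, phrase_tokens):
    """True if phrase_tokens occur in doc_tokens as one consecutive run."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


title = normalize("Nike Air Max 270")
shuffled = normalize("Max air flow 270")
phrase = normalize("air max 270")
```

A title that passes the phrase check can then be scored above one that only contains the individual words in another order.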

Summary

Keyword search is the reliable backbone of storefront retrieval. With good analyzers, phrase/bigram fields, careful fuzziness, and synonym management, it delivers fast, precise results and pairs well with semantic re-ranking.

FAQ

Keyword vs semantic search?

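Keyword search matches words; semantic search uses embeddings to match meaning. Hybrid stacks combine the two, using keyword matching for exact terms and embeddings for paraphrased or descriptive queries.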

Do I still need BM25 with vectors?

Yes. Lexical retrieval is fast, cheap, and precise for exact terms; vectors add matching by meaning, and a re-ranking step then orders the combined candidates.
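
One simple way to merge a lexical result list with a vector result list is reciprocal rank fusion (RRF). The source does not prescribe a fusion method, so this is only one option; the constant k=60 is a conventional choice assumed here, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs; each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


lexical_hits = ["a", "b", "c"]  # e.g. from BM25
vector_hits = ["b", "d", "a"]   # e.g. from an embedding index
fused = reciprocal_rank_fusion([lexical_hits, vector_hits])
```

Documents that appear near the top of both lists rise to the top of the fused list, while documents found by only one retriever are still kept as candidates.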

Where do boosters fit?

After retrieval: apply custom ranking (stock, rating, margin) or learning-to-rank.
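
A post-retrieval boost might be sketched as below. The signal names (in_stock, rating), their weights, and the 30% cap are assumptions for illustration; signals are expected to be pre-scaled to the range 0 to 1. The cap keeps business signals from overriding lexical relevance entirely.

```python
def rerank(candidates, weights, max_boost=0.3):
    """Multiply each lexical score by (1 + capped weighted sum of business signals)."""
    reranked = []
    for cand in candidates:
        raw = sum(weight * cand.get(name, 0.0) for name, weight in weights.items())
        boost = min(raw, max_boost)  # cap so boosts cannot swamp relevance
        reranked.append({**cand, "final": cand["score"] * (1.0 + boost)})
    return sorted(reranked, key=lambda c: c["final"], reverse=True)


# Hypothetical candidates as returned by the retrieval step.
candidates = [
    {"id": "A", "score": 10.0, "in_stock": 0.0, "rating": 0.5},
    {"id": "B", "score": 9.0, "in_stock": 1.0, "rating": 0.9},
    {"id": "C", "score": 5.0, "in_stock": 1.0, "rating": 1.0},
]
weights = {"in_stock": 0.2, "rating": 0.2}
ranked = rerank(candidates, weights)
```

In this example the in-stock, well-rated item "B" overtakes "A", but the much less relevant "C" stays last because its boost is capped.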