GLOSSARY

Proximity Searching

Proximity searching finds words that appear near each other, not just anywhere. In stores, it’s great for model names and phrases that can have small gaps or reorders.

What is Proximity Searching?

Proximity searching returns results where query terms occur within a certain distance (in order or unordered), using token positions and slop/distance parameters. It’s a flexible cousin of phrase matching that tolerates small gaps, synonyms, or minor reordering.

How It Works (quick)

  • Positions in the index: Store token positions/offsets per field to enable proximity queries.
  • Query types:
    • Exact phrase ("air max 270") → distance 0.
    • Proximity/NEAR (e.g., air NEAR/3 max) → allow ≤3 tokens between.
    • Unordered windows for languages with freer word order.
  • Scoring: Closer distances score higher; blend with BM25/field boosts; cap max gain.
  • Normalization: Handle hyphen/space variants (e.g., gore-tex vs gore tex) and accent folding.

Why It Matters in E-commerce

  • Model & attribute coherence: Keeps units like “air max,” “merino base layer,” “USB-C fast charger” intact even with adjectives in between.
  • Better snippets: Highlights show tight, meaningful spans, improving CTR.
  • Resilience: Handles extra words, punctuation, or locale word order differences.

Best Practices

  • Start tight: Small windows (NEAR/1–3) for titles; slightly looser for descriptions.
  • Mix with bigrams: Keep bigram fields for speed and ranking stability.
  • Protect exact fields: SKUs/MPNs remain exact—no fuzz/proximity.
  • Category-aware: Stronger proximity boosts in categories where phrases matter (sneakers, electronics).
  • Analytics: Log missed phrase pairs and false positives; tune per locale.

Challenges

  • Over-broad windows admit noise; inconsistent tokenization across languages; extra compute cost on large fields.

Examples

  • Query “gore tex running jacket” → matches GORE-TEX … jacket within 2–3 tokens.
  • Query “air 270 max” (misordered) → unordered proximity window still retrieves the right items.

Summary

Proximity searching keeps meaningful word groups together without being brittle. Use small windows, pair with bigrams, protect exact fields, and tune per category/locale for precision without losing recall.

FAQ

Proximity vs phrase matching?

Phrase requires strict order and adjacency (or tiny slop). Proximity allows a window and optional reordering.

Do I need positions if I have bigrams?

Yes—positions enable flexible windows; bigrams alone can miss valid variations.

What distances are safe?

Titles: 1–3; attributes: 1–4; descriptions: 3–6. Tune with evaluation.