GLOSSARY

Fuzzy Search

Fuzzy search finds close matches when a query contains typos or differs slightly from the indexed text. In online stores, it recovers relevant results for messy queries that exact matching would miss, so shoppers still see good results.

What is Fuzzy Search?

Fuzzy search matches similar strings rather than exact text, typically using edit distance (e.g., Levenshtein) or phonetic similarity. It tolerates insertions, deletions, substitutions, and transpositions, so misspellings and spacing issues still retrieve results.
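The four edit operations named above can be made concrete with a small distance function. The sketch below is a minimal, unoptimized version of the optimal string alignment variant of Damerau-Levenshtein distance, which counts an adjacent transposition as one edit. The name `osa_distance` is illustrative and does not come from any particular search engine.

```python
def osa_distance(a: str, b: str) -> int:
    """Count the insertions, deletions, substitutions, and adjacent
    transpositions needed to turn string a into string b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "ki" <-> "ik"
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("nik", "nike"))   # 1 (one insertion)
print(osa_distance("nkie", "nike"))  # 1 (one transposition)
```

Production engines usually avoid this full table per candidate and rely on precomputed automata or n-gram indexes instead; the table form is shown only because it is the easiest to read.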

How It Works (quick)

  • Edit distance: Allow 1–2 edits depending on term length.
  • Token strategy: Apply fuzziness to unigrams; keep bigrams/phrases stricter.
  • Thresholds: Length-aware rules (e.g., no fuzz for 1–2-char tokens; 1 edit for 3–5; 2 edits for ≥6).
  • Costs & boosts: Weight edits (transposition cheaper), penalize low-quality matches.
  • Fallbacks: Use exact/phrase boosts first; relax progressively when recall is low.
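The length-aware thresholds and the exact-first fallback above can be sketched as follows. This is a simplified illustration on single tokens: the function names, the tiny term list, and the use of plain Levenshtein distance (no transposition discount, no weighting) are assumptions made for the example, not a description of any specific engine.

```python
def max_edits(token: str) -> int:
    """Length-aware edit budget: 0 for 1-2 chars, 1 for 3-5, 2 for 6+."""
    n = len(token)
    if n <= 2:
        return 0
    if n <= 5:
        return 1
    return 2


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance, keeping only two rows of the table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def search(query: str, terms: list[str]) -> list[str]:
    """Try exact matches first, then relax one edit at a time up to the budget."""
    q = query.lower()
    exact = [t for t in terms if t.lower() == q]
    if exact:
        return exact
    distances = {t: levenshtein(q, t.lower()) for t in terms}
    for allowed in range(1, max_edits(q) + 1):
        hits = sorted(t for t, dist in distances.items() if dist <= allowed)
        if hits:
            return hits  # stop at the tightest tier that finds anything
    return []


terms = ["nike", "nikon", "bike", "adidas"]  # hypothetical index terms
print(search("nik", terms))      # ['nike']
print(search("addidas", terms))  # ['adidas']
print(search("ni", terms))       # [] (too short for fuzziness)
```

Stopping at the first tier that returns hits is one way to relax progressively; a real ranking pipeline would more likely score all tiers and let exact and phrase boosts dominate.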

Why It Matters in E-commerce

  • Real-world typos: Brand names, model numbers, accents/diacritics.
  • Mobile input: Fat-finger errors, missing spaces (“airmax”).
  • Spelling and formatting variants: “sneakers” vs “sneekers”; “gore tex” vs “gore-tex”.

Best Practices

  • Guardrails: Don’t apply fuzz to SKU/MPN fields; keep an exact field.
  • Length-aware fuzz: Scale edits with term length; disable for stopwords.
  • Synonyms + normalization: Diacritic/case folding; synonym maps for common variants (gtx ↔ GORE-TEX).
  • Precision first: Exact/phrase/bigram scores outrank fuzzy hits; cap fuzzy impact.
  • Analytics: Monitor “fuzzy saves” vs false positives; tune per locale.
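The normalization and synonym practice above might look like the following before any fuzzy matching runs. This is a sketch under assumptions: the `SYNONYMS` map and both function names are hypothetical, and the map is applied in one direction only (variant to canonical form), whereas a real synonym set may be bidirectional.

```python
import unicodedata

# Hypothetical synonym map: query variant -> canonical index term.
SYNONYMS = {"gtx": "gore-tex"}


def fold(text: str) -> str:
    """Case-fold and strip diacritics (e.g. 'Lancôme' -> 'lancome')."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def normalize_query(query: str) -> list[str]:
    """Fold the query, split on whitespace, and map known variants."""
    return [SYNONYMS.get(tok, tok) for tok in fold(query).split()]


print(normalize_query("GTX Jacket"))  # ['gore-tex', 'jacket']
print(fold("Lancôme"))                # lancome
```

Normalizing first means the edit budget is not spent on case or accent differences, which leaves it for genuine typos.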

Challenges

  • Over-recall: Irrelevant results from aggressive fuzziness.
  • Performance: Fuzzy matching costs more than exact lookup because each query term expands to many candidate terms; cache frequent queries.
  • Brand integrity: Don’t degrade protected brand casing/spacing.

Examples

  • “nik” → Nike (1 edit); “gore tex” → GORE-TEX (space + case).
  • “iphon 15” → iPhone 15 (1 edit: missing “e”).
  • “airmax” → Air Max (missing space).
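Spacing and hyphenation cases such as “airmax” and “gore tex” do not always need edit distance. One possible approach, sketched below with hypothetical names and a made-up three-item catalog, is to index each product name under a key with case, spaces, and hyphens removed, and to return the original display name so brand casing and spacing stay intact.

```python
import re


def compact_key(text: str) -> str:
    """Case-fold and drop whitespace and hyphens: 'GORE-TEX' -> 'goretex'."""
    return re.sub(r"[\s\-]+", "", text.casefold())


def build_index(names: list[str]) -> dict[str, str]:
    """Map each compact key to the original, correctly cased display name."""
    return {compact_key(name): name for name in names}


index = build_index(["Air Max", "GORE-TEX", "iPhone 15"])  # hypothetical catalog

print(index.get(compact_key("airmax")))    # Air Max
print(index.get(compact_key("gore tex")))  # GORE-TEX
```

Queries that miss this lookup, such as “iphon 15”, would still fall through to edit-distance matching.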

Summary

Fuzzy search rescues typo-ridden queries by allowing small edit distances. It does so without sacrificing precision, provided you cap its influence on ranking, protect exact fields, and combine it with synonyms and phrase/bigram logic.

FAQ

Fuzzy vs spell correction?

Spell correction rewrites the query; fuzzy search tolerates it at match time. Many stacks use both.

Apply fuzziness everywhere?

No—protect exact fields (SKU/MPN), keep phrases tight, and use bigrams for common two-word units.

How many edits are safe?

Usually ≤2; tune per language and field, and cap the scoring impact.