What is Query Transformation?
Query transformation is the set of rewrites and enrichments applied to a raw query before retrieval and ranking. It includes normalization, spell correction, synonym expansion, phrase/entity detection, unit/currency conversion, and operator parsing (e.g., “under €150” → price ≤ 150).
How It Works (quick)
- Normalize: Lowercase, accent fold, trim punctuation; unify hyphen/space variants.
- Correct: Length-aware spelling/keyboard fixes (skip exact fields like SKU/brand).
- Expand (late-bound): Category- and locale-aware synonyms (trainers ↔ sneakers), abbreviations (GTX ↔ GORE-TEX).
- Understand: Extract entities/constraints (size, color, price, date), detect phrases, convert units/currency.
- Rewrite: Build a structured plan: filters/facets, keyword fields, phrase/bigram queries.
- Guardrails: Cap expansions, allow undo/preview, and log which rules fired for audits.
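The steps above can be sketched as a small pipeline. This is a minimal illustration, not a reference implementation: the names `QueryPlan`, `normalize`, and `transform`, the SKU shape, and the "under N" / "size N" patterns are all assumptions made for the example. Spell correction and synonym expansion are left out to keep it short.

```python
import re
import unicodedata
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical patterns; a real catalog would supply its own.
SKU_PATTERN = re.compile(r"^[A-Za-z]{2,}-?\d{3,}$")
PRICE_PATTERN = re.compile(r"\bunder\s+(?:€\s*)?(\d+)\b")
SIZE_PATTERN = re.compile(r"\bsize\s+(\d+)\b")


@dataclass
class QueryPlan:
    keywords: str = ""
    filters: dict = field(default_factory=dict)
    exact_sku: Optional[str] = None
    log: list = field(default_factory=list)  # which steps fired, for audits


def normalize(raw: str) -> str:
    """Lowercase, accent-fold, and drop punctuation (hyphens and € are kept)."""
    text = unicodedata.normalize("NFKD", raw.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s€-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def _extract(pattern, text: str, plan: QueryPlan, name: str) -> str:
    """Move the first match of `pattern` from the text into a numeric filter."""
    match = pattern.search(text)
    if match:
        plan.filters[name] = int(match.group(1))
        plan.log.append(name)
        text = text[:match.start()] + " " + text[match.end():]
    return text


def transform(raw: str) -> QueryPlan:
    plan = QueryPlan()
    candidate = raw.strip()
    if SKU_PATTERN.match(candidate):
        # Exact identifiers bypass every rewrite.
        plan.exact_sku = candidate.upper()
        plan.log.append("sku_bypass")
        return plan
    text = normalize(raw)
    plan.log.append("normalize")
    text = _extract(PRICE_PATTERN, text, plan, "price_max")
    text = _extract(SIZE_PATTERN, text, plan, "size")
    plan.keywords = " ".join(text.split())
    return plan
```

Calling `transform("Nike gore tex trail shoes size 45 under €150")` under these assumptions yields the keywords `nike gore tex trail shoes` with filters `price_max=150` and `size=45`, while `transform("AB-1234")` returns only an exact SKU. The `log` list records which steps fired, so each rewrite can be explained later.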
Why It Matters in E-commerce
- Higher recall at comparable precision: Recover results for typos and vocabulary gaps without flooding the result list with noise.
- Fewer steps: One natural query can apply multiple filters instantly.
- Localization: Handles sizes, units, and slang per market.
Best Practices
- Prefer late binding for synonyms and boosts; keep the index lean.
- Protect exact fields (SKU/MPN/brand) from fuzz and aggressive rewrites.
- Use context (category, user locale) to choose expansions; avoid global rules.
- Render chips for extracted constraints so users can edit quickly.
- A/B test rewrite policies and monitor zero-result rate, CTR, and reformulation rate.
- Version configs and keep explain logs for each transformation.
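Several of these practices fit in one query-time expansion step. The sketch below is illustrative only: the `SYNONYMS` table, its keys, `MAX_EXPANSIONS`, and the function name `expand_terms` are hypothetical. It assumes synonyms are looked up per (category, locale) at query time rather than baked into the index, that protected fields are never expanded, and that a global cap limits how many alternatives are added.

```python
# Hypothetical synonym table keyed by (category, locale); "*" means any locale.
SYNONYMS = {
    ("footwear", "en-GB"): {"trainers": ["sneakers"]},
    ("footwear", "en-US"): {"sneakers": ["trainers"]},
    ("outdoor", "*"): {"gtx": ["gore-tex"]},
}
MAX_EXPANSIONS = 3  # guardrail: total alternatives added per query
PROTECTED_FIELDS = {"sku", "mpn", "brand"}


def expand_terms(tokens, category, locale, target_field="title"):
    """Return (expanded token groups, list of rules that fired)."""
    if target_field in PROTECTED_FIELDS:
        # Exact fields are matched as typed: no synonyms, no fuzz.
        return [[tok] for tok in tokens], []

    # Locale-specific entries override the category-wide ones.
    table = {}
    table.update(SYNONYMS.get((category, "*"), {}))
    table.update(SYNONYMS.get((category, locale), {}))

    expanded, fired = [], []
    budget = MAX_EXPANSIONS
    for tok in tokens:
        alternatives = table.get(tok, [])[:budget]
        budget -= len(alternatives)
        expanded.append([tok] + alternatives)
        if alternatives:
            fired.append((tok, alternatives))  # explain-log entry
    return expanded, fired
```

With this table, `expand_terms(["trainers", "gtx"], "footwear", "en-GB")` expands only `trainers`, because `gtx` is registered under a different category. The `fired` list is what an explain log would persist alongside the config version.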
Challenges
- Over-expansion causing off-topic hits.
- Multilingual ambiguity.
- Latency from multiple steps.
- Privacy with personalization features.
Examples
- “nike gore tex trail shoes size 45 under 150”
- → synonyms: gore tex → GORE-TEX; entities: size=45, price ≤ €150; filters + phrase field for trail shoes.
- “AB-1234”
- → bypass transforms; route to exact SKU field only.
- “laptop 16gb 512 ssd”
- → entities: RAM=16 GB, SSD=512 GB; map to attributes and phrase/bigram fields.
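The laptop example can be reproduced with two regular expressions. This is a sketch under a stated assumption, not a general attribute parser: a number directly before "ssd" is read as storage in GB, and any remaining "<n>gb" token is read as RAM. The function name `extract_laptop_attributes` and the attribute keys are hypothetical.

```python
import re

# Assumed heuristic: "<n> ssd" or "<n>gb ssd" is storage; a leftover "<n>gb" is RAM.
SSD_PATTERN = re.compile(r"\b(\d+)\s*(?:gb\s*)?ssd\b")
RAM_PATTERN = re.compile(r"\b(\d+)\s*gb\b")


def extract_laptop_attributes(query):
    """Return (remaining keywords, extracted attributes in GB)."""
    text = query.lower()
    attributes = {}
    # Storage is extracted first so its number is not mistaken for RAM.
    for name, pattern in (("ssd_gb", SSD_PATTERN), ("ram_gb", RAM_PATTERN)):
        match = pattern.search(text)
        if match:
            attributes[name] = int(match.group(1))
            text = text[:match.start()] + " " + text[match.end():]
    keywords = " ".join(text.split())
    return keywords, attributes
```

For `"laptop 16gb 512 ssd"` this returns the keyword `laptop` with `ssd_gb=512` and `ram_gb=16`, matching the mapping described above. Order matters here: running the RAM pattern first would risk claiming the storage number in queries such as "1024gb ssd".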
Summary
Query transformation turns messy input into a structured, context-aware query plan. Keep rewrites late-bound, safe, and explainable to boost recall and speed without sacrificing precision.