What are Stop Words?
Stop words are frequent, low-information terms (such as "the", "of", and "and") that contribute little to retrieval or ranking. They can be dropped (hard stopwords) or down-weighted (soft stopwords) depending on the analyzer configuration.
How It Works (quick)
- Lists: Stop word sets are language-specific and vary by market.
- Soft handling: Keep tokens but discount their score impact.
- Hard handling: Remove tokens entirely; risky for brand and phrase integrity.
- Context: Protect stop words that are part of named entities (e.g., The North Face, Of Mice and Men).
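The handling modes above can be sketched in a few lines of Python. This is a minimal illustration, not a production analyzer: the stop word list, the protected phrases, the `SOFT_WEIGHT` value, and the function names are all assumptions chosen for the example.

```python
import re

# Hypothetical, illustrative lists; real lists are language- and market-specific.
STOP_WORDS = {"the", "of", "and", "a", "for", "in"}
PROTECTED_PHRASES = [("the", "north", "face"), ("of", "mice", "and", "men")]
SOFT_WEIGHT = 0.2  # assumed discount applied to soft stop words


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def protected_positions(tokens):
    """Return the indexes of tokens that fall inside a protected phrase."""
    protected = set()
    for phrase in PROTECTED_PHRASES:
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase:
                protected.update(range(i, i + n))
    return protected


def analyze(text, mode="soft"):
    """Return (token, weight) pairs under 'hard' or 'soft' handling."""
    if mode not in ("hard", "soft"):
        raise ValueError("mode must be 'hard' or 'soft'")
    tokens = tokenize(text)
    keep = protected_positions(tokens)
    result = []
    for i, tok in enumerate(tokens):
        if tok in STOP_WORDS and i not in keep:
            if mode == "hard":
                continue  # hard: drop the token entirely
            result.append((tok, SOFT_WEIGHT))  # soft: keep it, but discount it
        else:
            result.append((tok, 1.0))  # content word or protected entity token
    return result
```

With these assumed lists, `analyze("jacket for the winter", "hard")` keeps only `jacket` and `winter`, while `analyze("The North Face jacket", "hard")` keeps all four tokens at full weight because the brand phrase is protected.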
Why It Matters in E-commerce
- Efficiency: Hard removal shrinks the inverted index → faster queries; soft handling leaves the index size unchanged.
- Precision: Prevents generic terms from overshadowing brand/spec fields.
- Risks: Dropping stop words can break exact matches on brand and product names.
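The efficiency and risk trade-off can be made concrete with a toy inverted index. The sketch below assumes a hypothetical stop word list and three made-up product titles; it counts postings (term-to-document entries) with and without hard removal.

```python
from collections import defaultdict

STOP_WORDS = {"the", "for", "and", "of", "a", "with"}  # hypothetical list

# Made-up product titles keyed by document id.
DOCS = {
    1: "the red jacket for the winter",
    2: "a jacket with a hood",
    3: "the north face jacket",
}


def build_index(docs, drop_stop_words):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in text.lower().split():
            if drop_stop_words and tok in STOP_WORDS:
                continue  # hard removal: the term never reaches the index
            index[tok].add(doc_id)
    return index


def posting_count(index):
    """Total number of (term, document) entries in the index."""
    return sum(len(doc_ids) for doc_ids in index.values())


full = build_index(DOCS, drop_stop_words=False)
lean = build_index(DOCS, drop_stop_words=True)
print(posting_count(full), posting_count(lean))  # prints "13 8"
```

The lean index is smaller, but it no longer contains "the" at all, so an exact match on "the north face" cannot be answered from it. That is the brand-name risk in miniature.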
Best Practices
- Prefer soft stopwords over deletion.
- Maintain whitelists for names, SKUs, and brands.
- Tune lists per locale/language; revisit quarterly.
- Track zero-result queries caused by stopword handling.
- Disable aggressive stopwording for short queries, where a stop word may carry most of the meaning.
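Two of these practices, the short-query guard and zero-result tracking, can be sketched as follows. The threshold `MIN_TOKENS_FOR_STOPWORDING`, the stop word list, and the `search` callable are assumptions for illustration; `search` stands in for whatever function runs a token list against the engine and returns hits.

```python
STOP_WORDS = {"the", "of", "and", "a", "for", "in"}  # hypothetical list
MIN_TOKENS_FOR_STOPWORDING = 3  # assumed threshold; tune per market


def filter_query(tokens):
    """Drop stop words, except for short queries or all-stop-word queries."""
    if len(tokens) < MIN_TOKENS_FOR_STOPWORDING:
        return list(tokens)  # short query: leave it untouched
    kept = [t for t in tokens if t not in STOP_WORDS]
    return kept or list(tokens)  # never reduce a query to nothing


def zero_result_cause(raw_tokens, search):
    """Classify a query: None if it has hits, else the likely cause.

    `search` is a hypothetical callable: token list -> list of hits.
    """
    raw = list(raw_tokens)
    filtered = filter_query(raw)
    if search(filtered):
        return None  # not a zero-result query
    if filtered != raw and search(raw):
        return "stopword_handling"  # the unfiltered query would have matched
    return "other"
```

Logging the `"stopword_handling"` cases gives a direct count of zero-result queries that the stop word configuration is responsible for, which can guide the next list revision.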
Summary
Stop words keep search lean but can harm results if applied bluntly. Use soft weighting, protect names, and localize lists to balance speed and accuracy.