What is a Stop List?
A stop list (also called a stopword list) is a configurable set of very frequent words (e.g., the, and, of) that contribute little to ranking. Search engines either remove these words or down-weight them during analysis and scoring.
How It Works (quick)
- Language-specific sets: Each locale gets its own list, and effective lists tend to be short.
- Soft vs hard: Soft stopwords keep tokens but discount their weight; hard stopwords drop them.
- Context: Keep stopwords for phrases/entities (e.g., “The North Face”, “Gift Card”).
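The soft and hard modes above can be sketched as a small analysis step. This is a minimal illustration, not a reference implementation: the `STOPWORDS` table, the `analyze` function, and the `soft_weight` value of 0.2 are all hypothetical, and tokenization is plain whitespace splitting.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical per-locale stop lists; real lists would be tuned per market.
STOPWORDS: Dict[str, Set[str]] = {
    "en": {"the", "and", "of", "for", "a"},
    "de": {"der", "die", "das", "und"},
}

def analyze(query: str, locale: str = "en", mode: str = "soft",
            soft_weight: float = 0.2) -> List[Tuple[str, float]]:
    """Return (token, weight) pairs for a whitespace-tokenized query.

    mode="soft" keeps stopwords with a discounted weight (assumed value);
    mode="hard" drops them entirely.
    """
    stop = STOPWORDS.get(locale, set())
    result: List[Tuple[str, float]] = []
    for token in query.lower().split():
        if token not in stop:
            result.append((token, 1.0))
        elif mode == "soft":
            result.append((token, soft_weight))
        # In hard mode the stopword is simply skipped.
    return result

soft = analyze("case for iphone 14", mode="soft")
hard = analyze("case for iphone 14", mode="hard")
```

In soft mode "for" stays in the token stream with a reduced weight, so later phrase or proximity logic can still see it; in hard mode it is gone before scoring starts.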
Why It Matters in E-commerce
- Speed & noise control: Fewer low-value tokens → faster, cleaner retrieval.
- Precision: Prevents generic words from overpowering brand/model signals.
Best Practices
- Prefer soft handling: Down-weight instead of deleting to preserve phrase/proximity.
- Protect names: Whitelist stopword-like tokens in brands, titles, entities (e.g., “pro”, “max”, “one”).
- Per-locale tuning: Separate lists for each language/market; review quarterly.
- Analytics: Log removed/softened tokens; audit zero-result cases for over-filtering.
- Query types: Disable aggressive stopwording for very short queries, where losing a single token can leave too little to match.
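Several of these practices can be combined in one pass, sketched below under stated assumptions. The names `PROTECTED_PHRASES`, `MIN_TOKENS`, and `soften` are hypothetical, the threshold of three tokens is an arbitrary illustration, and the returned `softened` list stands in for whatever analytics log a real system would write to.

```python
from typing import List, Set, Tuple

STOPWORDS: Set[str] = {"the", "and", "of", "for", "a", "one"}
# Hypothetical protected phrases (brands, entities) that must stay intact.
PROTECTED_PHRASES: List[Tuple[str, ...]] = [("the", "north", "face")]
MIN_TOKENS = 3  # assumed threshold: shorter queries are left untouched

def soften(query: str, soft_weight: float = 0.2):
    """Return ((token, weight) pairs, list of softened tokens for logging)."""
    tokens = query.lower().split()
    weights = [1.0] * len(tokens)
    softened: List[str] = []

    # Very short queries: skip stopword handling entirely.
    if len(tokens) < MIN_TOKENS:
        return list(zip(tokens, weights)), softened

    # Mark token positions covered by a protected phrase.
    protected: Set[int] = set()
    for phrase in PROTECTED_PHRASES:
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase:
                protected.update(range(i, i + n))

    # Down-weight unprotected stopwords and record them for auditing.
    for i, token in enumerate(tokens):
        if token in STOPWORDS and i not in protected:
            weights[i] = soft_weight
            softened.append(token)
    return list(zip(tokens, weights)), softened

pairs, softened = soften("the north face jacket for men")
```

Here "the" keeps full weight because it sits inside a protected phrase, while "for" is down-weighted and reported in `softened`, which is the kind of record that makes zero-result audits possible.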
Challenges
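- Over-removal breaks phrase matching, hurts exact brand lookups, and degrades multilingual queries, where a stopword in one language can be a meaningful term in another (e.g., “die” is a German article but an ordinary English word).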
Examples
- Keep “The” in “The North Face”; treat it as part of a phrase field.
- Down-weight “for” in “case for iphone 14,” but still allow proximity matching.
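The second example can be illustrated with a toy scorer. This is a sketch under assumptions, not how any particular engine ranks: `score`, the weight of 0.2 for "for", and the phrase bonus of 0.5 are all hypothetical values chosen only to show the effect.

```python
from typing import Dict

def score(query_weights: Dict[str, float], title: str,
          phrase_bonus: float = 0.5) -> float:
    """Sum the weights of query tokens found in the title, plus a bonus
    when the whole query appears as a contiguous phrase in the title."""
    title_tokens = title.lower().split()
    base = sum(w for tok, w in query_weights.items() if tok in title_tokens)

    query_tokens = list(query_weights)  # dicts keep insertion order
    n = len(query_tokens)
    phrase_hit = any(
        title_tokens[i:i + n] == query_tokens
        for i in range(len(title_tokens) - n + 1)
    )
    return base + (phrase_bonus if phrase_hit else 0.0)

# "for" is softened, not removed, so it still takes part in the phrase check.
weights = {"case": 1.0, "for": 0.2, "iphone": 1.0, "14": 1.0}
full_match = score(weights, "Leather case for iphone 14")
partial_match = score(weights, "iphone 14 charger")
stopword_only = score(weights, "Gift for dad")
```

With these assumed weights, the title containing the full phrase scores highest, the title matching only "iphone 14" comes next, and the title matching only "for" scores lowest, so the generic word cannot outrank the model terms.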
Summary
Stop lists reduce noise when they are applied softly and tuned per locale. Protect brand and phrase integrity, and monitor analytics so helpful tokens aren’t lost.