GLOSSARY

Stop List

A stop list is a set of common words you de-emphasize or ignore. Use it carefully so you don’t drop meaningful terms.

What is a Stop List?

A stop list (also called a stopword list) is a configurable set of very frequent words (e.g., the, and, of) that contribute little to ranking. Engines either remove these words or down-weight them during analysis and scoring.

How It Works (quick)

  • Language-specific sets: Each locale gets its own list, and effective lists are often shorter than you might expect.
  • Soft vs hard: Soft stopwords keep tokens but discount their weight; hard stopwords drop them.
  • Context: Keep stopwords inside phrases and entities (e.g., “The North Face”, “Lord of the Rings”).
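The soft vs hard distinction fits in a few lines of Python. This is a minimal illustration, not any engine's actual analyzer: the stop list, the 0.2 discount, and the function names `tokenize`, `hard_stop`, and `soft_stop` are all hypothetical.

```python
import re

# Hypothetical English stop list; real engines ship their own per-locale lists.
STOPWORDS_EN = {"the", "a", "an", "and", "of", "for", "in", "on", "to"}

SOFT_WEIGHT = 0.2  # assumed discount for soft stopwords (illustrative value)


def tokenize(text: str) -> list[str]:
    """Lowercase, then split on any run of non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def hard_stop(tokens: list[str], stopwords: set[str]) -> list[str]:
    """Hard stopwording: drop stopword tokens entirely."""
    return [t for t in tokens if t not in stopwords]


def soft_stop(tokens: list[str], stopwords: set[str]) -> list[tuple[str, float]]:
    """Soft stopwording: keep every token, but discount the weight of stopwords."""
    return [(t, SOFT_WEIGHT if t in stopwords else 1.0) for t in tokens]


tokens = tokenize("Case for iPhone 14")
print(hard_stop(tokens, STOPWORDS_EN))  # ['case', 'iphone', '14']
print(soft_stop(tokens, STOPWORDS_EN))  # [('case', 1.0), ('for', 0.2), ('iphone', 1.0), ('14', 1.0)]
```

Hard stopwording shrinks the token stream. Soft stopwording keeps “for” in place, so later phrase or proximity matching can still see it.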

Why It Matters in E-commerce

  • Speed & noise control: Fewer low-value tokens → faster, cleaner retrieval.
  • Precision: Prevents generic words from overpowering brand/model signals.

Best Practices

  • Prefer soft handling: Down-weight instead of deleting to preserve phrase/proximity.
  • Protect names: Whitelist stopword-like tokens that appear in brands, titles, and other entities (e.g., “pro”, “max”, “one”).
  • Per-locale tuning: Separate lists for each language/market; review quarterly.
  • Analytics: Log removed/softened tokens; audit zero-result cases for over-filtering.
  • Query types: Disable aggressive stopwording for very short queries, where removing a single token can empty the query or change its meaning.
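Several of these practices (protected tokens, per-locale lists, a short-query bypass, and logging of softened tokens) can be combined in one query-weighting step. The sketch below assumes hypothetical tables `STOPLISTS` and `PROTECTED`, a hypothetical function `weigh_query`, and illustrative values for the short-query threshold and the soft weight. Tune all of these against your own analytics.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("stoplist")

# Hypothetical per-locale stop lists; maintain and review each market separately.
STOPLISTS = {
    # "one" is included only to show the whitelist overriding a listed token.
    "en": {"the", "a", "an", "and", "of", "for", "with", "one"},
    "de": {"der", "die", "das", "und", "für", "mit"},
}

# Hypothetical whitelist: tokens never treated as stopwords, even if listed.
PROTECTED = {"pro", "max", "one"}

MIN_TOKENS_FOR_STOPWORDING = 3  # assumed cutoff for "very short" queries
SOFT_WEIGHT = 0.2               # assumed discount factor


def weigh_query(query: str, locale: str) -> list[tuple[str, float]]:
    """Return (token, weight) pairs using soft stopwording for the given locale."""
    tokens = query.lower().split()
    # Unknown locales fall back to an empty list, i.e. no stopwording at all.
    stopwords = STOPLISTS.get(locale, set()) - PROTECTED
    # Very short queries: keep every token at full weight.
    if len(tokens) < MIN_TOKENS_FOR_STOPWORDING:
        return [(t, 1.0) for t in tokens]
    weighted = []
    for t in tokens:
        if t in stopwords:
            # Log every softened token so over-filtering can be audited later.
            log.info("softened %r (locale=%s, query=%r)", t, locale, query)
            weighted.append((t, SOFT_WEIGHT))
        else:
            weighted.append((t, 1.0))
    return weighted


print(weigh_query("xbox one for kids", "en"))
# [('xbox', 1.0), ('one', 1.0), ('for', 0.2), ('kids', 1.0)]
```

Subtracting the whitelist from the locale list, rather than special-casing tokens in the loop, keeps the protection rule in one place that is easy to review.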

Challenges

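  • Over-removal breaks phrase matches, hurts exact brand matching, and can damage multilingual queries, because a token that is a stopword in one language may be meaningful in another (e.g., “die” is a German article but an ordinary English word).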

Examples

  • Keep “The” in “The North Face”; treat it as part of a phrase field.
  • Down-weight “for” in “case for iphone 14,” but still allow proximity matching.
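Both examples can be handled in one analysis pass that protects known phrases and discounts the remaining stopwords without renumbering token positions. This is a sketch under assumptions: `PROTECTED_PHRASES`, the 0.2 weight, and `analyze` are hypothetical, and a production engine would more likely express this through a dedicated phrase field or analyzer configuration.

```python
STOPWORDS = {"the", "for", "of", "and"}

# Hypothetical list of brand/title phrases that keep their stopwords at full weight.
PROTECTED_PHRASES = [("the", "north", "face")]

SOFT_WEIGHT = 0.2  # assumed discount for unprotected stopwords


def analyze(query: str) -> list[tuple[str, int, float]]:
    """Return (token, position, weight) triples; positions are never renumbered."""
    tokens = query.lower().split()

    # Mark every position covered by a protected phrase.
    protected_positions: set[int] = set()
    for phrase in PROTECTED_PHRASES:
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase:
                protected_positions.update(range(i, i + n))

    out = []
    for pos, tok in enumerate(tokens):
        if tok in STOPWORDS and pos not in protected_positions:
            out.append((tok, pos, SOFT_WEIGHT))  # softened, but position kept
        else:
            out.append((tok, pos, 1.0))
    return out


print(analyze("The North Face jacket"))
# [('the', 0, 1.0), ('north', 1, 1.0), ('face', 2, 1.0), ('jacket', 3, 1.0)]
print(analyze("case for iphone 14"))
# [('case', 0, 1.0), ('for', 1, 0.2), ('iphone', 2, 1.0), ('14', 3, 1.0)]
```

Because “for” keeps position 1, a phrase or proximity query over “case for iphone” can still match the original word order.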

Summary

Stop lists reduce noise when applied softly and locally. Protect brand/phrase integrity, tune per locale, and monitor analytics so helpful tokens aren’t lost.

FAQ

Stop list vs synonyms? Stopwords remove or down-weight existing terms; synonyms expand a query with equivalent terms.
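A small sketch makes the contrast concrete: a stopword keeps the same term at a lower weight, while a synonym adds alternative terms to match. The `STOPWORDS` and `SYNONYMS` tables, the 0.2 weight, and the `rewrite` function are hypothetical.

```python
STOPWORDS = {"for", "the"}
SYNONYMS = {"tv": ["television"], "sneakers": ["trainers"]}  # hypothetical table
SOFT_WEIGHT = 0.2  # assumed discount for stopwords


def rewrite(query: str) -> list[tuple[list[str], float]]:
    """Turn each query token into (alternative terms, weight)."""
    clauses = []
    for tok in query.lower().split():
        if tok in STOPWORDS:
            clauses.append(([tok], SOFT_WEIGHT))               # same term, less weight
        else:
            clauses.append(([tok] + SYNONYMS.get(tok, []), 1.0))  # more terms to match
    return clauses


print(rewrite("stand for tv"))
# [(['stand'], 1.0), (['for'], 0.2), (['tv', 'television'], 1.0)]
```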

Should I stopword numbers? Usually not: sizes and model numbers (e.g., “14” in “iPhone 14”) carry meaning.