GLOSSARY

Stop List

A stop list is a set of common words you de-emphasize or ignore. Use it carefully so you don’t drop meaningful terms.

What is a Stop List?

A stop list (stopwords) is a configurable set of very frequent words (e.g., the, and, of) that contribute little to ranking. Engines either remove them or down-weight them during analysis and scoring.

How It Works (quick)

Language-specific sets: Each language or locale gets its own list, and a short list is often enough.
Soft vs hard: Soft stopwords keep tokens but discount their weight; hard stopwords drop them.
Context: Keep stopwords for phrases/entities (e.g., “The North Face”, “Gift Card”).
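
The soft vs hard distinction can be sketched in a few lines. This is a minimal illustration, not any particular engine's analyzer: the stop list contents, the SOFT_WEIGHT value, and the Token and analyze names are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical English stop list and discount factor; real values are tuned per locale.
STOPWORDS_EN = {"the", "and", "of", "for", "a"}
SOFT_WEIGHT = 0.2


@dataclass
class Token:
    text: str
    position: int  # index in the original query, kept for phrase/proximity checks
    weight: float


def analyze(query: str, mode: str = "soft") -> list[Token]:
    """Tokenize on whitespace and apply soft or hard stopword handling."""
    tokens = []
    for position, word in enumerate(query.lower().split()):
        is_stop = word in STOPWORDS_EN
        if is_stop and mode == "hard":
            continue  # hard: the token is dropped entirely
        # soft: the token stays, but contributes less to scoring
        weight = SOFT_WEIGHT if is_stop else 1.0
        tokens.append(Token(word, position, weight))
    return tokens
```

With "case for iphone 14", soft mode keeps all four tokens and gives "for" the discounted weight, while hard mode returns only "case", "iphone", and "14". Positions are recorded before filtering, so even hard mode leaves a gap where the stopword was.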

Why It Matters in E-commerce

Speed & noise control: Fewer low-value tokens → faster, cleaner retrieval.
Precision: Prevents generic words from overpowering brand/model signals.

Best Practices

Prefer soft handling: Down-weight instead of deleting, so phrase and proximity matching still work.
Protect names: Whitelist stopword-like tokens in brands, titles, entities (e.g., “pro”, “max”, “one”).
Per-locale tuning: Separate lists for each language/market; review quarterly.
Analytics: Log removed/softened tokens; audit zero-result cases for over-filtering.
Query types: Disable aggressive stopwording for very short queries, where every token may carry intent.
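
Several of these practices can be combined in one pre-processing step. The sketch below is illustrative only: the per-locale lists, the protected phrase set, the MIN_TOKENS_FOR_STOPWORDING threshold, and the stopword_flags function are hypothetical names and values chosen for the example.

```python
import logging

logger = logging.getLogger("stoplist")

# Hypothetical per-locale configuration; real lists would be reviewed per market.
STOP_LISTS = {
    "en": {"the", "and", "of", "for"},
    "de": {"der", "die", "das", "und"},
}
PROTECTED_PHRASES = {"en": {("the", "north", "face")}}
MIN_TOKENS_FOR_STOPWORDING = 3  # assumed threshold for "very short" queries


def stopword_flags(query, locale="en"):
    """Return (token, is_stopword) pairs after applying the guards above."""
    tokens = query.lower().split()
    stop = STOP_LISTS.get(locale, set())

    # Very short queries: leave every token untouched.
    if len(tokens) < MIN_TOKENS_FOR_STOPWORDING:
        return [(t, False) for t in tokens]

    # Mark positions covered by a protected brand or entity phrase.
    protected = set()
    for phrase in PROTECTED_PHRASES.get(locale, set()):
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase:
                protected.update(range(i, i + n))

    flags = []
    for i, token in enumerate(tokens):
        is_stop = token in stop and i not in protected
        if is_stop:
            # Log each affected token so over-filtering can be audited later.
            logger.info("stopword %r in query %r (locale=%s)", token, query, locale)
        flags.append((token, is_stop))
    return flags
```

For "the north face jacket for men", only "for" is flagged; "the" is shielded by the protected phrase. The same token can also differ by locale: "die" in "die hard poster" is flagged under the German list but not the English one.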

Challenges

Over-removal breaks phrases, harms exact brand matches, and can misfire on multilingual queries, where a stopword in one language is a meaningful word in another.

Examples

Keep “The” in “The North Face”; index the full brand name in a phrase field so the article remains part of the match.
Down-weight “for” in “case for iphone 14,” but still allow proximity matching.
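
The second example can be sketched as two separate signals: a weighted term score in which "for" counts for little, and an ordered proximity check that still sees "for" because the token was never removed. The weights, the max_gap default, and both function names are assumptions for illustration; the proximity check is a simple greedy scan, not a full phrase matcher.

```python
STOPWORDS = {"for", "the", "of"}  # hypothetical list
SOFT_WEIGHT = 0.2                 # hypothetical discount


def term_weight(word):
    return SOFT_WEIGHT if word in STOPWORDS else 1.0


def term_score(query, title):
    """Fraction of the query's total weight that is matched in the title."""
    q = query.lower().split()
    title_terms = set(title.lower().split())
    total = sum(term_weight(w) for w in q)
    matched = sum(term_weight(w) for w in q if w in title_terms)
    return matched / total if total else 0.0


def matches_in_order(query, title, max_gap=1):
    """True if the query tokens appear in the title in order, allowing
    at most max_gap extra tokens between neighbors (greedy scan)."""
    q = query.lower().split()
    t = title.lower().split()
    if not q:
        return False
    for start, word in enumerate(t):
        if word != q[0]:
            continue
        prev, ok = start, True
        for term in q[1:]:
            window = t[prev + 1: prev + 2 + max_gap]
            if term not in window:
                ok = False
                break
            prev = prev + 1 + window.index(term)
        if ok:
            return True
    return False
```

Under these assumptions, a title missing only "for" still scores close to a full match, while a title missing "case" scores much lower. The proximity check accepts "Slim case for iphone 14 pro" and rejects "iphone 14 charger for car case", even though both titles contain every query term.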

Summary

Stop lists reduce noise when applied softly and locally. Protect brand/phrase integrity, tune per locale, and monitor analytics so helpful tokens aren’t lost.

FAQ

Stop list vs synonyms? A stop list removes or down-weights low-value terms; synonyms expand a query to equivalent terms.

Should I stopword numbers? Usually not: sizes and model numbers often identify the product.
