GLOSSARY

Index

An index is the data structure that makes search fast. In an online store, it organizes product text and attributes so that results come back in milliseconds.

What is an Index?

An index is the structured representation of your content used by the search engine to retrieve and rank results quickly. It typically includes an inverted index for text, field stores/doc values for attributes and sorting, and optionally a vector index for semantic retrieval.

How It Works (quick)

  • Ingest: Parse products/content → clean/normalize → map to fields.
  • Build structures:
    • Inverted index (term → postings of documents/positions).
    • Doc values (columnar storage) for sorting and aggregations (price, rating, stock).
    • ANN/vector index for embeddings (semantic search).
  • Serve: Retrieve candidates (lexical/vectors) → apply filters/ACLs → score/re-rank → render hits.
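The ingest, build, and serve steps above can be sketched in a few lines of Python. This is a toy illustration, not how any particular engine is implemented: the catalog, the field names (title, price, in_stock), and the whitespace tokenizer are all assumptions made for the example, and scoring is replaced by a simple sort on price.

```python
from collections import defaultdict

# Hypothetical toy catalog; the field names are assumptions for illustration.
PRODUCTS = [
    {"id": 1, "title": "Red running shoes", "price": 59.0, "in_stock": True},
    {"id": 2, "title": "Blue running jacket", "price": 89.0, "in_stock": False},
    {"id": 3, "title": "Red rain jacket", "price": 75.0, "in_stock": True},
]


def tokenize(text):
    """Simplistic analysis: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(products):
    """Build an inverted index (term -> {doc_id: [positions]}) and doc values."""
    inverted = defaultdict(dict)
    doc_values = {"price": {}, "in_stock": {}}  # one column per field
    for product in products:
        doc_id = product["id"]
        for position, term in enumerate(tokenize(product["title"])):
            inverted[term].setdefault(doc_id, []).append(position)
        for field in doc_values:
            doc_values[field][doc_id] = product[field]
    return inverted, doc_values


def search(query, inverted, doc_values, in_stock_only=False):
    """Retrieve docs containing every query term, filter, then sort by price."""
    terms = tokenize(query)
    if not terms:
        return []
    candidates = set(inverted.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(inverted.get(term, {}))
    if in_stock_only:
        candidates = {d for d in candidates if doc_values["in_stock"][d]}
    return sorted(candidates, key=lambda d: doc_values["price"][d])


inverted, doc_values = build_index(PRODUCTS)
print(search("red", inverted, doc_values))
print(search("running jacket", inverted, doc_values, in_stock_only=True))
```

The filter and the sort read only from the doc values columns, never from the postings, which is why the two structures are kept separately.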

Why It Matters in E-commerce

  • Speed & scale: Millisecond retrieval across large catalogs.
  • Quality: Proper fields enable better ranking, filters, and snippets.
  • Flexibility: Supports hybrid search (BM25 + vectors) and merchandising rules.
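One way to combine a lexical (BM25) result list with a vector result list is reciprocal rank fusion (RRF). The glossary does not say which fusion method to use, so treat this as one possible sketch: the function name, the SKU identifiers, and the commonly used constant k=60 are assumptions for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list.

    Each document scores the sum of 1 / (k + rank) over the lists it
    appears in, with rank starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))


# Hypothetical result lists from a lexical and a vector retriever.
lexical_hits = ["sku-1", "sku-2", "sku-3"]
vector_hits = ["sku-3", "sku-1", "sku-4"]
print(reciprocal_rank_fusion([lexical_hits, vector_hits]))
```

RRF uses only ranks, so it avoids having to put BM25 scores and vector similarities on a common scale.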

Best Practices

  • Field design: Separate title/attributes/description; keyword fields for SKU/MPN.
  • Analysis: Locale-aware tokenization, stopwords, stemming/lemmatization; handle diacritics.
  • Freshness: Delta updates for price/stock; tombstones for deletes.
  • Governance: Version mappings; keep reindex playbooks and backfills.
  • Observability: Track index size, segment count, merge times, and lag.
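The freshness practice above (delta updates plus tombstones) can be illustrated with a small in-memory model. It is a sketch under stated assumptions: the ToyIndex class and its method names are hypothetical, and real engines apply the same idea at the segment level rather than on a single dictionary.

```python
class ToyIndex:
    """In-memory stand-in for an index that supports deltas and tombstones."""

    def __init__(self):
        self.docs = {}           # doc_id -> field dict
        self.tombstones = set()  # doc ids marked deleted but not yet purged

    def upsert(self, doc_id, fields):
        """Full document write; revives a previously tombstoned id."""
        self.tombstones.discard(doc_id)
        self.docs[doc_id] = dict(fields)

    def apply_delta(self, doc_id, changes):
        """Partial update: only changed fields (e.g. price, stock) are sent."""
        if doc_id not in self.docs or doc_id in self.tombstones:
            return False
        self.docs[doc_id].update(changes)
        return True

    def delete(self, doc_id):
        """Mark as deleted; the data stays in place until the next merge."""
        if doc_id in self.docs:
            self.tombstones.add(doc_id)

    def live_docs(self):
        """Documents visible to search: everything not tombstoned."""
        return {d: f for d, f in self.docs.items() if d not in self.tombstones}

    def merge(self):
        """Physically drop tombstoned docs; returns how many were purged."""
        purged = len(self.tombstones)
        for doc_id in self.tombstones:
            self.docs.pop(doc_id, None)
        self.tombstones.clear()
        return purged


index = ToyIndex()
index.upsert("sku-1", {"title": "Red running shoes", "price": 59.0, "stock": 5})
index.apply_delta("sku-1", {"price": 49.0})  # price-only delta
index.delete("sku-1")                        # hidden from search immediately
print(index.live_docs(), index.merge())
```

Deletes take effect for searchers right away, while the more expensive physical cleanup is deferred to the merge, which is one reason merge times appear in the observability list.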

Challenges

  • Mapping drift.
  • Large segments and slow merges.
  • Schema changes that require a reindex.
  • Keeping analyzers consistent across locales.

Examples

  • Add a bigram (two-word shingle) field on titles so that queries match adjacent word pairs, which captures phrase intent better than single terms.
  • Store per-size stock availability as doc values for fast filtering/sorting.
  • Build a vector index for semantic recall before re-ranking.
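As a rough illustration of the bigram example, the sketch below generates word bigrams from a title and counts how many of the query's bigrams it shares. The function names and the whitespace tokenizer are assumptions for the example; a production engine would typically do this with a shingle filter in the field's analyzer.

```python
def bigrams(text):
    """Return adjacent word pairs from lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def phrase_overlap(query, title):
    """Count distinct query bigrams that also occur in the title."""
    return len(set(bigrams(query)) & set(bigrams(title)))


title = "Red Running Shoes"
print(bigrams(title))
print(phrase_overlap("running shoes", title))  # words adjacent, in order
print(phrase_overlap("shoes running", title))  # same words, wrong order
```

Both queries would match the title on single terms; only the bigram field tells them apart, which is what makes it useful as a ranking signal.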

Summary

The index is your search engine’s backbone. With clean fields, locale-aware analysis, fast deltas, and optional vectors, it delivers relevant results at storefront speed.

FAQ

Index vs database table?

An index is optimized for search and ranking, not for transactional (OLTP) workloads. You still keep a source of truth (PIM/CMS) and build the index from it.

Do I need both lexical and vector indexes?

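In most cases, yes: lexical retrieval gives precision and speed on exact terms such as SKUs and brand names, while vectors capture meaning when the shopper's wording differs from the product text.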

When to reindex?

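Reindex when mappings change, when an analyzer or tokenizer is upgraded, or after major taxonomy shifts. Documents indexed under the old rules are not re-analyzed until they are reindexed.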