What is a Search Database?
A search database is a datastore optimized for information retrieval rather than online transaction processing (OLTP). It holds inverted indexes for text, doc values (columnar stores) for filters and sorts, and often vector indexes for semantic retrieval, along with replicas and caches for low-latency reads.
How It Works (quick)
- Ingest: Connectors/feeds → clean & normalize → map to fields.
- Index structures:
  - Inverted index for tokens/phrases/positions.
  - Doc values for facets, ranges, and sorting (price, stock, rating).
  - ANN/vector index for embeddings.
- Serve: Query planner selects structures → retrieve candidates → apply filters/ACL → score/re-rank → return hits.
- Scale: Shard for throughput; replicate for HA; snapshot for backups.
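The index and serve steps above can be illustrated with a toy in-memory sketch. It is not how any particular engine is implemented: the catalog, the field names, and the whitespace tokenizer are assumptions made for illustration, and scoring is replaced by a plain sort on price.

```python
from collections import defaultdict

# Hypothetical toy catalog; field names are illustrative only.
PRODUCTS = [
    {"id": 1, "title": "Red running shoes", "price": 59.0, "in_stock": True},
    {"id": 2, "title": "Blue running jacket", "price": 89.0, "in_stock": False},
    {"id": 3, "title": "Red rain jacket", "price": 75.0, "in_stock": True},
]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index (token -> doc ids) and doc values (doc id -> fields)."""
    inverted = defaultdict(set)
    doc_values = {}
    for doc in docs:
        for token in tokenize(doc["title"]):
            inverted[token].add(doc["id"])
        doc_values[doc["id"]] = {"price": doc["price"], "in_stock": doc["in_stock"]}
    return inverted, doc_values


def search(inverted, doc_values, query, in_stock_only=False):
    """Retrieve docs matching all query tokens, apply the filter, sort by price."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Candidate retrieval: intersect the posting sets of every query token.
    candidates = set.intersection(*(inverted.get(t, set()) for t in tokens))
    # Filtering reads doc values instead of re-parsing the text.
    if in_stock_only:
        candidates = {d for d in candidates if doc_values[d]["in_stock"]}
    # Sorting also reads doc values (ascending price here).
    return sorted(candidates, key=lambda d: doc_values[d]["price"])


inverted, doc_values = build_index(PRODUCTS)
print(search(inverted, doc_values, "jacket"))                       # [3, 2]
print(search(inverted, doc_values, "running", in_stock_only=True))  # [1]
```

The point of the split is that text matching touches only the inverted index, while filters and sorts touch only the per-document values.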
Why It Matters in E-commerce
- Speed: Sub-100 ms queries for large catalogs.
- Quality: Fielded data enables strong ranking and facets.
- Flexibility: Hybrid (BM25 + vectors) and merchandising rules without heavy joins.
Best Practices
- Schema hygiene: Separate exact (SKU/MPN), text (title/desc), attributes (typed), vectors.
- Locale analyzers: Per-market tokenization/lemmatization and stopwords.
- Freshness: Event-driven deltas for price/stock; monitor indexing lag.
- Ops: Right-size shards; tune merge/segment policies; autoscale read replicas.
- Observability: Dashboards for QPS, p95/99 latency, index size, merge backlog, cache hit rate.
- Resilience: Snapshots, restore drills, and versioned mappings.
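As a rough sketch of the freshness and observability points above, the following applies partial price/stock updates and records indexing lag per event. The `DeltaEvent` and `DeltaIndexer` names are hypothetical, the "index" is just a dictionary, and the percentile uses the simple nearest-rank method; a production pipeline would read events from a queue and export the lag to a metrics system.

```python
import math
from dataclasses import dataclass


@dataclass
class DeltaEvent:
    """Hypothetical partial-update event for one product."""
    product_id: str
    fields: dict        # only the changed fields, e.g. {"price": 19.99}
    event_time: float   # seconds since epoch when the change happened upstream


class DeltaIndexer:
    """Applies deltas to an in-memory document store and tracks indexing lag."""

    def __init__(self):
        self.docs = {}
        self.lags = []

    def apply(self, event, now):
        """Merge the changed fields and record how late the update arrived."""
        doc = self.docs.setdefault(event.product_id, {})
        doc.update(event.fields)
        self.lags.append(now - event.event_time)

    def lag_percentile(self, p):
        """Nearest-rank percentile of observed lag in seconds (0.0 if no data)."""
        if not self.lags:
            return 0.0
        ordered = sorted(self.lags)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]


indexer = DeltaIndexer()
indexer.apply(DeltaEvent("sku-1", {"price": 20.0, "stock": 5}, event_time=100.0), now=100.5)
indexer.apply(DeltaEvent("sku-1", {"stock": 4}, event_time=101.0), now=103.0)
print(indexer.docs["sku-1"])        # {'price': 20.0, 'stock': 4}
print(indexer.lag_percentile(95))   # 2.0
```

Passing `now` explicitly keeps the example deterministic; in a live system it would come from the clock at apply time.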
Challenges
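- Hot shards and uneven traffic (a few shards receive most of the load), mapping drift (field types diverging between index versions), segment bloat (too many small or delete-heavy segments), and embedding staleness (vectors not refreshed after product or model changes).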
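Hot shards are easier to spot with a concrete skew measure. The sketch below assumes documents are routed to shards by hashing a routing key, which is a common scheme but not necessarily the one a given engine uses; `shard_for` and `load_skew` are hypothetical helper names.

```python
import hashlib
from collections import Counter


def shard_for(routing_key, num_shards):
    """Map a routing key to a shard with a stable hash (assumed routing scheme)."""
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def load_skew(request_keys, num_shards):
    """Return the busiest shard's load divided by the mean load (1.0 = even)."""
    if not request_keys:
        return 0.0
    counts = Counter(shard_for(key, num_shards) for key in request_keys)
    mean_load = len(request_keys) / num_shards
    return max(counts.values()) / mean_load


# Illustrative traffic: one product receives 90 of 100 requests.
traffic = ["sku-1"] * 90 + [f"sku-{i}" for i in range(2, 12)]
print(f"skew = {load_skew(traffic, num_shards=4):.2f}")
```

With four shards, a skew near 4.0 means almost all traffic lands on one shard; alerting on this ratio is one way to catch hot shards before latency degrades.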
Examples
- Product grid with price/rating sorting and in-stock filter under 100 ms.
- Hybrid retrieval: BM25 recall + vector recall → learning-to-rank (LTR) re-rank for long queries.
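The hybrid example can be sketched as a merge of two candidate lists. A trained LTR model is out of scope here, so this sketch swaps in reciprocal rank fusion (RRF), a simpler rank-based merge; the product IDs and the constant `k=60` are illustrative assumptions.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc ids: each list adds 1 / (k + rank) per doc."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical candidates from a BM25 query and a vector (ANN) query.
bm25_hits = ["p1", "p2", "p3"]
vector_hits = ["p3", "p1", "p4"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))  # ['p1', 'p3', 'p2', 'p4']
```

Documents found by both retrievers rise to the top, and the fused list could then be handed to a learned re-ranker as its candidate set.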
Summary
A search database is the backbone of fast, relevant product discovery. Design clean fields, use locale analyzers, keep deltas flowing, and monitor latency and index health.