What is a Relevance Score?
A relevance score is a composite metric assigned to each candidate result that estimates how well it matches the query. It may come from BM25/field weights, a learning-to-rank model (e.g., LambdaMART), or a neural re-ranker, and is often normalized to a common scale (e.g., 0–1 or z-scores).
How It Works (quick)
- Inputs: Textual evidence (exact/phrase/proximity), semantic similarity, attribute matches, and business features (price, stock, rating, margin, recency, size-in-stock).
- Modeling: Linear blend or ML model; calibration to stabilize scores across categories/locales.
- Normalization: Min-max, logistic, or percentile scaling for consistent thresholds.
- Post-rules: Diversity/brand caps, tie-breakers (price, rating), and compliance filters.
- Explainability: Log top contributing features for “why this result.”
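The modeling and normalization steps above can be sketched as a weighted linear blend squashed into (0, 1). This is a minimal illustration, not a production scorer; the feature names, weights, and logistic parameters are assumed values for the example.

```python
import math

# Hypothetical feature weights for a linear blend (illustrative values only).
WEIGHTS = {
    "bm25": 0.5,
    "semantic_sim": 0.3,
    "in_stock": 0.1,
    "rating": 0.1,
}

def blend(features: dict) -> float:
    """Weighted linear combination of per-result features."""
    return sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)

def logistic_normalize(raw: float, midpoint: float = 1.0, steepness: float = 2.0) -> float:
    """Squash a raw blended score into (0, 1) so thresholds stay comparable."""
    return 1.0 / (1.0 + math.exp(-steepness * (raw - midpoint)))

features = {"bm25": 2.1, "semantic_sim": 0.8, "in_stock": 1.0, "rating": 0.9}
raw = blend(features)            # 0.5*2.1 + 0.3*0.8 + 0.1*1.0 + 0.1*0.9 = 1.48
score = logistic_normalize(raw)  # ≈ 0.72
```

A logistic squash is one of several options named above; min-max or percentile scaling would slot in the same place, after the blend and before post-rules.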
Why It Matters in E-commerce
- Consistent ordering across large catalogs and seasons.
- Operational control: Thresholds for suppressing low-confidence results or triggering “did you mean”, plus capped, safe boosts for campaigns.
- Diagnostics: Score distributions reveal regressions or drift.
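The diagnostics point can be made concrete with a simple distribution check: compare today's score sample against a baseline and flag a shift. The samples and tolerance below are assumed values for illustration; real monitoring would use larger samples and a statistical test.

```python
import statistics

# Hypothetical daily top-result score samples; a drop in the mean can flag drift.
baseline = [0.61, 0.58, 0.64, 0.59, 0.62, 0.60, 0.63, 0.57, 0.65, 0.60]
today    = [0.48, 0.51, 0.47, 0.50, 0.52, 0.49, 0.46, 0.53, 0.50, 0.48]

def drifted(base, current, tolerance=0.05):
    """Flag when the mean score shifts by more than `tolerance` (assumed threshold)."""
    return abs(statistics.mean(base) - statistics.mean(current)) > tolerance

print(drifted(baseline, today))  # True: the mean dropped by roughly 0.11
```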
Best Practices
- Keep hard filters (ACL, region, out-of-stock) outside the score; apply them before ranking.
- Per-category calibration and feature caps to prevent domination by popularity.
- Use golden sets + A/B tests; monitor NDCG/CTR/conv and tail latency.
- Store score, features, and version for audits and rollback.
- Match SKU and brand queries against exact fields so scores reflect real intent rather than fuzzy token overlap.
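The filter-before-rank and audit-logging practices above can be sketched together. The `Candidate` shape, field names, and version tag are hypothetical; the point is that eligibility checks never touch the score, and that each returned result carries what an audit needs.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    sku: str
    score: float
    in_stock: bool
    region_ok: bool
    features: dict = field(default_factory=dict)

SCORER_VERSION = "v3.2"  # hypothetical version tag stored for audits and rollback

def rank(candidates):
    # Hard filters (out-of-stock, region) run BEFORE score-based ordering,
    # so the score never has to "pay" for eligibility.
    eligible = [c for c in candidates if c.in_stock and c.region_ok]
    ranked = sorted(eligible, key=lambda c: c.score, reverse=True)
    # Persist score, features, and scorer version alongside each result.
    return [{"sku": c.sku, "score": c.score,
             "features": c.features, "version": SCORER_VERSION}
            for c in ranked]
```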
Challenges
- Data leakage, position bias in labels, score drift across locales, and explainability for deep models.
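One common correction for the position bias mentioned above is inverse propensity weighting (IPW) of click labels. The propensity table below is assumed for illustration; in practice propensities are estimated from logs (e.g., via result randomization).

```python
# P(examined | position): assumed examination propensities for illustration.
PROPENSITY = {1: 1.0, 2: 0.6, 3: 0.4, 4: 0.3}

def ipw_label_weight(position: int, clicked: bool) -> float:
    """Weight a click by 1/propensity so clicks at low positions count more."""
    if not clicked:
        return 0.0
    return 1.0 / PROPENSITY.get(position, 0.2)  # 0.2: assumed tail propensity
```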
Examples
- Score = 0.62 after BM25 + phrase boost + in-stock + rating feature; re-ranked above a 0.58 item with weaker availability.
- Campaign adds a +0.05 capped boost to “new arrivals” only within footwear.
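The capped campaign boost in the second example can be sketched as follows; the function name, cap policy, and category scoping are illustrative assumptions.

```python
BOOST_CAP = 0.05  # maximum boost any campaign may add (assumed policy)

def campaign_score(base: float, category: str, is_new_arrival: bool,
                   boost: float = 0.05) -> float:
    """Apply a campaign boost, capped and scoped to footwear new arrivals only."""
    if category == "footwear" and is_new_arrival:
        return base + min(boost, BOOST_CAP)
    return base
```

Scoping the boost to one category and capping it keeps campaign adjustments from overwhelming the underlying relevance signal.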
Summary
A relevance score is your ordering signal. Build it from hybrid evidence, calibrate per category, enforce hard rules outside the score, and log contributions for trust and tuning.