GLOSSARY

Cross-Language Search

Cross-language search lets shoppers query in one language and find results written in another. Stores use it to connect multilingual customers with their catalogs without duplicating content.

What is Cross-Language Search?

Cross-language search, also known as cross-language information retrieval (CLIR), retrieves relevant results across languages: for example, a query in Spanish can return English product pages. It relies on query translation, document translation, multilingual embeddings, or a combination of these to compare meaning across languages.

How It Works (quick)

  • Approaches:
    • Query translation: Translate the query to the index language(s) → search → merge results.
    • Document translation: Translate content into the query language(s) offline; index both.
    • Multilingual vectors: Use shared-space embeddings (LaBSE, multilingual MiniLM, or mBERT fine-tuned for sentence similarity) for language-agnostic retrieval.
  • Hybrid retrieval: BM25 recall per language + vector recall in shared space → re-rank.
  • Normalization: Handle locale fields (currency, units, diacritics) and brand variants.
  • Presentation: Show results in the user’s language when available; otherwise display the original with a translation badge.
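
The normalization and hybrid-retrieval steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: `fold` and `rrf_merge` are hypothetical helper names, the SKU lists are made-up stand-ins for real BM25 and vector results, and reciprocal rank fusion (with the commonly used constant k = 60) is one possible merge strategy, chosen here because it compares ranks rather than raw scores and so needs no calibration across languages.

```python
import unicodedata
from collections import defaultdict


def fold(text: str) -> str:
    """Casefold and strip diacritics so 'Café' and 'cafe' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical candidate lists: BM25 on an English index, BM25 on a
# Spanish index, and nearest neighbours from a shared embedding space.
bm25_en = ["sku-2", "sku-1", "sku-4"]
bm25_es = ["sku-3", "sku-2"]
vector = ["sku-2", "sku-3", "sku-1"]

merged = rrf_merge([bm25_en, bm25_es, vector])
print(fold("Zapatillas Niño"))  # zapatillas nino
print(merged)                   # ['sku-2', 'sku-3', 'sku-1', 'sku-4']
```

In this toy run sku-2 comes first because all three retrievers return it. Folding diacritics is lossy (ñ becomes n), so a real index would likely keep the original field alongside the folded one; the fused list would then go to a re-ranker.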

Why it Matters in E-commerce

  • Global catalogs: One product page can satisfy multiple markets.
  • Long-tail recall: Captures intent even when vocabulary differs.
  • Ops efficiency: Avoids cloning pages solely for search coverage.

Best Practices

  • Quality gate: Prefer human-reviewed translations for titles; machine-translate long-tail content and run QA on the output.
  • Language tags: Store per-field language codes; index by locale.
  • Hard filters first: Apply stock and region restrictions (ACLs) before multilingual retrieval.
  • Re-ranking: Combine lexical and vector scores; demote low-confidence translations.
  • Metrics by locale: Track click-through rate (CTR), conversion rate, and query reformulations per language pair.
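
As a rough sketch of the "hard filters first" and re-ranking practices, the Python below filters candidates on stock and region before scoring, blends lexical and vector scores, and demotes items whose translation confidence is low. Everything here is an assumption for illustration: the `Candidate` fields, the `rerank` function, the weights (`alpha`, `penalty`), the threshold (`min_conf`), and the premise that both scores are already scaled to [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    sku: str
    in_stock: bool
    regions: frozenset    # regions where the item may be sold
    lexical: float        # lexical score, assumed scaled to [0, 1]
    vector: float         # vector similarity, assumed scaled to [0, 1]
    mt_confidence: float  # 1.0 for human-reviewed text, lower for raw MT


def rerank(candidates, region, alpha=0.5, min_conf=0.6, penalty=0.5):
    """Filter on stock and region first, then order by a blended score."""
    eligible = [c for c in candidates if c.in_stock and region in c.regions]

    def score(c: Candidate) -> float:
        blended = alpha * c.lexical + (1 - alpha) * c.vector
        # Demote, rather than drop, items whose translation looks unreliable.
        return blended * (penalty if c.mt_confidence < min_conf else 1.0)

    return sorted(eligible, key=score, reverse=True)


candidates = [
    Candidate("A", True, frozenset({"ES", "FR"}), 0.9, 0.8, 0.4),
    Candidate("B", True, frozenset({"ES"}), 0.6, 0.7, 1.0),
    Candidate("C", False, frozenset({"ES"}), 0.95, 0.95, 1.0),  # out of stock
    Candidate("D", True, frozenset({"US"}), 0.9, 0.9, 1.0),     # wrong region
]
ranked = [c.sku for c in rerank(candidates, region="ES")]
print(ranked)  # ['B', 'A']
```

Item A has the stronger raw scores, but its low translation confidence halves its blended score, so B ranks first. In a production engine the stock and region filter would more likely be pushed into the retrieval query itself than applied to a candidate list afterwards.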

Challenges

  • Ambiguity & brand names: Keep brand lexicons untranslated; handle homonyms carefully.
  • Machine translation (MT) noise & latency: Cache translations of frequent queries and translate documents in offline batches.
  • Evaluation: Build bilingual test sets with human judgments.
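
One way to keep brand names untranslated and cut repeated translation cost is to mask brand terms before translation, restore them afterwards, and cache the result. The Python sketch below assumes a hypothetical curated `BRAND_LEXICON`, and `toy_translate` is a stand-in word map rather than a real MT call; real MT systems may alter placeholder tokens, so the masking scheme would need testing against the engine actually used.

```python
import re
from functools import lru_cache

# Hypothetical curated lexicon; "desigual" is also the Spanish word for "uneven".
BRAND_LEXICON = ("GORE-TEX", "Desigual")

# Longest names first so overlapping brand names match greedily.
_BRAND_RE = re.compile(
    "|".join(re.escape(b) for b in sorted(BRAND_LEXICON, key=len, reverse=True)),
    re.IGNORECASE,
)


def toy_translate(text: str) -> str:
    """Stand-in for a real MT call: a tiny Spanish-to-English word map."""
    glossary = {"vestido": "dress", "desigual": "uneven", "rojo": "red"}
    return " ".join(glossary.get(word.lower(), word) for word in text.split())


@lru_cache(maxsize=10_000)
def translate_protected(text: str) -> str:
    """Translate text while leaving brand terms exactly as typed."""
    brands: list[str] = []

    def mask(match: re.Match) -> str:
        brands.append(match.group(0))
        return f"__BRAND{len(brands) - 1}__"

    translated = toy_translate(_BRAND_RE.sub(mask, text))
    for i, brand in enumerate(brands):
        translated = translated.replace(f"__BRAND{i}__", brand)
    return translated


print(toy_translate("vestido Desigual rojo"))        # dress uneven red
print(translate_protected("vestido Desigual rojo"))  # dress Desigual red
```

The toy translator does not reorder words, so the output is only meant to show that the brand survives. Note that this masks every occurrence of a lexicon term, including genuine uses of the homonym, which is exactly where the "handle homonyms carefully" caveat applies.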

Examples

  • Query “zapatillas trail impermeables 45” (waterproof trail shoes, size 45) returns English product detail pages (PDPs) for GORE-TEX trail shoes; the UI offers on-the-fly translation.
  • Query “bon pour cadeau” maps to the “gift card” page without duplicating content.

Summary

Cross-language search bridges languages with translation and multilingual embeddings so shoppers find the right items regardless of query language, while stock rules, region rules, and brand names stay intact.

FAQ

Cross-language vs multilingual search?

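Multilingual search serves content in many languages, typically matching each query against content in the same language; cross-language search specifically retrieves content written in a language other than the query's.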

Do I need vectors?

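Not strictly: translation-based retrieval can work without them. Vectors do make meaning directly comparable across languages, so they usually help; still keep lexical recall and brand dictionaries alongside them.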

How do I avoid bad translations?

Use curated dictionaries (brands, materials) and confidence thresholds; expose the original text when uncertain.
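
A confidence threshold of this kind might look like the following Python sketch. The `Listing` fields, the `choose_display` function, the badge labels, and the 0.7 threshold are all hypothetical; the translated text is assumed to already be in the shopper's language, and the confidence value is assumed to come from whatever quality estimate the translation pipeline provides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    original_text: str
    original_lang: str
    translated_text: Optional[str] = None  # assumed to be in the shopper's language
    mt_confidence: float = 0.0


def choose_display(listing: Listing, user_lang: str, min_conf: float = 0.7) -> dict:
    """Pick the text to show and a badge describing where it came from."""
    if listing.original_lang == user_lang:
        return {"text": listing.original_text, "badge": None}
    if listing.translated_text and listing.mt_confidence >= min_conf:
        # Trusted translation: show it, but keep the original one click away.
        return {
            "text": listing.translated_text,
            "badge": "translated",
            "original": listing.original_text,
        }
    # Missing or low-confidence translation: fall back to the original text.
    return {"text": listing.original_text, "badge": f"original ({listing.original_lang})"}


good = Listing("Waterproof trail shoes", "en", "Zapatillas de trail impermeables", 0.92)
shaky = Listing("Gift card", "en", "Tarjeta regalo", 0.35)

print(choose_display(good, "es")["text"])   # Zapatillas de trail impermeables
print(choose_display(shaky, "es")["text"])  # Gift card
```

The second listing has a translation, but its confidence falls below the threshold, so the shopper sees the original text with a badge naming its language.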