GLOSSARY

Morphological Analysis

Morphological analysis breaks words into roots and affixes and identifies their grammatical features. It helps search handle inflections, compounds, and agreement, especially in morphologically rich languages.

What is Morphological Analysis?

Morphological analysis examines word structure—stems/lemmas, prefixes/suffixes, part of speech, case, number, tense—and (for some languages) compound splitting. It supplies signals for lemmatization, stemming, and language-aware indexing.
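To illustrate what an analyzer might return for a single token, the sketch below models one reading as a small record. It also parses a feature string in the "Feature=Value|Feature=Value" style used by the Universal Dependencies FEATS column. The Analysis class, the parse_feats helper, and the hand-written German reading are hypothetical; real analyzers expose their own output formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Analysis:
    """One morphological reading of a surface token (hypothetical shape)."""
    surface: str                                          # token as it appears in text
    lemma: str                                            # dictionary/base form
    pos: str                                              # part of speech, e.g. a UD UPOS tag
    feats: Dict[str, str] = field(default_factory=dict)   # grammatical features
    parts: List[str] = field(default_factory=list)        # compound parts, if split


def parse_feats(feats: str) -> Dict[str, str]:
    """Parse a UD-style 'Case=Dat|Number=Plur' string; '_' means no features."""
    if not feats or feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


# Hand-written reading for German "Schuhen" (dative plural of "Schuh", shoe).
reading = Analysis(
    surface="Schuhen",
    lemma="Schuh",
    pos="NOUN",
    feats=parse_feats("Case=Dat|Gender=Masc|Number=Plur"),
)

print(reading.lemma, reading.feats["Number"])
```

A lemmatizer alone would return only "Schuh". The feature dictionary is the extra information that morphological analysis adds.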

How It Works (quick)

  • Tokenization: Split text into tokens; detect script and language.
  • Morph parsing: Apply lexicon- and rule-based analyzers or ML analyzers to derive each token's lemma and morphological tags.
  • Compound handling: Split or annotate compounds (e.g., German, Finnish) and segment long suffix chains in agglutinative languages such as Hungarian and Finnish.
  • Fielding: Store lemma fields and phrase/bigram fields; keep exact keyword fields for brands/SKUs.
  • Serving: Use lemmas/tags to improve recall and reduce noise in lexical retrieval and extraction.
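The tokenization, parsing, and fielding steps above can be sketched as a toy pipeline. The lexicon is a tiny hand-written dictionary standing in for a real lexicon-based or ML analyzer. Unknown tokens fall back to their lowercased surface form, which is one simple way to keep OOV words searchable. The names LEXICON, tokenize, lemmatize, and build_fields are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicon: surface form -> lemma. A real system would call
# a full morphological analyzer instead of a lookup table.
LEXICON = {"boots": "boot", "jackets": "jacket", "socks": "sock"}


def tokenize(text: str) -> List[str]:
    """Lowercase and split into word tokens (script/language detection omitted)."""
    return re.findall(r"\w+", text.lower())


def lemmatize(token: str) -> str:
    """Look up the lemma; out-of-vocabulary tokens pass through unchanged."""
    return LEXICON.get(token, token)


def build_fields(title: str, brand: str) -> Dict[str, List[str]]:
    lemmas = [lemmatize(t) for t in tokenize(title)]
    return {
        # Lemma field: matches across inflected forms.
        "title_lemma": lemmas,
        # Bigram field over lemmas: a simple phrase signal.
        "title_bigram": [f"{a} {b}" for a, b in zip(lemmas, lemmas[1:])],
        # Exact keyword field: the brand is never analyzed.
        "brand_exact": [brand],
    }


fields = build_fields("Waterproof Hiking Boots", brand="AcmeTrail")
print(fields["title_lemma"])
print(fields["title_bigram"])
```

Here "Boots" is indexed as "boot" so a query for "boot" matches. The brand value stays byte-for-byte identical in its own field.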

Why It Matters in E-commerce

  • Inflection robustness: Matches queries and titles across plural/case/tense forms.
  • Better extraction: More accurate brand/material/size detection from vendor copy.
  • Multilingual quality: Essential for languages with heavy inflection and compounding.

Best Practices

  • Prefer lemma-based analyzers for titles/descriptions; avoid aggressive stemming.
  • Protect brand, model, and SKU fields from morphological changes.
  • Localize analyzers per market; maintain compound rules where needed.
  • Evaluate with golden sets per language; track NDCG/MRR and keep notes on recurring error types.
  • Cache analysis; avoid per-keystroke heavy parsing in autocomplete.
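For the per-language golden-set evaluation, MRR and NDCG can be computed as in the sketch below. It assumes graded relevance labels and the common formulation DCG@k = Σ rel_i / log2(i + 1). The golden-set layout (query mapped to the relevance grades of its ranked results) and the queries and grades are hypothetical, not measured results. As a simplification, the ideal ordering is taken from the returned grades only. A full evaluation would use every judged item for the query.

```python
import math
from typing import Dict, List


def reciprocal_rank(grades: List[int]) -> float:
    """1 / rank of the first relevant result (grade > 0), or 0 if none."""
    for rank, grade in enumerate(grades, start=1):
        if grade > 0:
            return 1.0 / rank
    return 0.0


def dcg(grades: List[int], k: int) -> float:
    """Discounted cumulative gain over the top k positions."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(grades[:k], start=1))


def ndcg(grades: List[int], k: int) -> float:
    """DCG normalized by the DCG of the ideal (sorted) ordering of these grades."""
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal > 0 else 0.0


# Hypothetical German golden set: relevance grades (0 = irrelevant)
# of the ranked results returned for each query.
golden_de: Dict[str, List[int]] = {
    "wanderschuhe": [0, 2, 1],
    "regenjacke": [2, 0, 0],
}

mrr = sum(reciprocal_rank(g) for g in golden_de.values()) / len(golden_de)
mean_ndcg = sum(ndcg(g, k=3) for g in golden_de.values()) / len(golden_de)
print(f"MRR={mrr:.3f} NDCG@3={mean_ndcg:.3f}")
```

Running the same script against each market's golden set, before and after an analyzer change, shows whether the change helps or hurts that language.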

Challenges

  • OOV brands/models, mixed-language text, hyphen/compound edge cases, and POS ambiguity.

Examples

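  • A Hungarian query for “cipő” (shoe) matches titles containing inflected forms such as “cipők” (plural) or “cipőt” (accusative) because case and number suffixes are analyzed correctly.
  • The German compound “Wanderschuhmembran” is split into “Wander” + “Schuh” + “Membran”, surfacing both the hiking-shoe and the membrane intent.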
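Dictionary-based decompounding is one common way to produce a split like the German one above. A recursive search breaks the word into known parts, as in the sketch below. The vocabulary is a tiny hypothetical list, and MIN_PART is an assumed threshold. Real splitters also handle linking elements (such as German "-s-" or "-en-") and score competing splits by frequency. This sketch only prefers the split with the fewest parts.

```python
from functools import lru_cache
from typing import List, Optional, Tuple

# Hypothetical mini-vocabulary of known German stems (lowercase).
VOCAB = {"wander", "schuh", "membran", "regen", "jacke"}
MIN_PART = 3  # ignore very short fragments to avoid spurious splits


def split_compound(word: str) -> Optional[List[str]]:
    """Return the split with the fewest known parts, or None if none covers the word."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def best(start: int) -> Optional[Tuple[str, ...]]:
        if start == len(word):
            return ()  # reached the end: empty remainder is a valid split
        candidates = []
        for end in range(start + MIN_PART, len(word) + 1):
            head = word[start:end]
            if head in VOCAB:
                rest = best(end)
                if rest is not None:
                    candidates.append((head,) + rest)
        # Prefer fewer, longer parts when several splits cover the word.
        return min(candidates, key=len) if candidates else None

    parts = best(0)
    return list(parts) if parts is not None else None


print(split_compound("Wanderschuhmembran"))
```

Words that cannot be fully covered, such as many brand names, return None and can be indexed as-is.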

Summary

Morphological analysis adds grammatical and structural insight so lexical search works across inflections and compounds. Use it to power lemma fields and extraction—while preserving exact brand/SKU handling.

FAQ

Morphological analysis vs lemmatization?

Lemmatization outputs the base form; morphological analysis also returns detailed grammatical features and often handles compounds.

Do vectors replace this?

No—morphology still boosts lexical precision/recall and extraction; combine with vectors in a hybrid pipeline.
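Reciprocal rank fusion (RRF) is one common way to combine lexical and vector results in a hybrid pipeline. It merges ranked lists using only ranks, with score(d) = Σ 1/(k + rank(d)). RRF is named here as one option, not as a method this entry prescribes. The constant k = 60 is a commonly used default, and the document IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of doc IDs; a higher fused score ranks first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by doc ID for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results: lexical retrieval over lemma fields vs. vector retrieval.
lexical = ["sku-101", "sku-205", "sku-330"]
vector = ["sku-205", "sku-412", "sku-101"]

print(reciprocal_rank_fusion([lexical, vector]))
```

Because RRF ignores raw scores, the lexical side does not need score calibration against vector similarities. Items that both retrievers find, such as sku-205 and sku-101 here, rise to the top.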

Should I analyze every field?

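No. Skip exact-match and ID fields (brands, model numbers, SKUs), and analyze attribute fields only where their values are natural-language words rather than codes.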
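A field-aware indexer might send only natural-language fields through analysis and keep ID-like fields verbatim, as in the sketch below. The field names, the ANALYZED_FIELDS set, and the toy analyze_token lemmatizer are hypothetical. functools.lru_cache memoizes per-token results, in the spirit of caching analysis instead of re-parsing repeated tokens.

```python
from functools import lru_cache
from typing import Dict, List

# Hypothetical field policy: only these fields get morphological analysis.
ANALYZED_FIELDS = {"title", "description"}
TOY_LEMMAS = {"boots": "boot", "laces": "lace"}  # stand-in for a real analyzer


@lru_cache(maxsize=100_000)
def analyze_token(token: str) -> str:
    """Memoized per-token analysis; repeated tokens skip re-parsing."""
    return TOY_LEMMAS.get(token, token)


def index_document(doc: Dict[str, str]) -> Dict[str, List[str]]:
    indexed: Dict[str, List[str]] = {}
    for name, value in doc.items():
        if name in ANALYZED_FIELDS:
            indexed[name] = [analyze_token(t) for t in value.lower().split()]
        else:
            # Brand, model, and SKU values are kept exactly as supplied.
            indexed[name] = [value]
    return indexed


doc = {"title": "Trail Boots with Laces", "brand": "Boots&Co", "sku": "TB-2041-BLK"}
print(index_document(doc))
```

The word "Boots" is lemmatized in the title, but the brand "Boots&Co" and the SKU pass through untouched.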