GLOSSARY

Inverted File

An inverted file is the on-disk file/segment that stores term dictionaries and postings lists. For online stores, it is the physical structure that makes keyword lookups and filters fast.

What is an Inverted File?

An inverted file is a physical index segment on disk that contains the term dictionary, postings lists (doc IDs, term frequencies, positions), and related metadata/compression. It’s the storage unit the search engine reads to serve queries.
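To make the pieces concrete, here is a minimal in-memory sketch of what a segment holds before it is written to disk: a sorted term dictionary pointing at postings of (doc ID, term frequency, positions). The whitespace tokenizer and the function name `build_inverted_index` are assumptions for illustration; real engines use configurable analyzers and binary on-disk encodings.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[tuple[int, int, list[int]]]]:
    """Map each term to a postings list of (doc_id, term_freq, positions)."""
    positions_by_term: dict[str, dict[int, list[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id in sorted(docs):
        # Naive analyzer (assumption): lowercase + whitespace split.
        for pos, token in enumerate(docs[doc_id].lower().split()):
            positions_by_term[token][doc_id].append(pos)

    index: dict[str, list[tuple[int, int, list[int]]]] = {}
    for term in sorted(positions_by_term):  # sorted keys play the role of the term dictionary
        index[term] = [(d, len(p), p) for d, p in sorted(positions_by_term[term].items())]
    return index


docs = {1: "red running shoes", 2: "blue running shorts", 3: "red shoes red laces"}
index = build_inverted_index(docs)
print(index["red"])      # [(1, 1, [0]), (3, 2, [0, 2])]
print(index["running"])  # [(1, 1, [1]), (2, 1, [1])]
```

Term frequency is stored alongside positions because scoring needs the count even when a query never asks for positions.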

How It Works (quick)

  • Segments: New writes create segments; background merges consolidate them.
  • Core blocks:
    • Term dictionary → pointers to postings.
    • Postings lists → doc IDs + positions/offsets for phrases/highlights.
    • Skip data, blocks & compression → faster jumps through long postings, smaller footprint (e.g., block codecs).
  • Integrity: Checksums/footers; versioning for safe upgrades.
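Postings compress well because sorted doc IDs can be stored as small gaps. The sketch below uses delta encoding plus variable-byte encoding, a simpler stand-in for the block codecs mentioned above, not any specific engine's format. The input is assumed to be sorted ascending.

```python
def encode_postings(doc_ids: list[int]) -> bytes:
    """Delta-encode sorted doc IDs, then variable-byte encode each gap."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        prev = doc_id
        # Emit 7 bits per byte; the high bit means "more bytes follow".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_postings(data: bytes) -> list[int]:
    """Reverse of encode_postings: rebuild gaps, then prefix-sum them."""
    doc_ids: list[int] = []
    prev, gap, shift = 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += gap
            doc_ids.append(prev)
            gap, shift = 0, 0
    return doc_ids


ids = [3, 7, 150, 151, 20000]
blob = encode_postings(ids)
print(len(blob), decode_postings(blob) == ids)  # 8 True
```

Five doc IDs fit in 8 bytes here instead of 20 as fixed 32-bit integers; the denser the postings, the smaller the gaps and the bigger the saving.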

Why It Matters in E-commerce

  • Latency: Well-compressed, block-skippable postings speed search and facets.
  • Scale: Efficient files keep storage and I/O in check for large catalogs.
  • Freshness: Balanced merge policy avoids query slowdowns during heavy updates.
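Much of the latency benefit comes from not reading every posting. A query like "red" filtered by "in stock" is a postings intersection; the sketch below skips ahead with binary search over an in-memory sorted list, standing in for the per-block skip data a real engine reads from disk. The filter list and names are hypothetical.

```python
from bisect import bisect_left


def intersect(a: list[int], b: list[int]) -> list[int]:
    """Intersect two sorted postings lists, jumping ahead instead of stepping one by one."""
    if len(a) > len(b):
        a, b = b, a  # drive the loop with the shorter list
    result: list[int] = []
    j = 0
    for doc_id in a:
        # Jump to the first entry in b that is >= doc_id (stand-in for skip data).
        j = bisect_left(b, doc_id, j)
        if j == len(b):
            break
        if b[j] == doc_id:
            result.append(doc_id)
    return result


red = [1, 3, 9, 12]
in_stock = list(range(0, 10_000, 3))  # hypothetical filter: every third product in stock
print(intersect(red, in_stock))  # [3, 9, 12]
```

With a short list driving the loop, the work grows with the short list's length times a logarithmic jump, rather than with the length of the long filter list.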

Best Practices

  • Tune merge policy: Avoid many tiny segments; schedule large merges for off-peak hours.
  • Right fields in the right store: Positions for titles/attributes; doc values for price/rating/stock.
  • Compression: Use modern codecs; index and store only the fields you actually query or display.
  • Monitoring: Track segment count, merge backlog, cache hit rate, I/O.
  • Backups: Snapshot segments and practice restores.
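"Avoid many tiny segments" usually means grouping segments of similar size and merging them together. The sketch below is a hypothetical size-tiered selector loosely in the spirit of log-structured merging; `merge_factor` and `min_size` are illustrative parameters, not settings from any particular engine, and scheduling merges off-peak would be handled separately.

```python
def pick_merges(segment_sizes: list[int], merge_factor: int = 4, min_size: int = 1_000) -> list[list[int]]:
    """Return groups of segment indexes to merge.

    Each segment is bucketed into a tier by how many times its size (relative to
    min_size) can be divided by merge_factor; any tier holding merge_factor or more
    segments is merged in groups of merge_factor.
    """
    tiers: dict[int, list[int]] = {}
    for i, size in enumerate(segment_sizes):
        # Everything below min_size lands in tier 0, so tiny flushes merge early.
        ratio = max(size, min_size) // min_size
        tier = 0
        while ratio >= merge_factor:
            ratio //= merge_factor
            tier += 1
        tiers.setdefault(tier, []).append(i)

    merges: list[list[int]] = []
    for tier in sorted(tiers):
        members = tiers[tier]
        for start in range(0, len(members) - merge_factor + 1, merge_factor):
            merges.append(members[start:start + merge_factor])
    return merges


sizes = [500, 800, 900, 700, 300, 50_000, 60_000]
print(pick_merges(sizes))  # [[0, 1, 2, 3]]
```

The five small flushes share tier 0, so four of them are merged; the two large segments sit in a higher tier and are left alone until enough peers accumulate.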

Challenges

  • Bloat from unused fields; hot merging under heavy ingest; schema drift across locales.

Examples

  • Keep long descriptions in stored fields; use postings with positions for titles.
  • Add a vector column in parallel files only if you run semantic recall.
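Positions are what let a title field answer phrase queries such as "running shoes". A minimal sketch, assuming a per-field map of term → doc ID → positions like the one built earlier in memory; `phrase_match` and the sample data are hypothetical.

```python
def phrase_match(positions: dict[str, dict[int, list[int]]], phrase: list[str]) -> list[int]:
    """Return doc IDs where the phrase terms appear at consecutive positions."""
    if not phrase or any(t not in positions for t in phrase):
        return []
    # Candidate docs must contain every term of the phrase.
    candidates = set(positions[phrase[0]])
    for term in phrase[1:]:
        candidates &= set(positions[term])

    hits: list[int] = []
    for doc_id in sorted(candidates):
        starts = set(positions[phrase[0]][doc_id])
        for offset, term in enumerate(phrase[1:], start=1):
            # Keep only start positions where this term sits exactly `offset` tokens later.
            starts &= {p - offset for p in positions[term][doc_id]}
        if starts:
            hits.append(doc_id)
    return hits


# Doc 1 title: "red running shoes"; doc 2 title: "running trail shoes"
title_positions = {
    "running": {1: [1], 2: [0]},
    "shoes": {1: [2], 2: [2]},
}
print(phrase_match(title_positions, ["running", "shoes"]))  # [1]
```

Without positions both titles would match; with them, only the title where the words are adjacent does. This is why positions are worth their space on titles but often not on long descriptions.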

Summary

The inverted file is how your logical inverted index lives on disk. Good merge/compression strategy and field hygiene deliver fast, reliable storefront search.

FAQ

Inverted file vs inverted index?

“Inverted index” is the logical structure; an “inverted file” is the on-disk segment that stores it.

Can I edit these files manually?

No. Use the engine's APIs; manual edits risk corruption.
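Checksum footers are one reason a hand-edited file is rejected rather than silently misread. The sketch below frames a payload with a hypothetical `SEG1` marker and a CRC32 footer, then shows a single flipped byte being caught on read; the format is invented for illustration and does not match any real engine's file layout.

```python
import struct
import zlib

MAGIC = b"SEG1"  # hypothetical format marker, not a real engine's header


def write_segment(payload: bytes) -> bytes:
    """Frame a payload as MAGIC + payload + 4-byte big-endian CRC32 footer."""
    body = MAGIC + payload
    return body + struct.pack(">I", zlib.crc32(body))


def read_segment(blob: bytes) -> bytes:
    """Verify the footer checksum before trusting any bytes."""
    if len(blob) < len(MAGIC) + 4:
        raise ValueError("truncated segment")
    body = blob[:-4]
    (stored,) = struct.unpack(">I", blob[-4:])
    if zlib.crc32(body) != stored:
        raise ValueError("checksum mismatch: segment is corrupt")
    if not body.startswith(MAGIC):
        raise ValueError("unknown segment format")
    return body[len(MAGIC):]


seg = write_segment(b"red:1,3")
print(read_segment(seg))  # b'red:1,3'

tampered = bytearray(seg)
tampered[5] ^= 0xFF  # simulate a manual edit of one byte
try:
    read_segment(bytes(tampered))
except ValueError as err:
    print(err)  # checksum mismatch: segment is corrupt
```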