What is Elasticsearch and When to Use It
Series: Elasticsearch 101 | Article: 01
Background
Elasticsearch is a distributed search and analytics engine built on top of Apache Lucene. It was initially released in 2010, and what made it stand out from day one was not just the speed: you could scale it horizontally without changing your application code, and query it over a straightforward HTTP/JSON API.
I reached for Elasticsearch after hitting a wall with full-text search in a relational database. The query worked fine at 100k rows. At 2 million rows with fuzzy matching enabled, it was unusable. Elasticsearch solved that problem, but it also introduced a different mental model that is worth understanding before writing a single line of code.
How Elasticsearch Stores Data
The Inverted Index
Elasticsearch does not run LIKE '%keyword%' across rows. Instead, it builds an inverted index at write time.
When you index a document like this:
    {
      "title": "Getting started with distributed systems"
    }

Elasticsearch tokenizes the text and builds a map:

| Term | Document IDs |
| --- | --- |
| getting | [1] |
| started | [1] |
| distributed | [1] |
| systems | [1] |
When you search for "distributed", Elasticsearch does a direct lookup in this map rather than scanning every document. This is why full-text search stays fast as the number of documents grows: most of the cost is paid at write time, not read time.
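The lookup described above can be sketched in a few lines of Python. This is a toy model, not how Lucene stores its index: the `tokenize` helper, the one-word `STOP_WORDS` set (chosen only so the output mirrors the map above), and the second sample document are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumption: a tiny stop-word list so the result mirrors the map in the article.
STOP_WORDS = {"with"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    1: "Getting started with distributed systems",
    2: "Distributed search with Lucene",  # hypothetical second document
}
inverted = build_inverted_index(docs)


def search(term):
    """One dictionary lookup instead of a scan over every document."""
    return inverted.get(term.lower(), [])


print(search("distributed"))  # [1, 2]
print(search("lucene"))       # [2]
```

All the work happens in `build_inverted_index`, which runs when documents are written; `search` only reads the finished map.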
Segments and Immutability
Elasticsearch writes data into segments: small, immutable Lucene indexes on disk. Once a segment is written, it is never modified. Updates are modeled as a delete of the old document plus a write of the new one. Background merge operations periodically combine small segments into larger ones.
This immutability is why Elasticsearch is excellent for append-heavy workloads and why you need to think differently about high-update-rate data.
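To make the update-as-delete-plus-write idea concrete, here is a minimal in-memory sketch. `Segment` and `ToyIndex` are hypothetical classes invented for this example; real Lucene segments are on-disk structures with far more machinery, so treat this only as a mental model.

```python
from types import MappingProxyType


class Segment:
    """A read-only batch of documents, standing in for a Lucene segment."""

    def __init__(self, docs):
        # MappingProxyType gives a read-only view: the segment cannot be edited.
        self.docs = MappingProxyType(dict(docs))


class ToyIndex:
    def __init__(self):
        # List of (segment, tombstones) pairs, oldest first. Deletes are
        # tracked next to the segment, never inside it.
        self.segments = []

    def index(self, docs):
        """Write a new segment; older copies of the same ids are tombstoned."""
        for segment, tombstones in self.segments:
            tombstones.update(d for d in docs if d in segment.docs)
        self.segments.append((Segment(docs), set()))

    def get(self, doc_id):
        """Return the live copy of a document, checking newest segments first."""
        for segment, tombstones in reversed(self.segments):
            if doc_id in segment.docs and doc_id not in tombstones:
                return segment.docs[doc_id]
        return None

    def merge(self):
        """Combine all segments into one, dropping tombstoned documents."""
        live = {}
        for segment, tombstones in self.segments:
            for doc_id, body in segment.docs.items():
                if doc_id not in tombstones:
                    live[doc_id] = body
        self.segments = [(Segment(live), set())]


idx = ToyIndex()
idx.index({1: {"title": "v1"}, 2: {"title": "other"}})
idx.index({1: {"title": "v2"}})  # an "update" of document 1
segments_before_merge = len(idx.segments)
idx.merge()
print(segments_before_merge, len(idx.segments), idx.get(1))
```

Every update leaves a dead copy behind until a merge cleans it up, which is the overhead that high-update-rate data runs into.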
Core Concepts
Cluster
A cluster is one or more nodes working together, identified by a cluster name. The data is distributed across the nodes, and any node can accept a request and route it to the nodes that hold the relevant data, so the cluster behaves like a single API endpoint.
Node
A node is a single running instance of Elasticsearch. A node can hold data, coordinate requests, or do both. In local development you typically run a single node. In production you run at least three master-eligible nodes, so that a majority can still elect a master if one fails and split-brain scenarios are avoided.
Index
An index in Elasticsearch is roughly analogous to a table in a relational database. It has a name, a mapping (schema), and contains documents. Unlike a database table, an index is actually a logical grouping of one or more shards.
Document
A document is a JSON object stored in an index. Every document has:
An _index: which index it belongs to
An _id: its unique identifier within the index
A _source: the original JSON body
Shard
A shard is a single Lucene index. Elasticsearch distributes shards across nodes. When you create an index, you configure how many primary shards it has (this cannot be changed later without reindexing). Each primary shard can have one or more replica shards for redundancy and read scalability.
A common starting configuration for a production index is 1 primary shard + 1 replica if the data fits one node, or 3 to 5 primary shards + 1 replica for larger datasets.
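Why the primary shard count is effectively fixed is easier to see with a sketch of document routing. The idea is that a document's shard is derived from a hash of its routing value (the `_id` by default) modulo the number of primary shards. The code below is a simplification under stated assumptions: `zlib.crc32` is only a stand-in for Elasticsearch's own hash function, and `shard_for` is a hypothetical helper.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick a shard from a hash of the id.

    Assumption: crc32 is a stand-in hash used only because it is
    deterministic and in the standard library.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


ids = [f"doc-{i}" for i in range(1000)]
before = {d: shard_for(d, 3) for d in ids}  # index created with 3 primaries
after = {d: shard_for(d, 5) for d in ids}   # same ids, if it had 5 primaries

moved = sum(1 for d in ids if before[d] != after[d])
print(f"{moved} of {len(ids)} documents would map to a different shard")
```

Changing the divisor changes where most documents are expected to live, so existing data would no longer be found where lookups go. That is why a different shard count means writing the documents again into a new index.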
Mapping
A mapping defines the schema of documents in an index: what fields exist, their data types, and how they should be analyzed. Getting mappings right up front avoids painful reindexing jobs later.
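As a rough illustration, this is what an index-creation body with an explicit mapping could look like, built as a Python dict and printed as JSON. The field names and the shard settings are hypothetical choices for an articles index; only the overall `settings` / `mappings` / `properties` shape and the field types (`text`, `keyword`, `date`, `integer`) come from Elasticsearch itself.

```python
import json

# Hypothetical body for creating an "articles" index (e.g. PUT /articles).
index_body = {
    "settings": {
        "number_of_shards": 1,    # primaries: fixed once the index exists
        "number_of_replicas": 1,  # replicas: can be changed later
    },
    "mappings": {
        "properties": {
            "title": {"type": "text"},         # analyzed for full-text search
            "tags": {"type": "keyword"},       # exact values, filters, aggregations
            "published_at": {"type": "date"},
            "views": {"type": "integer"},
        }
    },
}

print(json.dumps(index_body, indent=2))
```

The `text` versus `keyword` choice is the one that most often forces a reindex later, because it decides whether a field is tokenized or stored as a single exact value.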
Elasticsearch versus a Relational Database
| Capability | Relational database | Elasticsearch |
| --- | --- | --- |
| Full-text search | Slow at scale, limited relevance scoring | Purpose-built, fast, relevance-ranked |
| Structured queries (join, aggregation) | Strong support with SQL | Aggregations are powerful but no JOIN |
| ACID transactions | Yes | No (document-level atomicity only) |
| Schema changes | Migrations, alter table | Add fields freely; type changes require reindex |
| Primary data store | Yes | Avoid; treat as a secondary derived store |
| Writes per second | High, highly concurrent | High for bulk; single-doc updates have overhead |
The most important line in the table: treat Elasticsearch as a secondary derived store. I write to PostgreSQL first and sync to Elasticsearch asynchronously (via outbox pattern, CDC, or periodic job depending on the latency requirement). This keeps your source of truth clean and your search index optimized for reads.
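The outbox variant of that sync can be sketched as follows. To keep the example self-contained it uses an in-memory SQLite database as a stand-in for PostgreSQL and a plain dict as a stand-in for the Elasticsearch index; the table layout and the `save_article` / `drain_outbox` helpers are assumptions for illustration, not a production design.

```python
import json
import sqlite3

# Assumption: SQLite in memory stands in for PostgreSQL.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE outbox (
        seq        INTEGER PRIMARY KEY AUTOINCREMENT,
        article_id INTEGER NOT NULL,
        payload    TEXT NOT NULL,
        processed  INTEGER NOT NULL DEFAULT 0
    );
    """
)

# Assumption: a dict stands in for the search index.
search_index = {}


def save_article(article_id, title):
    """Write the row and its outbox entry in one transaction."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO articles (id, title) VALUES (?, ?)",
            (article_id, title),
        )
        db.execute(
            "INSERT INTO outbox (article_id, payload) VALUES (?, ?)",
            (article_id, json.dumps({"title": title})),
        )


def drain_outbox():
    """Apply pending outbox entries to the search index, in write order."""
    rows = db.execute(
        "SELECT seq, article_id, payload FROM outbox "
        "WHERE processed = 0 ORDER BY seq"
    ).fetchall()
    for seq, article_id, payload in rows:
        search_index[article_id] = json.loads(payload)  # an index request in real life
        with db:
            db.execute("UPDATE outbox SET processed = 1 WHERE seq = ?", (seq,))
    return len(rows)


save_article(1, "Hello")
save_article(1, "Hello, world")
drained = drain_outbox()
print(drained, search_index)
```

Because the outbox row is written in the same transaction as the article, the search index can lag behind but cannot silently miss a change. A background worker would call `drain_outbox` on a schedule that matches the latency requirement.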
When Elasticsearch Is the Right Tool
Full-Text Search with Relevance
If users need to search natural language text (blog posts, product descriptions, documentation), Elasticsearch is the right choice. It handles stemming, synonyms, multi-language analyzers, fuzziness, and relevance scoring out of the box.
Autocomplete and Typeahead
With the completion field type or edge_ngram tokenizer, autocomplete queries on large datasets are very fast. I have used this for a local knowledge base project and the response time was under 10ms at p99.
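The reason edge n-grams make typeahead cheap is that every prefix of a term is indexed up front, so a partial input becomes an exact lookup. A small sketch of that idea follows; `edge_ngrams` and `build_prefix_index` are hypothetical helpers, and the `min_gram` / `max_gram` names are borrowed from the tokenizer's settings with values picked for the example.

```python
def edge_ngrams(token, min_gram=2, max_gram=5):
    """Return the prefixes of `token` between min_gram and max_gram characters."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def build_prefix_index(terms, min_gram=2, max_gram=5):
    """Index every edge n-gram so a typed prefix is a single dict lookup."""
    index = {}
    for term in terms:
        for gram in edge_ngrams(term.lower(), min_gram, max_gram):
            index.setdefault(gram, []).append(term)
    return index


print(edge_ngrams("elastic"))
# ['el', 'ela', 'elas', 'elast']

prefix_index = build_prefix_index(["Elasticsearch", "Elastic", "Lucene"])
print(prefix_index.get("ela", []))
# ['Elasticsearch', 'Elastic']
```

The trade-off is index size: each term is stored several times, once per prefix, in exchange for not doing prefix matching at query time.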
Log and Event Analytics
The ELK Stack (Elasticsearch + Logstash + Kibana) became the de facto standard for log aggregation for good reason. Data streams and index lifecycle management (ILM) make it practical to store and search months of log data.
Faceted Search
E-commerce style filtering (by category, price range, brand, rating) maps naturally to Elasticsearch aggregations combined with filter queries. The query runs over the full corpus and the aggregations count the matching subsets in the same request, so the results and the facet counts come back in a single round trip.
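A search body for this kind of request might look like the sketch below, written as a Python dict. The field names (`category`, `price`, `brand`) and the bucket boundaries are hypothetical; the structure uses the `bool` filter, `terms` aggregation, and `range` aggregation from the query DSL.

```python
import json

# Hypothetical facet request for a product index.
search_body = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"category": "shoes"}},
                {"range": {"price": {"gte": 50, "lte": 150}}},
            ]
        }
    },
    "aggs": {
        # Facet counts per brand among the matching products.
        "by_brand": {"terms": {"field": "brand"}},
        # Facet counts per price bucket among the matching products.
        "by_price": {
            "range": {
                "field": "price",
                "ranges": [
                    {"to": 50},
                    {"from": 50, "to": 100},
                    {"from": 100},
                ],
            }
        },
    },
    "size": 20,  # the hits themselves come back alongside the facet counts
}

print(json.dumps(search_body, indent=2))
```

Filters in a `bool` query do not contribute to relevance scoring, which suits facets: a product is either in the category or it is not.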
Geospatial Search
Elasticsearch supports geo_point and geo_shape field types and can do distance-based filtering and sorting. For anything beyond basic PostGIS queries at scale, Elasticsearch is worth considering.
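Conceptually, distance-based filtering and sorting comes down to the computation sketched below. This is a plain Python illustration using the haversine formula, not Elasticsearch's implementation; the `places` data and the `within` helper are made up for the example, with each location shaped like a lat/lon object.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Hypothetical documents with a location field.
places = [
    {"name": "Hamburg", "location": {"lat": 53.5511, "lon": 9.9937}},
    {"name": "Potsdam", "location": {"lat": 52.3906, "lon": 13.0645}},
    {"name": "Berlin center", "location": {"lat": 52.52, "lon": 13.405}},
]


def within(places, origin, max_km):
    """Keep places within max_km of origin, nearest first."""
    scored = [
        (haversine_km(origin["lat"], origin["lon"],
                      p["location"]["lat"], p["location"]["lon"]), p["name"])
        for p in places
    ]
    return [name for dist, name in sorted(scored) if dist <= max_km]


print(within(places, {"lat": 52.52, "lon": 13.405}, max_km=50))
# ['Berlin center', 'Potsdam']
```

A naive loop like this touches every document. The point of a geo-aware index is to avoid that scan on large datasets.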
When NOT to Use Elasticsearch
Primary OLTP store: No multi-document transactions, no foreign key constraints. Use a relational database or document store for this.
High-frequency single-document updates: Each update is a delete + rewrite. If a record changes dozens of times per second, you will create segment churn. Batch updates or use a DB as the write-ahead store.
Heavy joins or relational data: Elasticsearch has no JOIN. nested and parent-child types exist but add complexity. If your query is fundamentally relational, stay in SQL.
Small datasets: The operational overhead of Elasticsearch is not worth it if a properly indexed PostgreSQL table would serve the same purpose.
Data Model Mental Shift
The biggest adjustment coming from relational databases is denormalization. In SQL you normalize to avoid duplication and rely on JOINs. In Elasticsearch you intentionally duplicate data to avoid needing JOINs.
If you are indexing articles with authors, you embed the author's name, avatar URL, and bio directly in the article document. You accept that if the author updates their avatar, you need to update every article. This is a deliberate trade-off: denormalized writes in exchange for very fast reads.
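The trade-off can be shown with a small in-memory sketch. The documents, the author data, and the `update_author_avatar` helper are all hypothetical, and the dict stands in for an index; in a real cluster the fan-out would be a batch of document updates rather than a Python loop.

```python
# Hypothetical article documents with the author embedded in each one.
articles = {
    "a1": {
        "title": "Inverted indexes explained",
        "author": {"id": "u1", "name": "Ada", "avatar_url": "https://example.com/ada-v1.png"},
    },
    "a2": {
        "title": "Shards and replicas",
        "author": {"id": "u1", "name": "Ada", "avatar_url": "https://example.com/ada-v1.png"},
    },
    "a3": {
        "title": "Mappings in practice",
        "author": {"id": "u2", "name": "Lin", "avatar_url": "https://example.com/lin.png"},
    },
}


def update_author_avatar(index, author_id, new_url):
    """Fan the change out to every article that embeds this author."""
    touched = 0
    for doc in index.values():
        if doc["author"]["id"] == author_id:
            doc["author"]["avatar_url"] = new_url
            touched += 1
    return touched


# Reading an article needs no second lookup: the author is already there.
print(articles["a1"]["author"]["name"])

# Writing is where the cost shows up: one profile change, many documents.
touched = update_author_avatar(articles, "u1", "https://example.com/ada-v2.png")
print(touched)
```

This works well when author profiles change rarely and articles are read constantly. If the embedded data changed as often as it was read, the balance would tip the other way.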
Summary
Elasticsearch is a distributed search and analytics engine built on Lucene.
It uses inverted indexes so that full-text search stays fast as the dataset grows.
Core units are: cluster → node → index → shard → document.
It excels at full-text search, faceted navigation, log analytics, and autocomplete.
Treat it as a secondary read-optimized store, not a primary database.
Mapping decisions made at index creation time are largely permanent, so plan them.