Part 2: Elasticsearch - Search and Analytics Engine

My First Elasticsearch Query

I'll never forget the moment Elasticsearch clicked for me. I had just migrated a month's worth of application logs - 50 million documents. Using traditional grep on log files, finding a specific error took 10+ minutes.

In Elasticsearch, I typed:

GET /logs-*/_search
{
  "query": {
    "match": { "error.message": "payment timeout" }
  }
}

Response time: 47 milliseconds. Across 50 million documents. That's when I realized the power of Elasticsearch.

In this article, I'll share everything I learned about Elasticsearch - from installation to advanced querying, based on real projects and actual production usage.

What is Elasticsearch Really?

Beyond the marketing buzzwords, here's what I understand Elasticsearch to be:

Elasticsearch is a distributed search and analytics engine built on Apache Lucene. Think of it as:

A NoSQL document database (stores JSON)
A full-text search engine (like Google for your data)
An analytics platform (aggregations, statistics)
A distributed system (scales horizontally)

Written in Java, runs on the JVM, and exposes everything via RESTful HTTP APIs.

Installing Elasticsearch

I've installed Elasticsearch in many ways. Here are the approaches I use:

Method 1: Docker (My Favorite for Development)

Single node for testing:

docker run -d \
  --name elasticsearch \
  -p 9200:9200 \
  -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.11.0

Verify it's running:

curl http://localhost:9200

# Response:
{
  "name" : "elasticsearch",
  "cluster_name" : "docker-cluster",
  "version" : {
    "number" : "8.11.0",
    "lucene_version" : "9.8.0"
  },
  "tagline" : "You Know, for Search"
}

That tagline never gets old.

Method 2: Docker Compose (Multi-Node Development)

docker-compose.yml:

version: '3.8'

services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.security.enabled=false
    ports:
      - 9200:9200
    volumes:
      - es-data01:/usr/share/elasticsearch/data

  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=es-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.security.enabled=false
    volumes:
      - es-data02:/usr/share/elasticsearch/data

  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=es-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.security.enabled=false
    volumes:
      - es-data03:/usr/share/elasticsearch/data

volumes:
  es-data01:
  es-data02:
  es-data03:

Start cluster:

docker-compose up -d

Check cluster health:

curl http://localhost:9200/_cluster/health?pretty

# Response (trimmed):
{
  "cluster_name" : "es-cluster",
  "status" : "green",
  "number_of_nodes" : 3,
  "active_primary_shards" : 0,
  "active_shards" : 0
}

Green means every primary and replica shard is assigned.

Method 3: Linux Installation (Production)

On Ubuntu/Debian:

# Import Elasticsearch GPG key
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg

# Add repository
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list

# Install
sudo apt-get update
sudo apt-get install elasticsearch

# Reload systemd so it picks up the new unit, then enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

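Configuration file: /etc/elasticsearch/elasticsearch.yml

Unlike the Docker examples above, a package install of 8.x enables security (TLS and authentication) by default and prints the generated password for the elastic user during installation, so plain http://localhost:9200 requests won't work until you switch to HTTPS and supply credentials.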

Core Elasticsearch Concepts

Let me explain the concepts that took me a while to grasp.

1. Documents and Indices

Document = A single JSON record (like a row in a database)

{
  "_index": "logs-2025-01-15",
  "_id": "abc123",
  "_source": {
    "timestamp": "2025-01-15T10:30:00Z",
    "level": "ERROR",
    "service": "payment-service",
    "message": "Payment gateway timeout",
    "user_id": "12345",
    "response_time": 5023
  }
}

Index = Collection of similar documents (like a table)

An index is where documents live. I create time-based indices:

logs-2025-01-15  (today's logs)
logs-2025-01-14  (yesterday's logs)
logs-2025-01-13  (day before)
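Generating these names in code keeps the writers and the searchers in agreement. A minimal sketch, assuming timezone-aware timestamps, UTC day boundaries, and the logs-YYYY-MM-DD naming shown above; the helper names are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

def daily_index_name(ts: datetime, prefix: str = "logs") -> str:
    """Daily index for a timestamp, e.g. logs-2025-01-15.

    ts should be timezone-aware; astimezone() treats naive values as local time.
    """
    ts_utc = ts.astimezone(timezone.utc)
    return f"{prefix}-{ts_utc:%Y-%m-%d}"

def indices_for_range(start: date, end: date, prefix: str = "logs") -> list[str]:
    """All daily index names from start to end, inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days
    return [f"{prefix}-{start + timedelta(days=i):%Y-%m-%d}" for i in range(days + 1)]

# Joined with commas, the names form a multi-target search path,
# e.g. GET /logs-2025-01-13,logs-2025-01-14,logs-2025-01-15/_search
names = indices_for_range(date(2025, 1, 13), date(2025, 1, 15))
print(",".join(names))
```

Computing the exact list is an alternative to a broad wildcard when you only need a few days of data.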

2. Mappings (Schema)

Mapping defines the structure of documents - field types, analyzers, etc.

My typical log mapping:

PUT /logs-2025-01-15
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_millis"
      },
      "level": {
        "type": "keyword"
      },
      "service": {
        "type": "keyword"
      },
      "message": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "user_id": {
        "type": "keyword"
      },
      "response_time": {
        "type": "integer"
      },
      "tags": {
        "type": "keyword"
      }
    }
  }
}

Field types I use:

keyword: Exact match, aggregations, sorting (log level, user ID)
text: Full-text search, analyzed (error messages)
date: Timestamps, date range queries
integer/long: Numbers (response time, counts)
boolean: True/false flags
ip: IP addresses (with range search support)
geo_point: Geographic coordinates

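Key lesson: Choose field types carefully. You can't aggregate on text fields (fielddata is disabled by default), and keyword fields are indexed as a single exact value, so they don't support full-text search. That's why message above is mapped as text with a message.keyword sub-field: one for searching, one for exact matches and aggregations.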
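The difference comes from analysis at index time. A text field is broken into lowercase tokens, while a keyword field is stored as one unchanged term. The sketch below uses a deliberately simplified analyzer (lowercase, split on non-alphanumerics) as a stand-in for Elasticsearch's standard analyzer, which is more sophisticated. The function names are hypothetical.

```python
import re

def analyze_text(value: str) -> list[str]:
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^a-z0-9]+", value.lower()) if tok]

def analyze_keyword(value: str) -> list[str]:
    """A keyword field indexes the whole value as a single, unmodified term."""
    return [value]

def term_matches(indexed_terms: list[str], term: str) -> bool:
    """A term query is not analyzed: it looks up exactly the term it was given."""
    return term in indexed_terms

print(analyze_text("ERROR"))                                  # ['error']
print(analyze_keyword("ERROR"))                               # ['ERROR']
# term "ERROR" misses the text version (indexed as "error") but hits the keyword one
print(term_matches(analyze_text("ERROR"), "ERROR"))           # False
print(term_matches(analyze_keyword("ERROR"), "ERROR"))        # True
# A single word never matches a keyword field unless it is the entire value
print(term_matches(analyze_keyword("Payment gateway timeout"), "timeout"))  # False
```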

3. Shards and Replicas

Shard = Subset of an index, allows horizontal scaling

An index is divided into shards. Each shard is a self-contained Lucene index.

Example: Index with 3 primary shards

Index: logs-2025-01-15
├── Shard 0 (33% of data)
├── Shard 1 (33% of data)
└── Shard 2 (33% of data)

Replica = Copy of a primary shard, kept on a different node for redundancy (replicas also serve search requests)

Primary Shards:
├── Shard 0 (on Node 1)
├── Shard 1 (on Node 2)
└── Shard 2 (on Node 3)

Replica Shards:
├── Shard 0 replica (on Node 2)
├── Shard 1 replica (on Node 3)
└── Shard 2 replica (on Node 1)

My shard strategy:

For small indices (< 10GB): 1 primary shard
For medium indices (10-100GB): 3-5 primary shards
For large indices (> 100GB): 5-10 primary shards

Replicas: Always 1+ replica in production (for redundancy)
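The same rule as a small Python helper, plus the total number of shard copies the cluster has to store once replicas are counted (the 3-primary, 1-replica settings below mean 6 shards in total). How the exact 10 GB and 100 GB boundaries are handled is my own choice, and the function names are hypothetical.

```python
def suggest_primary_shards(index_size_gb: float) -> tuple[int, int]:
    """(min, max) primary shard count from the sizing rule of thumb above."""
    if index_size_gb < 0:
        raise ValueError("index size cannot be negative")
    if index_size_gb < 10:
        return (1, 1)
    if index_size_gb <= 100:
        return (3, 5)
    return (5, 10)

def total_shard_copies(primaries: int, replicas: int) -> int:
    """Shards stored in the cluster: each primary plus its replica copies."""
    return primaries * (1 + replicas)

for size_gb in (2, 40, 250):
    low, high = suggest_primary_shards(size_gb)
    print(f"{size_gb} GB: {low}-{high} primaries, "
          f"up to {total_shard_copies(high, 1)} shard copies with 1 replica")
```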

Setting shards and replicas:

PUT /logs-2025-01-15
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

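Important: You can't change the primary shard count of an existing index in place. Changing it means creating a new index, via the _split or _shrink API or a reindex, so choose wisely.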
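The reason is document routing. Elasticsearch picks a shard by hashing each document's routing value (the _id by default) and reducing it modulo the number of primary shards, so a different shard count would send existing documents to different shards. The sketch below shows the effect. It uses a SHA-256 prefix as a stand-in hash. Elasticsearch itself uses Murmur3 together with an index.number_of_routing_shards factor, so the exact shard numbers won't match a real cluster.

```python
import hashlib

def route(doc_id: str, num_primary_shards: int) -> int:
    """Simplified routing: hash(_id) modulo the number of primary shards.

    Stand-in hash (SHA-256 prefix), not Elasticsearch's Murmur3-based routing.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_primary_shards

doc_ids = [f"doc-{i}" for i in range(10_000)]

# Spread across 3 primaries: roughly a third each
counts = [0] * 3
for doc_id in doc_ids:
    counts[route(doc_id, 3)] += 1
print("documents per shard (3 primaries):", counts)

# Re-hash with 4 primaries and count how many documents would move
moved = sum(1 for d in doc_ids if route(d, 3) != route(d, 4))
print(f"{moved / len(doc_ids):.0%} of documents land on a different shard with 4 primaries")
```

With a uniform hash, only about a quarter of documents keep the same shard number when going from 3 to 4 primaries. An in-place change would therefore break lookups for most documents.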

4. Nodes and Clusters

Node = Single Elasticsearch instance (one Java process)

Cluster = Group of nodes working together

Node types I configure:

Master node: Manages cluster state, creates/deletes indices

node.roles: [ master ]

Data node: Stores data, executes queries

node.roles: [ data ]

Coordinating node: Routes requests, merges results (no data, not master)

node.roles: [ ]

My typical 10-node cluster:

3 master-eligible nodes (HA for cluster state)
5 data nodes (distribute data)
2 coordinating nodes (dedicated query routers)
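Three master-eligible nodes is the usual minimum because electing a master needs a majority of the voting master-eligible nodes. The majority arithmetic as a quick Python sketch. Elasticsearch manages its voting configuration automatically (with an even count it leaves one node out of the vote), so this only illustrates why odd numbers are preferred.

```python
def quorum(master_eligible: int) -> int:
    """Smallest majority of the voting master-eligible nodes."""
    return master_eligible // 2 + 1

def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)

for n in (1, 2, 3, 4, 5):
    print(f"{n} master-eligible: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 master-eligible nodes adds no fault tolerance. Going to 5 lets the cluster survive two failures.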

Indexing Data

Let me show you the different ways I index data into Elasticsearch.

Method 1: Single Document via REST API

# Index a document with auto-generated ID
POST /logs-2025-01-15/_doc
{
  "timestamp": "2025-01-15T10:30:00Z",
  "level": "ERROR",
  "message": "Connection timeout"
}

# Index with specific ID
PUT /logs-2025-01-15/_doc/abc123
{
  "timestamp": "2025-01-15T10:30:00Z",
  "level": "ERROR",
  "message": "Connection timeout"
}

Method 2: Bulk API (High Throughput)

For indexing many documents efficiently:

POST /_bulk
{"index":{"_index":"logs-2025-01-15"}}
{"timestamp":"2025-01-15T10:30:00Z","level":"ERROR","message":"Error 1"}
{"index":{"_index":"logs-2025-01-15"}}
{"timestamp":"2025-01-15T10:30:01Z","level":"WARN","message":"Warning 1"}
{"index":{"_index":"logs-2025-01-15"}}
{"timestamp":"2025-01-15T10:30:02Z","level":"INFO","message":"Info 1"}

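Performance: In my experience, a single node can index 10,000+ documents per second with bulk requests, though the real number depends on document size, mappings, and hardware.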
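The bulk body is newline-delimited JSON. Each document gets an action line, followed on the next line by the document itself (for index and create actions), and the whole body must end with a newline. A minimal Python sketch that builds such a body from dicts, for example to send with curl --data-binary or any HTTP client using the Content-Type application/x-ndjson. build_bulk_body is a hypothetical helper name.

```python
import json

def build_bulk_body(index: str, docs: list[dict]) -> str:
    """Build an NDJSON _bulk body with one index action per document."""
    lines = []
    for doc in docs:
        # Action/metadata line; leaving out _id lets Elasticsearch generate one
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself, serialized on a single line
        lines.append(json.dumps(doc, separators=(",", ":")))
    # The bulk API requires a newline after the last line too
    return "\n".join(lines) + "\n"

body = build_bulk_body("logs-2025-01-15", [
    {"timestamp": "2025-01-15T10:30:00Z", "level": "ERROR", "message": "Error 1"},
    {"timestamp": "2025-01-15T10:30:01Z", "level": "WARN", "message": "Warning 1"},
])
print(body, end="")
```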

My bulk indexing script (Python):

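from elasticsearch import Elasticsearch, helpers

# Connect to the local node (security disabled, as in the Docker examples above)
es = Elasticsearch(["http://localhost:9200"])

def generate_docs():
    # Yield one bulk action per document lazily, so nothing is held in memory
    for i in range(10000):
        yield {
            "_index": "logs-2025-01-15",
            "_source": {
                "timestamp": "2025-01-15T10:30:00Z",
                "level": "INFO",
                "message": f"Log message {i}",
                "count": i
            }
        }

# helpers.bulk groups the actions into _bulk requests and returns
# (number of successful actions, list of errors); by default it raises on errors
success, errors = helpers.bulk(es, generate_docs())
print(f"Indexed {success} documents")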

Method 3: Via Logstash or Beats

This is how I actually do it in production; it's covered from Part 3 onward.

Searching Data

Here's where Elasticsearch shines. Let me show you query patterns I use daily.

Query Syntax Options

1. URI Search (Quick and Dirty)

GET /logs-*/_search?q=level:ERROR
GET /logs-*/_search?q=message:"payment timeout"
GET /logs-*/_search?q=level:ERROR AND service:payment-service

2. Query DSL (Powerful, My Preference)

GET /logs-*/_search
{
  "query": {
    "match": {
      "message": "payment timeout"
    }
  }
}

3. Kibana Query Language (KQL) in Kibana UI

level: ERROR and service: "payment-service"

Common Query Types

Match Query (Full-Text Search)

Searches analyzed text fields:

GET /logs-*/_search
{
  "query": {
    "match": {
      "message": "database connection failed"
    }
  }
}

Finds documents containing "database", "connection", or "failed" (OR by default). To require all three terms, use the longer form of the match query with "operator": "and".
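Under the hood, Lucene answers this from an inverted index that maps each term to the documents containing it. Here is a toy Python sketch of that idea, using a simplified lowercase tokenizer in place of the real analyzer and ignoring relevance scoring. The default OR behaviour is a union of posting lists, and "operator": "and" corresponds to an intersection.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Simplified analyzer: lowercase and split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def match(index: dict[str, set[str]], query: str, operator: str = "or") -> set[str]:
    """Union ("or") or intersection ("and") of the posting lists of the query terms."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return set()
    if operator == "and":
        return set.intersection(*postings)
    return set.union(*postings)

docs = {
    "1": "Database connection failed",
    "2": "Connection timeout to payment gateway",
    "3": "User login succeeded",
}
idx = build_inverted_index(docs)
print(sorted(match(idx, "database connection failed")))         # ['1', '2']
print(sorted(match(idx, "database connection failed", "and")))  # ['1']
```

Each query term costs one dictionary lookup, no matter how many documents there are. That is why full-text search stays fast as the data grows.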

Term Query (Exact Match)

For keyword fields:

GET /logs-*/_search
{
  "query": {
    "term": {
      "level": "ERROR"
    }
  }
}

Range Query

For dates, numbers:

GET /logs-*/_search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "2025-01-15T00:00:00Z",
        "lte": "2025-01-15T23:59:59Z"
      }
    }
  }
}

GET /logs-*/_search
{
  "query": {
    "range": {
      "response_time": {
        "gte": 1000
      }
    }
  }
}

Bool Query (Combine Multiple Conditions)

My most-used query type:

GET /logs-*/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "service": "payment-service" }}
      ],
      "filter": [
        { "term": { "level": "ERROR" }},
        { "range": { "timestamp": { "gte": "now-1h" }}}
      ],
      "must_not": [
        { "match": { "message": "expected error" }}
      ],
      "should": [
        { "match": { "tags": "critical" }}
      ],
      "minimum_should_match": 0
    }
  }
}

Breakdown:

must: Document MUST match (affects scoring)
filter: Document MUST match (no scoring, faster, cacheable)
must_not: Document MUST NOT match (runs in filter context, so no scoring)
should: Document SHOULD match (increases score if it does)

Use filter for exact matches, must for full-text search.
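A toy Python model of how the four clause types combine over in-memory documents. It rests on several simplifications. Clauses are plain predicates, and the time-range filter from the query above is left out. filter is spelled filter_ to avoid shadowing Python's built-in. The score is just the number of matching must/should clauses, whereas Elasticsearch computes BM25 relevance.

```python
from typing import Callable, Sequence

Doc = dict
Clause = Callable[[Doc], bool]  # a clause is modelled as a yes/no predicate

def bool_query(docs: Sequence[Doc], must: Sequence[Clause] = (), filter_: Sequence[Clause] = (),
               must_not: Sequence[Clause] = (), should: Sequence[Clause] = (),
               minimum_should_match: int = 0) -> list[tuple[int, Doc]]:
    """Toy bool query: returns (score, doc) pairs, best score first."""
    results = []
    for doc in docs:
        # must and filter: every clause has to match
        if not all(c(doc) for c in must) or not all(c(doc) for c in filter_):
            continue
        # must_not: any matching clause excludes the document
        if any(c(doc) for c in must_not):
            continue
        should_hits = sum(1 for c in should if c(doc))
        if should_hits < minimum_should_match:
            continue
        # Toy score (not BM25): filter and must_not never contribute
        results.append((len(must) + should_hits, doc))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results

logs = [
    {"service": "payment-service", "level": "ERROR", "message": "gateway timeout", "tags": ["critical"]},
    {"service": "payment-service", "level": "ERROR", "message": "expected error in test", "tags": []},
    {"service": "payment-service", "level": "INFO", "message": "ok", "tags": []},
    {"service": "payment-service", "level": "ERROR", "message": "card declined", "tags": []},
]
hits = bool_query(
    logs,
    must=[lambda d: d["service"] == "payment-service"],
    filter_=[lambda d: d["level"] == "ERROR"],
    must_not=[lambda d: "expected error" in d["message"]],
    should=[lambda d: "critical" in d["tags"]],
)
for score, doc in hits:
    print(score, doc["message"])
```

The should clause with minimum_should_match 0 only reorders results: the "critical" error ranks first, and the other error is still returned.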

Practical Search Examples

Example 1: Find Errors in Last Hour

GET /logs-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "level": "ERROR" }},
        { "range": { "timestamp": { "gte": "now-1h" }}}
      ]
    }
  },
  "sort": [
    { "timestamp": { "order": "desc" }}
  ],
  "size": 100
}

Example 2: Slow API Requests

GET /logs-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "log_type": "access" }},
        { "range": { "response_time": { "gte": 1000 }}}
      ]
    }
  },
  "sort": [
    { "response_time": { "order": "desc" }}
  ]
}

Example 3: Search Across Multiple Fields

GET /logs-*/_search
{
  "query": {
    "multi_match": {
      "query": "timeout",
      "fields": ["message", "error.message", "error.stack_trace"]
    }
  }
}

Example 4: Wildcard and Regex

GET /logs-*/_search
{
  "query": {
    "wildcard": {
      "service": "payment-*"
    }
  }
}

GET /logs-*/_search
{
  "query": {
    "regexp": {
      "user_id": "[0-9]{5}"
    }
  }
}

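Warning: Wildcard and regexp queries can be slow, especially patterns that start with a wildcard (such as *-service), because they have to check every term in the field. Use them sparingly.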
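Why the shape of the pattern matters: terms are kept in a sorted term dictionary, so a trailing wildcard such as payment-* can jump straight to the matching range, while a leading wildcard has to test every term. A simplified Python sketch of that difference using a sorted list and binary search. Lucene's real term dictionary and its wildcard automata are more elaborate, and the function names are hypothetical.

```python
import bisect
from fnmatch import fnmatchcase

def prefix_lookup(sorted_terms: list[str], prefix: str) -> tuple[list[str], int]:
    """Binary-search to the prefix, then walk forward; returns (matches, terms examined)."""
    start = bisect.bisect_left(sorted_terms, prefix)
    matches, examined = [], 0
    for term in sorted_terms[start:]:
        examined += 1
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches, examined

def full_scan(sorted_terms: list[str], pattern: str) -> tuple[list[str], int]:
    """Test every term against the pattern; returns (matches, terms examined)."""
    return [t for t in sorted_terms if fnmatchcase(t, pattern)], len(sorted_terms)

terms = sorted(
    [f"svc-{i:04d}-service" for i in range(1000)]
    + ["payment-api", "payment-service", "payment-worker"]
)
matches, examined = prefix_lookup(terms, "payment-")
print(f"payment-*: {len(matches)} matches, {examined} terms examined")
suffix_matches, scanned = full_scan(terms, "*-worker")
print(f"*-worker: {len(suffix_matches)} match, {scanned} terms examined")
```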

Aggregations (Analytics)

Aggregations are how I generate statistics, metrics, and insights.

Metric Aggregations

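Count of Documents

value_count counts the values of a field, so this counts documents that have a timestamp. For a plain document count, GET /logs-*/_count is simpler.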

GET /logs-*/_search
{
  "size": 0,
  "aggs": {
    "total_logs": {
      "value_count": {
        "field": "timestamp"
      }
    }
  }
}

Average, Min, Max, Sum

GET /logs-*/_search
{
  "size": 0,
  "aggs": {
    "avg_response_time": {
      "avg": { "field": "response_time" }
    },
    "min_response_time": {
      "min": { "field": "response_time" }
    },
    "max_response_time": {
      "max": { "field": "response_time" }
    },
    "total_requests": {
      "sum": { "field": "request_count" }
    }
  }
}

Percentiles

GET /logs-*/_search
{
  "size": 0,
  "aggs": {
    "response_time_percentiles": {
      "percentiles": {
        "field": "response_time",
        "percents": [50, 95, 99]
      }
    }
  }
}
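For intuition about what p50, p95, and p99 mean, here is an exact nearest-rank percentile calculation in Python with made-up sample values. Elasticsearch's percentiles aggregation is approximate (it uses the TDigest algorithm by default), so on large datasets its values can differ slightly from an exact calculation like this one.

```python
import math

def percentile_nearest_rank(values: list[float], p: float) -> float:
    """Exact nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

response_times_ms = [120, 95, 310, 88, 1500, 240, 101, 99, 5023, 130]
for p in (50, 95, 99):
    print(f"p{p}: {percentile_nearest_rank(response_times_ms, p)} ms")
# p50: 120 ms, p95: 5023 ms, p99: 5023 ms
```

With only ten samples, p95 and p99 both land on the single slowest request. High percentiles only become meaningful with plenty of data.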

Bucket Aggregations

Terms Aggregation (Group By)

Count logs by level:

GET /logs-*/_search
{
  "size": 0,
  "aggs": {
    "by_level": {
      "terms": {
        "field": "level",
        "size": 10
      }
    }
  }
}

Response:

{
  "aggregations": {
    "by_level": {
      "buckets": [
        { "key": "INFO", "doc_count": 45000 },
        { "key": "ERROR", "doc_count": 3500 },
        { "key": "WARN", "doc_count": 1500 }
      ]
    }
  }
}

Date Histogram (Time Series)

Logs per hour:

GET /logs-*/_search
{
  "size": 0,
  "aggs": {
    "logs_over_time": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "hour"
      }
    }
  }
}
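A date_histogram with "calendar_interval": "hour" rounds each timestamp down to the start of its hour and counts documents per rounded value. By default, it also returns empty buckets between the first and last non-empty one. A minimal Python sketch of that bucketing, assuming ISO-8601 timestamps with a Z suffix or UTC offset like the sample documents. The function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

def hourly_histogram(timestamps: list[str]) -> list[tuple[str, int]]:
    """Count timestamps per hour, filling empty hours between the first and last bucket."""
    counts: Counter = Counter()
    for ts in timestamps:
        # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalise it
        dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        counts[dt.replace(minute=0, second=0, microsecond=0)] += 1
    if not counts:
        return []
    buckets = []
    hour, last = min(counts), max(counts)
    while hour <= last:
        buckets.append((hour.isoformat(), counts.get(hour, 0)))
        hour += timedelta(hours=1)
    return buckets

sample = ["2025-01-15T10:30:00Z", "2025-01-15T10:59:59Z", "2025-01-15T12:05:00Z"]
for key, count in hourly_histogram(sample):
    print(key, count)
# 2025-01-15T10:00:00+00:00 2
# 2025-01-15T11:00:00+00:00 0
# 2025-01-15T12:00:00+00:00 1
```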

Range Aggregation

Group response times into buckets:

GET /logs-*/_search
{
  "size": 0,
  "aggs": {
    "response_time_ranges": {
      "range": {
        "field": "response_time",
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 500 },
          { "from": 500, "to": 1000 },
          { "from": 1000 }
        ]
      }
    }
  }
}
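In a range aggregation, each range includes its from value and excludes its to value. A 100 ms request therefore falls into the 100-500 bucket rather than the first one. A small Python sketch of those bucket rules using the same boundaries. The key strings are a simplified format of my own, not exactly what Elasticsearch returns.

```python
def range_buckets(values: list[float], ranges: list[dict]) -> dict[str, int]:
    """Count values per range: 'from' is inclusive, 'to' is exclusive, either may be omitted."""
    counts: dict[str, int] = {}
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        key = f"{'*' if lo is None else lo}-{'*' if hi is None else hi}"
        counts[key] = sum(
            1 for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
    return counts

ranges = [{"to": 100}, {"from": 100, "to": 500}, {"from": 500, "to": 1000}, {"from": 1000}]
print(range_buckets([42, 99, 100, 480, 500, 999, 1000, 5023], ranges))
# {'*-100': 2, '100-500': 2, '500-1000': 2, '1000-*': 2}
```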

Nested Aggregations

Errors by service, then by hour:

GET /logs-*/_search
{
  "size": 0,
  "query": {
    "term": { "level": "ERROR" }
  },
  "aggs": {
    "by_service": {
      "terms": {
        "field": "service"
      },
      "aggs": {
        "over_time": {
          "date_histogram": {
            "field": "timestamp",
            "calendar_interval": "hour"
          }
        }
      }
    }
  }
}

This is how I build dashboards - nested aggregations for multi-dimensional analysis.
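To feed a report or a custom dashboard from code, I flatten the nested buckets into rows. A sketch that walks a response shaped like the by_service / over_time query above. The sample response is made up and trimmed to the fields the code reads. date_histogram buckets carry key as epoch milliseconds plus a formatted key_as_string.

```python
def flatten_service_hourly(response: dict) -> list[tuple[str, str, int]]:
    """Turn by_service / over_time buckets into (service, hour, error_count) rows."""
    rows = []
    for service_bucket in response["aggregations"]["by_service"]["buckets"]:
        service = service_bucket["key"]
        for hour_bucket in service_bucket["over_time"]["buckets"]:
            rows.append((service, hour_bucket["key_as_string"], hour_bucket["doc_count"]))
    return rows

# Made-up response, trimmed to the fields used above
sample_response = {
    "aggregations": {
        "by_service": {
            "buckets": [
                {"key": "payment-service", "doc_count": 5, "over_time": {"buckets": [
                    {"key_as_string": "2025-01-15T10:00:00.000Z", "key": 1736935200000, "doc_count": 3},
                    {"key_as_string": "2025-01-15T11:00:00.000Z", "key": 1736938800000, "doc_count": 2},
                ]}},
                {"key": "auth-service", "doc_count": 1, "over_time": {"buckets": [
                    {"key_as_string": "2025-01-15T10:00:00.000Z", "key": 1736935200000, "doc_count": 1},
                ]}},
            ]
        }
    }
}
for row in flatten_service_hourly(sample_response):
    print(row)
```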

Index Templates

Index templates automatically apply settings and mappings to new indices.

My logs template:

PUT /_index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs-policy"
    },
    "mappings": {
      "properties": {
        "timestamp": {
          "type": "date"
        },
        "level": {
          "type": "keyword"
        },
        "service": {
          "type": "keyword"
        },
        "message": {
          "type": "text"
        },
        "user_id": {
          "type": "keyword"
        },
        "response_time": {
          "type": "integer"
        }
      }
    }
  },
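  "priority": 501,
  "version": 1
}

Now every new index matching logs-* gets these settings automatically. The priority is deliberately above 500. Elasticsearch 8.x ships a built-in logs template (pattern logs-*-*, priority 100) that is set up for data streams, and daily names like logs-2025-01-15 match it too. A template with an overlapping pattern and the same priority is rejected, so either use a priority above 500 or pick a non-overlapping prefix.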
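A simplified Python model of how Elasticsearch picks among matching composable templates: only the highest-priority match is applied. The sketch uses fnmatch for the * patterns. It includes the built-in logs template (logs-*-*, priority 100) alongside a custom one at 501. The function name and template values are illustrative.

```python
from fnmatch import fnmatchcase
from typing import Optional

def select_template(index_name: str, templates: dict[str, dict]) -> Optional[str]:
    """Name of the highest-priority template whose patterns match index_name, if any."""
    matching = [
        (tpl["priority"], name)
        for name, tpl in templates.items()
        if any(fnmatchcase(index_name, pat) for pat in tpl["index_patterns"])
    ]
    if not matching:
        return None
    return max(matching)[1]

templates = {
    "logs": {"index_patterns": ["logs-*-*"], "priority": 100},          # built-in in 8.x
    "logs-template": {"index_patterns": ["logs-*"], "priority": 501},  # custom, above 500
}
print(select_template("logs-2025-01-15", templates))  # logs-template
print(select_template("metrics-app", templates))      # None
```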

Index Lifecycle Management (ILM)

ILM automates index lifecycle - from creation to deletion.

My logs ILM policy:

PUT /_ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "1d",
            "max_size": "50gb"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

What this does:

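Hot phase: Keep writing to the current index until it is 1 day old or its primary shards reach 50GB in total, whichever comes first, then roll over to a new index. Rollover only works when you write through a rollover alias (index.lifecycle.rollover_alias) or a data stream, not to a fixed daily name like logs-2025-01-15.
Delete phase: Delete each index 30 days after it rolls over (or after creation, if it never rolled over)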

Saves storage, maintains performance.
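The decisions this policy makes can be sketched as two checks. Roll over as soon as either hot-phase condition is met, and delete once the index is 30 days past rollover. A simplified Python model with hypothetical names. Real ILM re-evaluates periodically (every 10 minutes by default, via indices.lifecycle.poll_interval) and tracks more state than this.

```python
from datetime import datetime, timedelta, timezone

GB = 1024 ** 3  # Elasticsearch byte units such as "50gb" are binary

def should_rollover(created: datetime, primary_size_bytes: int, now: datetime,
                    max_age: timedelta = timedelta(days=1),
                    max_size_bytes: int = 50 * GB) -> bool:
    """Hot phase: roll over when ANY condition is met."""
    return now - created >= max_age or primary_size_bytes >= max_size_bytes

def should_delete(rolled_over_at: datetime, now: datetime,
                  min_age: timedelta = timedelta(days=30)) -> bool:
    """Delete phase: min_age counts from rollover (or creation, if never rolled over)."""
    return now - rolled_over_at >= min_age

now = datetime(2025, 2, 20, tzinfo=timezone.utc)
print(should_rollover(datetime(2025, 2, 19, 12, tzinfo=timezone.utc), 5 * GB, now))   # False: 12h old, 5 GB
print(should_rollover(datetime(2025, 2, 19, 23, tzinfo=timezone.utc), 60 * GB, now))  # True: over 50 GB
print(should_delete(datetime(2025, 1, 15, tzinfo=timezone.utc), now))                 # True: 36 days
```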

Performance Optimization

Lessons I learned the hard way.

1. Use Filter Context When Possible

Slow (scoring overhead):

{
  "query": {
    "bool": {
      "must": [
        { "term": { "level": "ERROR" }}
      ]
    }
  }
}

Fast (no scoring, cacheable):

{
  "query": {
    "bool": {
      "filter": [
        { "term": { "level": "ERROR" }}
      ]
    }
  }
}

2. Limit Result Size

Don't do this:

GET /logs-*/_search
{
  "size": 10000
}

Do this:

GET /logs-*/_search
{
  "size": 100,
  "from": 0
}

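By default, from + size can't exceed 10,000 (index.max_result_window). For deeper pagination, use search_after with a point in time (PIT). The scroll API still works but is no longer recommended for deep pagination.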
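search_after uses the sort values of the last hit on one page as the starting point for the next. A sketch of building that next request body in Python. It assumes a point in time opened beforehand with POST /logs-*/_pit?keep_alive=1m and a body sent to POST /_search (no index in the path when using a PIT). The PIT ids and hit values are made-up placeholders. With a PIT, Elasticsearch appends an implicit _shard_doc tiebreaker to each hit's sort values, which is why the sample sort arrays have two entries.

```python
import copy
from typing import Optional

def next_page_body(previous_body: dict, response: dict) -> Optional[dict]:
    """Next search_after request body, or None once a page comes back empty."""
    hits = response["hits"]["hits"]
    if not hits:
        return None
    body = copy.deepcopy(previous_body)
    # The last hit's sort values become the cursor for the next page
    body["search_after"] = hits[-1]["sort"]
    # Always continue with the most recent PIT id the response returned
    if "pit_id" in response and "pit" in body:
        body["pit"]["id"] = response["pit_id"]
    return body

first_body = {
    "size": 2,
    "query": {"term": {"level": "ERROR"}},
    "pit": {"id": "PIT_ID_PLACEHOLDER", "keep_alive": "1m"},
    "sort": [{"timestamp": {"order": "desc"}}],
}
# Made-up, trimmed response: date sort values are epoch milliseconds
fake_response = {
    "pit_id": "PIT_ID_PLACEHOLDER_2",
    "hits": {"hits": [
        {"_id": "a", "sort": [1736937000000, 11]},
        {"_id": "b", "sort": [1736936940000, 7]},
    ]},
}
second_body = next_page_body(first_body, fake_response)
print(second_body["search_after"])  # [1736936940000, 7]
```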

3. Use Index Patterns Wisely

Slow (searches all indices):

GET /_all/_search

Fast (searches specific date range):

GET /logs-2025-01-*/_search

4. Bulk Indexing Best Practices

Optimal bulk size: 5-15 MB per request
Parallel bulk requests: 2-4 per node
Refresh interval: Increase during bulk indexing

PUT /logs-*/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}

After bulk indexing, reset to default:

PUT /logs-*/_settings
{
  "index": {
    "refresh_interval": "1s"
  }
}
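To stay in that 5-15 MB range, I size bulk requests by encoded bytes rather than by document count. A minimal standard-library sketch, assuming a 10 MB default target. Each yielded string is a complete NDJSON body, and chunked_bulk_bodies is a hypothetical name. With the official Python client, helpers.bulk already accepts chunk_size and max_chunk_bytes for the same purpose.

```python
import json
from typing import Iterable, Iterator

def chunked_bulk_bodies(index: str, docs: Iterable[dict],
                        max_bytes: int = 10 * 1024 * 1024) -> Iterator[str]:
    """Yield NDJSON _bulk bodies whose UTF-8 size stays at or under max_bytes.

    A single document larger than max_bytes is still emitted, alone in its own body.
    """
    action = json.dumps({"index": {"_index": index}}) + "\n"
    chunk: list[str] = []
    chunk_bytes = 0
    for doc in docs:
        pair = action + json.dumps(doc, separators=(",", ":")) + "\n"
        pair_bytes = len(pair.encode("utf-8"))
        # Flush the current body if this action/source pair would push it over the limit
        if chunk and chunk_bytes + pair_bytes > max_bytes:
            yield "".join(chunk)
            chunk, chunk_bytes = [], 0
        chunk.append(pair)
        chunk_bytes += pair_bytes
    if chunk:
        yield "".join(chunk)

docs = ({"level": "INFO", "message": f"Log message {i}", "count": i} for i in range(10_000))
bodies = list(chunked_bulk_bodies("logs-2025-01-15", docs, max_bytes=64 * 1024))
print(len(bodies), "bulk requests")
```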

5. Mapping Optimization

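Disable _source for metrics, but only if you never need the original document back: without _source you lose the reindex, update, and update_by_query APIs, as well as highlighting.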

{
  "mappings": {
    "_source": {
      "enabled": false
    }
  }
}

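Set ignore_above on keyword fields so very long strings aren't indexed (longer values stay in _source but can't be searched or aggregated through that field):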

{
  "message": {
    "type": "keyword",
    "ignore_above": 256
  }
}

Useful Elasticsearch APIs

Cluster Health

GET /_cluster/health
GET /_cluster/health?level=indices

Node Stats

GET /_nodes/stats
GET /_nodes/stats/indices

Index Stats

GET /logs-*/_stats
GET /logs-2025-01-15/_stats

Cat APIs (Human-Readable)

GET /_cat/indices?v
GET /_cat/nodes?v
GET /_cat/shards?v
GET /_cat/health?v

Index Management

# Create index
PUT /my-index

# Delete index
DELETE /my-index

# Close index (free memory, keep data)
POST /my-index/_close

# Open index
POST /my-index/_open

# Refresh index
POST /my-index/_refresh

# Flush index
POST /my-index/_flush

Common Issues and Solutions

Issue 1: Unassigned Shards

Problem: Yellow/red cluster, shards not assigned

Check:

GET /_cluster/allocation/explain

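Solution: Usually there aren't enough nodes to host the replicas; on a single-node cluster they can never be assigned. For a development cluster, drop the replica count: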

PUT /my-index/_settings
{
  "number_of_replicas": 0
}
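The underlying rule is that Elasticsearch won't put two copies of the same shard on one node. A cluster with fewer than 1 + number_of_replicas nodes therefore leaves replicas unassigned. A simplified Python model of that rule and the resulting health colour. Real allocation also weighs disk watermarks, allocation filtering, and shard balancing, and the function names are hypothetical.

```python
def simulate_allocation(num_nodes: int, primaries: int, replicas: int) -> tuple[int, int]:
    """(unassigned primaries, unassigned replicas) under the one-copy-per-node rule."""
    if num_nodes < 1:
        return primaries, primaries * replicas
    # A shard can have at most one copy per node, so at most num_nodes - 1 replicas fit
    placeable_replicas = min(replicas, num_nodes - 1)
    return 0, primaries * (replicas - placeable_replicas)

def health(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """red: a primary is missing; yellow: only replicas are missing; green: all assigned."""
    if unassigned_primaries:
        return "red"
    if unassigned_replicas:
        return "yellow"
    return "green"

for nodes, reps in [(1, 1), (1, 0), (3, 1), (2, 2)]:
    up, ur = simulate_allocation(nodes, primaries=3, replicas=reps)
    print(f"{nodes} node(s), {reps} replica(s): {health(up, ur)} ({ur} unassigned replicas)")
```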

Issue 2: Slow Queries

Check the current slow log thresholds:

GET /logs-*/_settings

Enable slow query logging:

PUT /logs-*/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s"
}

Check the log file: /var/log/elasticsearch/[cluster-name]_index_search_slowlog.json (8.x writes slow logs as JSON)

Issue 3: Out of Memory

Check heap usage:

GET /_nodes/stats/jvm

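Solution: Increase the heap, keeping it at no more than 50% of RAM and below the ~32GB compressed-oops threshold (I stay at or under 31GB).

Elasticsearch 8.x sizes the heap automatically by default. To override it on a package install, don't edit jvm.options itself. Instead, add a file under /etc/elasticsearch/jvm.options.d/ (for example heap.options) and set the minimum and maximum heap to the same value:

-Xms4g
-Xmx4g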
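The sizing rule as a quick Python calculation, a sketch assuming a 31 GB cap as a conservative margin under the compressed-oops cutoff. The exact cutoff depends on the JVM, and the node's startup log reports whether compressed ordinary object pointers are in use. The function names are hypothetical.

```python
def recommended_heap_gb(ram_gb: float, cap_gb: int = 31) -> int:
    """Half of RAM in whole GB, never above cap_gb (compressed-oops margin)."""
    if ram_gb < 2:
        raise ValueError("this sketch works in whole GB; size small hosts in MB instead")
    return min(int(ram_gb // 2), cap_gb)

def jvm_heap_options(ram_gb: float) -> str:
    """Matching -Xms/-Xmx lines for a jvm.options.d file (min and max kept equal)."""
    heap = recommended_heap_gb(ram_gb)
    return f"-Xms{heap}g\n-Xmx{heap}g\n"

for ram in (8, 16, 64, 128):
    print(ram, "GB RAM ->", recommended_heap_gb(ram), "GB heap")
print(jvm_heap_options(8), end="")
```

The -Xms4g/-Xmx4g example above corresponds to a host with 8 GB of RAM. Past about 62 GB of RAM, the cap rather than the 50% rule sets the heap.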

Issue 4: Disk Space

Check disk usage:

GET /_cat/allocation?v

Solution: Delete old indices, increase disk, or implement ILM

Conclusion

Elasticsearch is the core of the ELK stack - the engine that makes everything work. Key takeaways:

Core concepts:

Documents and indices (data structure)
Mappings (schema definition)
Shards and replicas (distribution and redundancy)
Nodes and clusters (scaling)

Key operations:

Indexing (single, bulk)
Searching (match, term, bool, range)
Aggregations (metrics, buckets, nested)
Index templates and ILM

Performance:

Use filter context
Limit result sizes
Optimize mappings
Bulk index efficiently

In the next article, we'll explore Logstash - the data processing pipeline that feeds Elasticsearch.

Previous: Part 1 - Introduction to ELK Stack
Next: Part 3 - Logstash Pipeline

This article is part of the ELK Stack 101 series. Check out the series overview for more content.
