Aggregations and Analytics

Series: Elasticsearch 101 | Article: 05


Overview

Aggregations are Elasticsearch's answer to SQL GROUP BY, COUNT, SUM, AVG, and histogram queries. Unlike those SQL equivalents, aggregations run in the same request as a search query — you can simultaneously retrieve the top 10 matching documents and build the facet counts for every filter dimension without a second round trip.

This article covers the aggregation types I use most often and how to combine them with queries.


Aggregation Structure

Aggregations live under the top-level aggs key (alias: aggregations). Each aggregation has a user-defined name and a type:

{
  "query": { ... },
  "aggs": {
    "<agg-name>": {
      "<agg-type>": { ... }
    }
  }
}

Aggregations fall into three families:

Family     Purpose
Bucket     Group documents into buckets (categories, ranges, date histograms)
Metric     Compute numeric values over documents in a bucket (avg, sum, min, max)
Pipeline   Compute values from the output of other aggregations


Metric Aggregations

Metric aggregations compute numeric values from the documents in scope: usually a single number, sometimes several at once (as with stats).

value_count

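Count the values extracted from a field. For a single-valued field this equals the number of documents that have a non-null value; for a multi-valued field, every value is counted: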
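A minimal sketch of such a request body, written as a Python dict so it can be printed as JSON. The field name view_count and the aggregation name are assumptions for illustration, not taken from a real mapping.

```python
import json

# Hypothetical field ("view_count") and aggregation name, for illustration only.
body = {
    "size": 0,  # return no hits, only the aggregation result
    "aggs": {
        "articles_with_views": {  # user-defined aggregation name
            "value_count": {"field": "view_count"}
        }
    },
}

# The printed JSON is the body of a _search request.
print(json.dumps(body, indent=2))
```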

Setting "size": 0 tells Elasticsearch not to return document hits — we only want the aggregation result.

avg, sum, min, max
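Each of these four takes a field and returns one number. The sketch below requests all four in one body; the field name view_count and the aggregation names are hypothetical.

```python
import json

FIELD = "view_count"  # hypothetical numeric field

# One named aggregation per metric type; all run in the same request.
body = {
    "size": 0,
    "aggs": {
        f"{metric}_views": {metric: {"field": FIELD}}
        for metric in ("avg", "sum", "min", "max")
    },
}

print(json.dumps(body, indent=2))
```

In the response, each result appears under its aggregation name as an object with a single value key.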

stats

Returns count, min, max, avg, and sum in one aggregation:
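A sketch of a stats request, again assuming a hypothetical numeric field named view_count:

```python
import json

# Hypothetical field and aggregation name.
body = {
    "size": 0,
    "aggs": {
        "view_stats": {
            "stats": {"field": "view_count"}
        }
    },
}

# Keys to expect under aggregations.view_stats in the response.
STATS_KEYS = ("count", "min", "max", "avg", "sum")

print(json.dumps(body, indent=2))
```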


Bucket Aggregations

Bucket aggregations group documents into buckets. Metrics can nest inside buckets.

terms

Group by distinct values of a keyword field — the most common aggregation for facets:
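A sketch of a terms request, assuming a hypothetical keyword field named tags:

```python
import json

# Hypothetical keyword field ("tags") and aggregation name ("top_tags").
body = {
    "size": 0,  # no hits, facets only
    "aggs": {
        "top_tags": {
            "terms": {
                "field": "tags",
                "size": 10,  # number of buckets to return
            }
        }
    },
}

print(json.dumps(body, indent=2))
```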

Response:
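The sketch below shows the general shape of a terms response and how the buckets might be read. The keys and counts are made up for illustration, and top_tags is a hypothetical aggregation name.

```python
# Illustrative response only: keys and counts are invented to show the shape.
response = {
    "hits": {"total": {"value": 71, "relation": "eq"}, "hits": []},
    "aggregations": {
        "top_tags": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 12,  # docs in buckets that were not returned
            "buckets": [
                {"key": "elasticsearch", "doc_count": 42},
                {"key": "search", "doc_count": 17},
            ],
        }
    },
}


def facet_counts(resp, agg_name):
    """Return (key, doc_count) pairs for a terms aggregation."""
    buckets = resp["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


facets = facet_counts(response, "top_tags")
print(facets)
```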

size controls how many buckets are returned (not document count). The default is 10. For tag clouds or full category lists, you may need size: 100 or more — but large terms aggregations are expensive, so size them deliberately.

date_histogram

Group by time intervals — useful for activity charts:
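A sketch of a monthly histogram, assuming a hypothetical date field named published_at:

```python
import json

# Hypothetical date field ("published_at") and aggregation name.
body = {
    "size": 0,
    "aggs": {
        "articles_per_month": {
            "date_histogram": {
                "field": "published_at",
                "calendar_interval": "month",
                "min_doc_count": 0,  # keep empty months in the output
            }
        }
    },
}

print(json.dumps(body, indent=2))
```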

calendar_interval options: minute, hour, day, week, month, quarter, year. For fixed intervals (every 7 days), use fixed_interval: "7d" instead.

min_doc_count: 0 includes months with zero articles — useful for complete time series in charts.

range

Group documents into explicit numeric ranges:
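A sketch of a range request over a hypothetical view_count field. The boundaries are arbitrary; in each range, from is inclusive and to is exclusive.

```python
import json

# Hypothetical field and arbitrary boundaries, for illustration only.
body = {
    "size": 0,
    "aggs": {
        "views_ranges": {
            "range": {
                "field": "view_count",
                "ranges": [
                    {"to": 100},                # view_count < 100
                    {"from": 100, "to": 1000},  # 100 <= view_count < 1000
                    {"from": 1000},             # view_count >= 1000
                ],
            }
        }
    },
}

print(json.dumps(body, indent=2))
```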

filter aggregation

Compute a metric over a specific subset of the documents matched by the main query, without changing which hits are returned:
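A sketch of a filter aggregation wrapping an avg metric. The field names (tags, view_count), the tag value, and the aggregation names are all hypothetical.

```python
import json

# Hypothetical fields and names, for illustration only.
body = {
    "size": 0,
    "aggs": {
        "es_articles": {
            # Only documents matching this filter enter the bucket.
            "filter": {"term": {"tags": "elasticsearch"}},
            "aggs": {
                "avg_views": {"avg": {"field": "view_count"}}
            },
        }
    },
}

print(json.dumps(body, indent=2))
```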


Nested Aggregations

Aggregations nest by placing an aggs key inside a bucket aggregation. This is how you build "top tags by average view count":
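A sketch of that nesting: a terms bucket on a hypothetical tags field, an avg metric on a hypothetical view_count field inside it, and the buckets ordered by that metric.

```python
import json

# Hypothetical field and aggregation names.
body = {
    "size": 0,
    "aggs": {
        "top_tags": {
            "terms": {
                "field": "tags",
                "size": 5,
                "order": {"avg_views": "desc"},  # sort buckets by the nested metric
            },
            "aggs": {  # sub-aggregations run once per bucket
                "avg_views": {"avg": {"field": "view_count"}}
            },
        }
    },
}

print(json.dumps(body, indent=2))
```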

Response shape:
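A sketch of how each bucket carries its nested metric, assuming the terms aggregation is named top_tags and the nested metric avg_views (both hypothetical). The values are invented for illustration.

```python
# Illustrative response only: keys, counts, and averages are invented.
response = {
    "aggregations": {
        "top_tags": {
            "buckets": [
                {"key": "elasticsearch", "doc_count": 42, "avg_views": {"value": 1280.5}},
                {"key": "golang", "doc_count": 9, "avg_views": {"value": 310.0}},
            ]
        }
    }
}

# Flatten each bucket into (tag, document count, average views).
rows = [
    (b["key"], b["doc_count"], b["avg_views"]["value"])
    for b in response["aggregations"]["top_tags"]["buckets"]
]

for tag, count, avg in rows:
    print(f"{tag}: {count} articles, {avg:.1f} average views")
```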


Combining Search + Aggregations

This is the core pattern for faceted search. A single request returns:

  • Matching documents (for display)

  • Facet counts (for the sidebar filters)

  • Any analytics you need

One request, three pieces of data. No database joins, no sequential queries.
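A sketch of one such combined request. The query text and the field names (title, tags, published_at, view_count) are assumptions for illustration.

```python
import json

# Hypothetical fields, query text, and aggregation names.
body = {
    "query": {"match": {"title": "elasticsearch aggregations"}},
    "size": 10,  # matching documents for display
    "aggs": {
        # Facet counts for the sidebar
        "by_tag": {"terms": {"field": "tags", "size": 20}},
        "by_month": {
            "date_histogram": {"field": "published_at", "calendar_interval": "month"}
        },
        # Analytics over the same result set
        "avg_views": {"avg": {"field": "view_count"}},
    },
}

print(json.dumps(body, indent=2))
```

The hits arrive under hits.hits and every aggregation under aggregations, keyed by the names chosen here.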


Global Aggregation

By default, aggregations are scoped to the documents matched by the query. Use global to aggregate over the entire index, regardless of the current query:
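A sketch of a query-scoped average next to a global one. Field names, the tag value, and aggregation names are hypothetical.

```python
import json

# Hypothetical fields and names, for illustration only.
body = {
    "size": 0,
    "query": {"term": {"tags": "elasticsearch"}},
    "aggs": {
        # Scoped to the documents matched by the query
        "avg_views_filtered": {"avg": {"field": "view_count"}},
        # Ignores the query and covers every document in the index
        "all_articles": {
            "global": {},  # global takes an empty body
            "aggs": {
                "avg_views_all": {"avg": {"field": "view_count"}}
            },
        },
    },
}

print(json.dumps(body, indent=2))
```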

This gives you the average view count for the current filtered result set alongside the global average — useful for showing "this set vs all" comparisons.


Cardinality Aggregation

Count distinct values — the approximate equivalent of SELECT COUNT(DISTINCT field):
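A sketch of a cardinality request. The field author_id is hypothetical, and the precision_threshold value is only an illustrative setting.

```python
import json

# Hypothetical field and aggregation name; the threshold is illustrative.
body = {
    "size": 0,
    "aggs": {
        "distinct_authors": {
            "cardinality": {
                "field": "author_id",
                "precision_threshold": 1000,  # optional accuracy/memory trade-off
            }
        }
    },
}

print(json.dumps(body, indent=2))
```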

Cardinality is approximate (uses HyperLogLog++ internally). The error rate is configurable via precision_threshold — higher precision costs more memory. The default precision is sufficient for most UI use cases.


Pipeline Aggregations

Pipeline aggregations operate on the output of other aggregations rather than documents. A common use case is computing a moving average or cumulative sum over a date histogram:
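A sketch of two pipelines inside a monthly date histogram: cumulative_sum for a running total and moving_fn for a three-bucket moving average. The field names and aggregation names are hypothetical. Both pipelines refer to the sibling sum metric through buckets_path.

```python
import json

# Hypothetical fields ("published_at", "view_count") and aggregation names.
body = {
    "size": 0,
    "aggs": {
        "per_month": {
            "date_histogram": {"field": "published_at", "calendar_interval": "month"},
            "aggs": {
                # Metric that the pipelines read from
                "monthly_views": {"sum": {"field": "view_count"}},
                # Running total across the histogram buckets
                "running_total": {
                    "cumulative_sum": {"buckets_path": "monthly_views"}
                },
                # Unweighted moving average over a 3-bucket window
                "views_moving_avg": {
                    "moving_fn": {
                        "buckets_path": "monthly_views",
                        "window": 3,
                        "script": "MovingFunctions.unweightedAvg(values)",
                    }
                },
            },
        }
    },
}

print(json.dumps(body, indent=2))
```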


Performance Notes

  • Aggregations on keyword fields read doc values, an on-disk columnar structure, so they are efficient. Aggregations on text fields require fielddata: true in the mapping; avoid this, because fielddata is loaded into heap memory and can consume a lot of it.

  • Large terms aggregations with high cardinality (e.g., aggregating on user_id across millions of users) are expensive. Use cardinality when you only need the number of distinct values, and keep the terms size small.

  • Date histograms are generally cheap because dates are stored as numbers (milliseconds since the epoch), so assigning a document to a bucket is a rounding operation.

  • Run aggregation-heavy queries with "size": 0 to skip fetching and scoring documents when you only need the aggregation result.


Summary

  • Bucket aggregations group documents; metric aggregations compute values; pipelines compute across bucket results.

  • terms is the standard facet aggregation on keyword fields.

  • date_histogram powers time-series charts.

  • Nest metrics inside buckets to get "per-category stats" in a single query.

  • Combine search queries and aggregations in one request for faceted search.

  • Use "size": 0 when you only need aggregation results.


Previous: Search Queries Deep Dive | Next: Go Backend Integration
