Search Queries Deep Dive

Series: Elasticsearch 101 | Article: 04


Overview

Elasticsearch has a query DSL (Domain Specific Language) built on JSON. Once you understand the structure, writing complex queries becomes systematic rather than guesswork. This article covers the queries I reach for most often and the critical difference between query context and filter context.

All examples run against the articles index from Article 03.


Query Context vs Filter Context

This distinction matters for performance.

Query context: the query contributes to the relevance score (_score). Each document gets a floating-point relevance score based on how well it matches. Use query context when you want results ranked by relevance.

Filter context: the query does not compute a score. Elasticsearch only asks "does this document match, yes or no?" Filters are usually faster because:

  1. No score computation.

  2. Frequently used filters can be cached and reused by later searches.

The most important rule: always put non-scoring constraints in the filter clause of a bool query.

{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "elasticsearch" } }
      ],
      "filter": [
        { "term": { "is_published": true } },
        { "range": { "published_at": { "gte": "2025-01-01" } } }
      ]
    }
  }
}

In this query:

  • match on title runs in query context and contributes to _score.

  • term on is_published runs in filter context: cached, no score cost.

  • range on published_at runs in filter context and is also cached.

This pattern is the most common structure in my production queries.


Full-Text Queries

match

The standard full-text query. It analyzes the search string with the same analyzer used to index the field, unless a separate search analyzer is configured.
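A minimal sketch of a match request body, assuming the articles index has an analyzed text field named title (the search string is just an illustration):

```json
{
  "query": {
    "match": {
      "title": {
        "query": "elasticsearch performance tuning",
        "operator": "or"
      }
    }
  }
}
```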

"operator": "or" (the default) returns documents matching any of the terms. Change it to "and" to require all terms to be present.

match_phrase

Matches the exact sequence of words in order, with adjacent positions:
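As a rough sketch, assuming the same title text field, a phrase query might look like this ("slop": 0 is the default and is written out only to make it visible):

```json
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "query context",
        "slop": 0
      }
    }
  }
}
```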

Use it for exact-phrase searches. Setting "slop": 1 loosens the match so that, for example, one extra word may sit between the terms.

multi_match

Search across multiple fields at once:
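One possible shape for such a request, assuming title and body are both text fields in the articles index (the field list and search string are illustrative):

```json
{
  "query": {
    "multi_match": {
      "query": "elasticsearch pagination",
      "fields": ["title^2", "body"],
      "type": "best_fields"
    }
  }
}
```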

^2 boosts the title field's score contribution by a factor of 2. type: best_fields uses the highest-scoring field per document, which works well when you want the most relevant field to dominate.


Term-Level Queries

Term-level queries operate on exact values without analysis. Use them on keyword, integer, boolean, and date fields.

term

Exact match on a single value:
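A minimal sketch, assuming the articles index has a tags field mapped as keyword (the field name and value are assumptions, not taken from the Article 03 mapping):

```json
{
  "query": {
    "term": {
      "tags": {
        "value": "elasticsearch"
      }
    }
  }
}
```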

Avoid term on text fields: the indexed tokens have been analyzed (lowercased, tokenized), so an exact value such as "Elasticsearch" will usually not match. Use term on keyword and other non-analyzed fields instead.

terms

Match any of a list of exact values:
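A sketch under the same assumption of a keyword field named tags; a document matches if it carries at least one of the listed values:

```json
{
  "query": {
    "terms": {
      "tags": ["elasticsearch", "search", "lucene"]
    }
  }
}
```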

range

Filter by numeric range or date range:
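For instance, a range on the published_at date field could be sketched as follows (the bounds are illustrative; gte is inclusive and lt is exclusive):

```json
{
  "query": {
    "range": {
      "published_at": {
        "gte": "2025-01-01",
        "lt": "2026-01-01"
      }
    }
  }
}
```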

Date math is supported: "gte": "now-7d/d" matches documents from the last 7 days, rounded to the start of the day.

exists

Match documents where a field exists and is not null:
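A minimal sketch, using published_at as the field to test (any mapped field name would work the same way):

```json
{
  "query": {
    "exists": {
      "field": "published_at"
    }
  }
}
```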


The bool Query

The bool query is the composition primitive. It combines other queries using four clauses:

Clause      Behavior
must        Document must match. Contributes to score.
should      Document may match. Contributes to score.
must_not    Document must NOT match. Runs in filter context (no score).
filter      Document must match. Runs in filter context (no score, cached).

Example: find published articles tagged with elasticsearch, ranked by relevance to the text query, from 2025:
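One way this could be written, assuming a keyword field tags and a text field body (both names are assumptions; is_published and published_at come from the earlier example, and the search string is a placeholder):

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "body": "query performance" } }
      ],
      "filter": [
        { "term": { "is_published": true } },
        { "term": { "tags": "elasticsearch" } },
        { "range": { "published_at": { "gte": "2025-01-01", "lt": "2026-01-01" } } }
      ]
    }
  }
}
```

Only the match clause affects ranking here; the three filter clauses just narrow the candidate set.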

bool queries can nest arbitrarily, which enables complex filtering logic. Keep the must clause for relevance scoring and move all filtering to filter.


Sorting

By default results are sorted by _score descending. Override with sort:
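A sketch of a sorted search, assuming published_at is a date field and title has a title.keyword sub-field; it returns newest first, with ties broken alphabetically by title:

```json
{
  "query": { "match": { "title": "elasticsearch" } },
  "sort": [
    { "published_at": { "order": "desc" } },
    { "title.keyword": { "order": "asc" } }
  ]
}
```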

Sorting on text fields is not allowed β€” sort on keyword or numeric/date fields only. This is another reason to define title.keyword as a multi-field.


Highlighting

Return highlighted snippets of matching text:
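A minimal sketch, assuming the text to highlight lives in a field named body; the fragment_size and number_of_fragments values are illustrative choices, not recommendations:

```json
{
  "query": { "match": { "body": "elasticsearch" } },
  "highlight": {
    "fields": {
      "body": { "fragment_size": 150, "number_of_fragments": 3 }
    }
  }
}
```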

The response includes a highlight object per document with <em> wrapped around matching tokens. Replace <em>/</em> with your own markup (React, HTML, etc.) in the frontend.


Pagination

from / size (Offset Pagination)

Simple but has a hard limit: from + size cannot exceed index.max_result_window (default 10,000). Beyond that, use search_after.
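As a sketch, the third page of ten results (hits 21 to 30) would use a from of 20; the query itself is a placeholder:

```json
{
  "from": 20,
  "size": 10,
  "query": { "match": { "title": "elasticsearch" } }
}
```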

search_after (Keyset Pagination)

More efficient for deep pagination. Use the sort values of the last document from the previous page as the cursor:
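A sketch of a follow-up page request, assuming the previous page was sorted by published_at and then _id. The two search_after values are made-up placeholders standing in for the sort array of the last hit on that page (date sort values come back as epoch milliseconds unless a format is specified):

```json
{
  "size": 10,
  "query": { "match": { "title": "elasticsearch" } },
  "sort": [
    { "published_at": "desc" },
    { "_id": "asc" }
  ],
  "search_after": [1741944600000, "article-42"]
}
```

The first page uses the same body without search_after. The sort clause has to stay identical from page to page.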

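Always include a unique tiebreaker such as _id in the sort to ensure deterministic pagination. Note that the Elasticsearch documentation advises against sorting on _id directly and suggests copying the ID into a keyword field with doc values instead. The frontend passes the sort values of the last visible document to fetch the next page.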


Fuzzy Matching

Allow for minor typos with fuzziness:
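A minimal sketch on the title field, with a deliberately misspelled search string for illustration:

```json
{
  "query": {
    "match": {
      "title": {
        "query": "elasticsaerch",
        "fuzziness": "AUTO"
      }
    }
  }
}
```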

"fuzziness": "AUTO" uses edit distance 0 for strings of 1–2 characters, 1 for 3–5 characters, and 2 for 6+. It handles most single-character typos without generating too many false positives.


Source Filtering

Avoid returning the full _source, including large fields such as body, on every hit when you only need metadata for the list view:
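One way to do this, assuming the list view only needs title, published_at, and a tags field (the exact field list is an assumption):

```json
{
  "_source": ["title", "published_at", "tags"],
  "query": { "match": { "title": "elasticsearch" } }
}
```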


Checking a Query's Score Explanation

When a document is scored unexpectedly high or low, use explain:
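A sketch of a search with explanations turned on; the query is a placeholder for whichever one you are debugging:

```json
{
  "explain": true,
  "query": { "match": { "title": "elasticsearch" } }
}
```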

The response breaks down exactly how _score was calculated (BM25 term frequency, inverse document frequency, field length normalization).


Summary

  • Put relevance-scoring queries in must; put filters in filter for caching and performance.

  • match / multi_match for full-text; term / terms / range for exact values in filter context.

  • Use bool as your composition primitive β€” nest it as deep as needed.

  • Sort on keyword or date/numeric fields; never on text.

  • For pagination beyond 10k results, use search_after with a sort tiebreaker on _id.

  • Use source filtering to limit response payload on list views.


Previous: Indexing Documents and Mappings | Next: Aggregations and Analytics
