Search Queries Deep Dive
Series: Elasticsearch 101 | Article: 04
Overview
Elasticsearch has a query DSL (Domain Specific Language) built on JSON. Once you understand the structure, writing complex queries becomes systematic rather than guesswork. This article covers the queries I reach for most often and the critical difference between query context and filter context.
All examples run against the articles index from Article 03.
Query Context vs Filter Context
This distinction matters for performance.
Query context: the query contributes to the relevance score (_score). Each document gets a floating-point relevance score based on how well it matches. Use query context when you want results ranked by relevance.
Filter context: the query does not compute a score. Elasticsearch only asks "does this document match, yes or no?" Filters are significantly faster because:
No score computation.
Results of frequently used filters are cached (in the node query cache) and reused across requests.
The most important rule: always put non-scoring constraints in the filter clause of a bool query.
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "elasticsearch" } }
      ],
      "filter": [
        { "term": { "is_published": true } },
        { "range": { "published_at": { "gte": "2025-01-01" } } }
      ]
    }
  }
}
In this query:
match on title runs in query context: contributes to _score.
term on is_published runs in filter context: cached, no score cost.
range on published_at runs in filter context: also cached.
This pattern is the most common structure in my production queries.
Full-Text Queries
match
The standard full-text query. By default it analyzes the search string with the same analyzer used to index the field.
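
As a minimal sketch, a match query on the title field could look like the request body below, sent to the _search endpoint of the articles index. The search text is a made-up example.

```json
{
  "query": {
    "match": {
      "title": {
        "query": "elasticsearch performance tuning",
        "operator": "or"
      }
    }
  }
}
```
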
"operator": "or" returns documents matching any of the terms. Change to "and" to require all terms to be present.
match_phrase
Matches the exact sequence of words in order, with adjacent positions:
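
A minimal sketch of a phrase query on title, using a made-up phrase. The slop setting is optional: leaving it out keeps the default of 0, which requires the words to be strictly adjacent.

```json
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "query context",
        "slop": 1
      }
    }
  }
}
```
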
Use it for exact-phrase searches. With "slop": 1 you can allow one word between the terms.
multi_match
Search across multiple fields at once:
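
A minimal sketch, assuming the articles index has a text field named body alongside title (the body field name and the search text are assumptions for illustration):

```json
{
  "query": {
    "multi_match": {
      "query": "elasticsearch filters",
      "fields": ["title^2", "body"],
      "type": "best_fields"
    }
  }
}
```
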
^2 boosts the title field's score contribution by a factor of 2. type: best_fields uses the highest-scoring field per document, which works well when you want the most relevant field to dominate.
Term-Level Queries
Term-level queries operate on exact values without analysis. Use them on keyword, integer, boolean, and date fields.
term
Exact match on a single value:
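
A minimal sketch, assuming a keyword field named tags exists on the articles index (the field name is an assumption):

```json
{
  "query": {
    "term": {
      "tags": { "value": "elasticsearch" }
    }
  }
}
```
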
Never use term on a text field: the indexed value has been analyzed (lowercased, tokenized), so the exact value you pass will usually not match. Use term on keyword and other non-analyzed fields only.
terms
Match any of a list of exact values:
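
A minimal sketch, again assuming a keyword field named tags. A document matches if it carries at least one of the listed values:

```json
{
  "query": {
    "terms": {
      "tags": ["elasticsearch", "search", "lucene"]
    }
  }
}
```
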
range
Filter by a numeric or date range:
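
A minimal sketch combining one date range and one numeric range inside a bool filter. published_at comes from the earlier example; view_count is a hypothetical integer field used only for illustration.

```json
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "published_at": { "gte": "now-7d/d" } } },
        { "range": { "view_count": { "gte": 100, "lt": 1000 } } }
      ]
    }
  }
}
```
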
Date math is supported: "gte": "now-7d/d" matches documents from the last 7 days, rounded to the start of the day.
exists
Match documents where a field exists and is not null:
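
A minimal sketch, using published_at as the field to check:

```json
{
  "query": {
    "exists": { "field": "published_at" }
  }
}
```
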
The bool Query
The bool query is the composition primitive. It combines other queries using four clauses:
must: Document must match. Contributes to score.
should: Document may match. Contributes to score.
must_not: Document must NOT match. Runs in filter context (no score).
filter: Document must match. Runs in filter context (no score, cached).
Example: find published articles tagged with elasticsearch, ranked by relevance to the text query, from 2025:
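
One way to express this, as a sketch: the search text is made up, tags is assumed to be a keyword field, and "from 2025" is read here as published on or after 2025-01-01.

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "query performance" } }
      ],
      "filter": [
        { "term": { "is_published": true } },
        { "term": { "tags": "elasticsearch" } },
        { "range": { "published_at": { "gte": "2025-01-01" } } }
      ]
    }
  }
}
```

Only the match clause affects _score; the three filter clauses narrow the result set without changing the ranking.
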
bool queries can nest arbitrarily, which enables complex filtering logic. Keep the must clause for relevance scoring and move all filtering to filter.
Sorting
By default results are sorted by _score descending. Override with sort:
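
A minimal sketch that sorts newest first and breaks ties alphabetically, assuming title has a keyword sub-field named title.keyword:

```json
{
  "query": { "match": { "title": "elasticsearch" } },
  "sort": [
    { "published_at": { "order": "desc" } },
    { "title.keyword": { "order": "asc" } }
  ]
}
```
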
Sorting on text fields is not allowed by default; sort on keyword or numeric/date fields only. This is another reason to define title.keyword as a multi-field.
Highlighting
Return highlighted snippets of matching text:
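
A minimal sketch, assuming a text field named body holds the article text. The fragment settings are illustrative values, not recommendations.

```json
{
  "query": { "match": { "body": "filter context" } },
  "highlight": {
    "fields": {
      "body": { "fragment_size": 150, "number_of_fragments": 2 }
    }
  }
}
```

If you would rather change the wrapper at query time, the highlight object also accepts pre_tags and post_tags.
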
The response includes a highlight object per document with <em> wrapped around matching tokens. Replace <em>/</em> with your own markup (React, HTML, etc.) in the frontend.
Pagination
from / size (Offset Pagination)
Simple but has a hard limit: from + size cannot exceed index.max_result_window (default 10,000). Beyond that, use search_after.
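
As a sketch, fetching the third page at 10 results per page means skipping the first 20 hits:

```json
{
  "from": 20,
  "size": 10,
  "query": { "match": { "title": "elasticsearch" } }
}
```
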
search_after (Keyset Pagination)
More efficient for deep pagination. Use the sort values of the last document from the previous page as the cursor:
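
A minimal sketch of a follow-up page request. The two search_after values are placeholders: in practice they are copied from the sort array of the last hit on the previous page, in the same order as the sort clauses.

```json
{
  "size": 10,
  "query": { "match": { "title": "elasticsearch" } },
  "sort": [
    { "published_at": "desc" },
    { "_id": "asc" }
  ],
  "search_after": [1741944600000, "article-0042"]
}
```

The first page uses the same body without search_after. Some recent Elasticsearch versions may restrict sorting on _id; a keyword copy of the identifier can play the same tiebreaker role there.
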
Always include _id as a tiebreaker in the sort to ensure deterministic pagination. The frontend passes the sort values of the last visible document to fetch the next page.
Fuzzy Search
Allow for minor typos with fuzziness:
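
A minimal sketch, with a deliberately misspelled search term:

```json
{
  "query": {
    "match": {
      "title": {
        "query": "elasticsaerch",
        "fuzziness": "AUTO"
      }
    }
  }
}
```
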
"fuzziness": "AUTO" uses edit distance 0 for strings of 1β2 characters, 1 for 3β5 characters, and 2 for 6+. It handles most single-character typos without generating too many false positives.
Source Filtering
Avoid returning large body or _source fields on every hit when you only need metadata for the list view:
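
A minimal sketch that returns only the fields a list view might need. The field names other than title and published_at are assumptions.

```json
{
  "_source": ["title", "tags", "published_at"],
  "query": { "match": { "title": "elasticsearch" } }
}
```
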
Checking a Query's Score Explanation
When a document is scored unexpectedly high or low, use explain:
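
A minimal sketch: setting explain to true in the search body adds a scoring breakdown to every hit.

```json
{
  "explain": true,
  "query": { "match": { "title": "elasticsearch" } }
}
```
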
The response breaks down exactly how _score was calculated (BM25 term frequency, inverse document frequency, field length normalization).
Summary
Put relevance-scoring queries in must; put filters in filter for caching and performance.
match / multi_match for full-text; term / terms / range for exact values in filter context.
Use bool as your composition primitive; nest it as deep as needed.
Sort on keyword or date/numeric fields; never on text.
For pagination beyond 10k results, use search_after with a sort tiebreaker on _id.
Source filter to limit response payload on list views.
Previous: Indexing Documents and Mappings | Next: Aggregations and Analytics