Indexing Documents and Mappings

Series: Elasticsearch 101 | Article: 03

Overview

Mappings define the schema of documents in an Elasticsearch index. Get them right before you ingest data: the type of an existing field cannot be changed in place, so fixing it requires a full reindex, which is expensive for large datasets and needs careful planning.

This article covers the essential field types, how to define explicit mappings, and the full CRUD lifecycle for documents.

Dynamic vs Explicit Mappings

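By default, Elasticsearch infers a type the first time it sees a new field; this is called dynamic mapping. Index a string that looks like a date and Elasticsearch maps it as a date. Index a whole number and it creates a long field; index a number with a decimal point and it creates a float field. Other strings become text with a keyword sub-field.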

Dynamic mapping is convenient for getting started but problematic in production:

Once a field is mapped, its type is fixed: a field that starts as keyword cannot be queried as analyzed text later (and vice versa) without a reindex.
Dynamic mapping can create hundreds of fields from uncontrolled JSON (the "mapping explosion" problem).
It does not give you control over analyzers, normalizers, or field options.

Always use explicit mappings for production indexes. For indexes where the field set is known, set dynamic mapping to strict so that unknown fields are rejected:

{
  "mappings": {
    "dynamic": "strict"
  }
}

With "dynamic": "strict", indexing a document that contains an unknown field fails immediately with a strict_dynamic_mapping_exception, which is exactly what you want for catching schema drift early.
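To make the strict check concrete, here is a minimal Python sketch that mimics it locally: it walks a document and reports every field path that the mapping's properties do not declare. The helper name `unknown_fields` is hypothetical, and the sketch ignores multi-fields, per-object dynamic overrides, and dynamic templates; the authoritative check always happens inside Elasticsearch at index time.

```python
from typing import Any


def unknown_fields(properties: dict[str, Any], doc: dict[str, Any], prefix: str = "") -> list[str]:
    """Return dotted paths of fields in `doc` that `properties` does not declare.

    Hypothetical helper approximating "dynamic": "strict" locally.
    Object and nested fields are checked against their own "properties";
    arrays of objects are checked element by element.
    """
    missing: list[str] = []
    for name, value in doc.items():
        path = f"{prefix}{name}"
        if name not in properties:
            missing.append(path)
            continue
        sub_props = properties[name].get("properties")
        if sub_props is None:
            continue  # leaf field: declared, nothing deeper to check
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                missing.extend(unknown_fields(sub_props, item, prefix=f"{path}."))
    return missing


mapping = {
    "id": {"type": "keyword"},
    "author": {"properties": {"id": {"type": "keyword"}, "name": {"type": "text"}}},
}
doc = {"id": "article-001", "author": {"id": "user-123", "nickname": "jd"}, "subtitle": "x"}
print(unknown_fields(mapping, doc))  # ['author.nickname', 'subtitle']
```

Elasticsearch would reject this document outright; the sketch just lists the offending paths so you can spot drift before sending anything.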

Essential Field Types

`text`

Use for full-text content that needs to be searched. The value is analyzed (tokenized, lowercased and, with a language analyzer such as english, stemmed) and the resulting terms are added to the inverted index.

"body": { "type": "text", "analyzer": "english" }

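By default, a text field cannot be used for sorting or aggregations, and it is a poor fit for exact-match lookups because the index holds the analyzed terms rather than the original value.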

`keyword`

Use for exact-match values: IDs, status codes, tags, category names, email addresses. The value is stored as-is, not analyzed.

"status": { "type": "keyword" },
"tags": { "type": "keyword" }

Use keyword for any field you plan to aggregate on, sort by, or filter with a term query.

`text` + `keyword` (multi-field)

A common pattern is to index the same field as both text (for full-text search) and keyword (for sorting/aggregations):

"title": {
  "type": "text",
  "fields": {
    "keyword": { "type": "keyword" }
  }
}

Query with title for full-text, title.keyword for exact match or sort.
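A toy model shows why one value behaves so differently under the two types. This sketch is an approximation under stated assumptions: its hypothetical `toy_analyze` function only lowercases and splits on non-alphanumeric characters, which crudely resembles the standard analyzer and does no stemming, so it is not the english analyzer. It prints the terms that end up indexed for title and title.keyword, and shows that an unanalyzed exact-value lookup only matches the keyword sub-field.

```python
import re

# Toy model of text vs keyword indexing; all function names are hypothetical.


def toy_analyze(value: str) -> list[str]:
    """Rough stand-in for a text analyzer: lowercase, split on non-alphanumerics.

    Real analyzers do more (Unicode-aware tokenization, optional stop words, stemming).
    """
    return [tok for tok in re.split(r"[^a-z0-9]+", value.lower()) if tok]


def indexed_terms(title: str) -> dict[str, list[str]]:
    """Terms a title / title.keyword multi-field would contribute to the index."""
    return {
        "title": toy_analyze(title),  # text: analyzed into tokens
        "title.keyword": [title],     # keyword: the whole value, untouched
    }


def term_match(terms: list[str], query_value: str) -> bool:
    """A term query is not analyzed: it looks up the exact value as given."""
    return query_value in terms


terms = indexed_terms("Getting started with Elasticsearch")
print(terms["title"])  # ['getting', 'started', 'with', 'elasticsearch']
print(term_match(terms["title"], "Getting started with Elasticsearch"))          # False
print(term_match(terms["title.keyword"], "Getting started with Elasticsearch"))  # True
```

Sorting and aggregations work on title.keyword for the same reason: it holds one whole value per document instead of a bag of tokens.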

Numeric Types

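| Type | Use for |
|---|---|
| integer | Whole numbers in the signed 32-bit range (-2,147,483,648 to 2,147,483,647, about ±2.1 billion) |
| long | Whole numbers outside the integer range (signed 64-bit) |
| float | Approximate decimals where 32-bit single precision is enough (scores, ratings) |
| double | Decimals that need 64-bit double precision (still binary floating point, so not exact) |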
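The integer/long boundary and the float/double precision gap can be checked directly in Python. This sketch assumes the standard widths (integer is signed 32-bit, long is signed 64-bit, float is IEEE 754 single precision); `smallest_integer_type` and `as_float32` are hypothetical helpers, and `struct` is used to round-trip a value through 32-bit storage the way a float field's indexed value is stored (the original JSON in _source is kept as sent).

```python
import struct

INT_MIN, INT_MAX = -(2**31), 2**31 - 1    # Elasticsearch integer: signed 32-bit
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1  # Elasticsearch long: signed 64-bit


def smallest_integer_type(n: int) -> str:
    """Pick integer or long for a whole number (hypothetical helper)."""
    if INT_MIN <= n <= INT_MAX:
        return "integer"
    if LONG_MIN <= n <= LONG_MAX:
        return "long"
    raise ValueError(f"{n} does not fit in a signed 64-bit long")


def as_float32(x: float) -> float:
    """Round-trip a Python float (64-bit) through 32-bit single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


print(smallest_integer_type(2_147_483_647))  # integer
print(smallest_integer_type(2_147_483_648))  # long
print(as_float32(4.7) == 4.7)                # False: 4.7 is not exact in 32 bits
```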

`date`

Elasticsearch stores dates as milliseconds since epoch internally. You can use ISO8601 strings in documents:

"published_at": { "type": "date", "format": "strict_date_optional_time" }
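To see what the stored value looks like, this sketch converts an ISO 8601 timestamp into milliseconds since the Unix epoch (UTC) using only the standard library. The name `to_epoch_millis` is hypothetical. It rewrites a trailing Z as +00:00 because `datetime.fromisoformat` only accepts Z from Python 3.11 onward, and it treats timestamps without an offset as UTC, matching how Elasticsearch interprets them.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value: str) -> int:
    """Convert an ISO 8601 string to milliseconds since 1970-01-01T00:00:00Z.

    to_epoch_millis is a hypothetical helper name.
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        # No offset given: interpret as UTC, as Elasticsearch does.
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids float rounding errors.
    return (dt - EPOCH) // timedelta(milliseconds=1)


print(to_epoch_millis("2025-03-01T09:00:00Z"))       # 1740819600000
print(to_epoch_millis("2025-03-01T10:00:00+01:00"))  # 1740819600000 (same instant)
```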

`boolean`

"is_published": { "type": "boolean" }

`object`

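A JSON object embedded in the document. Object fields are flattened internally into dotted field names (author.id, author.name). For an array of objects, each field's values are merged into one list, so the link between values that came from the same array element is lost. For queries that must match several fields within the same array element, use nested instead.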

"author": {
  "properties": {
    "id": { "type": "keyword" },
    "name": { "type": "text" }
  }
}
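The flattening is easiest to see with an array of objects. The sketch below is a simplified model, not Elasticsearch code: the hypothetical `flatten` collects every leaf value under its dotted path, the way an object field is indexed, and the final check shows a two-condition query succeeding even though no single comment satisfies both conditions. The field names mirror the comments example used for nested.

```python
from collections import defaultdict
from typing import Any

# Simplified model of object-field flattening; flatten() is a hypothetical helper.


def flatten(doc: dict[str, Any], prefix: str = "") -> dict[str, list[Any]]:
    """Model how object fields are indexed: one list of values per dotted path."""
    out: dict[str, list[Any]] = defaultdict(list)
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                for sub_path, sub_values in flatten(item, f"{path}.").items():
                    out[sub_path].extend(sub_values)
            else:
                out[path].append(item)
    return dict(out)


doc = {
    "comments": [
        {"author_id": "user-123", "body": "great intro"},
        {"author_id": "user-456", "body": "needs examples"},
    ]
}
flat = flatten(doc)
print(flat)
# {'comments.author_id': ['user-123', 'user-456'], 'comments.body': ['great intro', 'needs examples']}

# "author_id is user-123 AND body mentions examples" matches, although
# user-123 never wrote about examples: the pairing between fields is gone.
matches = "user-123" in flat["comments.author_id"] and any(
    "examples" in body for body in flat["comments.body"]
)
print(matches)  # True
```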

`nested`

Use when you have an array of objects and need to query across fields within the same object:

"comments": {
  "type": "nested",
  "properties": {
    "author_id": { "type": "keyword" },
    "body": { "type": "text" }
  }
}

nested fields have a performance cost: each nested object is indexed as a separate hidden document, and querying them requires a nested query. Use them deliberately and only when needed.
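By contrast, a nested field keeps each object intact. This sketch models that as a simplification: each comment is evaluated as its own unit (roughly how each one becomes a separate hidden Lucene document), so the same two-condition check no longer matches across objects, and the hidden-document count grows with the array length. `nested_match` and `lucene_doc_count` are hypothetical helpers.

```python
from typing import Any, Callable

# Simplified model of nested semantics; helper names are hypothetical.
Comment = dict[str, Any]


def nested_match(comments: list[Comment], predicate: Callable[[Comment], bool]) -> bool:
    """A nested query must be satisfied by a single object in the array."""
    return any(predicate(c) for c in comments)


def lucene_doc_count(comments: list[Comment]) -> int:
    """One root document plus one hidden document per nested object."""
    return 1 + len(comments)


comments = [
    {"author_id": "user-123", "body": "great intro"},
    {"author_id": "user-456", "body": "needs examples"},
]

print(nested_match(comments, lambda c: c["author_id"] == "user-123" and "examples" in c["body"]))  # False
print(nested_match(comments, lambda c: c["author_id"] == "user-456" and "examples" in c["body"]))  # True
print(lucene_doc_count(comments))  # 3
```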

Creating an Index with an Explicit Mapping

All examples below use the Kibana Dev Tools console or curl. The index models a blog article.

PUT /articles
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "id": { "type": "keyword" },
      "title": {
        "type": "text",
        "analyzer": "english",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      },
      "slug": { "type": "keyword" },
      "body": { "type": "text", "analyzer": "english" },
      "author_id": { "type": "keyword" },
      "author_name": { "type": "text" },
      "tags": { "type": "keyword" },
      "is_published": { "type": "boolean" },
      "view_count": { "type": "integer" },
      "published_at": {
        "type": "date",
        "format": "strict_date_optional_time"
      },
      "updated_at": {
        "type": "date",
        "format": "strict_date_optional_time"
      }
    }
  }
}

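Setting number_of_replicas to 0 keeps this index green on a single-node cluster: Elasticsearch never places a replica on the same node as its primary, so any replica would stay unassigned and the index would report yellow. Use at least 1 replica in production clusters with multiple nodes.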
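If you prefer to script index creation instead of pasting into Dev Tools, the same request can be built with Python's standard library. This is a sketch under assumptions: it targets a hypothetical local cluster at http://localhost:9200 with security disabled, the mapping is abbreviated (use the full body from the Dev Tools example), and it only constructs the request, since sending it requires a running cluster.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, security disabled

body = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {
        "dynamic": "strict",
        "properties": {
            "id": {"type": "keyword"},
            "title": {
                "type": "text",
                "analyzer": "english",
                "fields": {"keyword": {"type": "keyword"}},
            },
            # remaining fields as in the Dev Tools example
        },
    },
}

req = urllib.request.Request(
    f"{ES_URL}/articles",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/articles

# With a cluster running, send it and print the JSON response:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```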

Document CRUD

Index (Create) a Document

Use a specific ID to match the primary key from your source-of-truth database:

PUT /articles/_doc/article-001
{
  "id": "article-001",
  "title": "Getting started with Elasticsearch",
  "slug": "getting-started-elasticsearch",
  "body": "Elasticsearch is a distributed search engine built on Lucene...",
  "author_id": "user-123",
  "author_name": "John Doe",
  "tags": ["elasticsearch", "search", "backend"],
  "is_published": true,
  "view_count": 0,
  "published_at": "2025-03-01T09:00:00Z",
  "updated_at": "2025-03-01T09:00:00Z"
}

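PUT /<index>/_doc/<id> creates the document if the ID is new and replaces it entirely if the ID already exists, incrementing _version each time. To create only, failing with a 409 conflict if the ID already exists, use POST /<index>/_create/<id> (PUT also works on this endpoint).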
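The difference between the two endpoints comes down to what happens when the ID already exists. The in-memory model below is illustrative only: the class and method names are hypothetical, and real versioning also involves sequence numbers and primary terms. `index` replaces the document and bumps `_version`; `create` refuses with a 409-style conflict.

```python
from typing import Any


class ConflictError(Exception):
    """Stands in for the 409 version_conflict_engine_exception."""


class ToyIndex:
    """Hypothetical in-memory model of _doc (create-or-replace) vs _create (create-only)."""

    def __init__(self) -> None:
        self.docs: dict[str, dict[str, Any]] = {}
        self.versions: dict[str, int] = {}

    def index(self, doc_id: str, source: dict[str, Any]) -> str:
        """Models PUT /<index>/_doc/<id>: create, or replace the whole document."""
        result = "updated" if doc_id in self.docs else "created"
        self.docs[doc_id] = source
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return result

    def create(self, doc_id: str, source: dict[str, Any]) -> str:
        """Models POST /<index>/_create/<id>: fail if the ID already exists."""
        if doc_id in self.docs:
            raise ConflictError(f"[{doc_id}]: document already exists")
        return self.index(doc_id, source)


idx = ToyIndex()
print(idx.index("article-001", {"title": "v1"}))  # created
print(idx.index("article-001", {"title": "v2"}))  # updated
print(idx.versions["article-001"])                # 2
try:
    idx.create("article-001", {"title": "v3"})
except ConflictError as err:
    print("conflict:", err)  # conflict: [article-001]: document already exists
```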

Retrieve a Document

GET /articles/_doc/article-001

Returns the document plus metadata (_index, _id, _version, _source).

To retrieve only the _source body:

GET /articles/_source/article-001

Update a Document (Partial)

POST /articles/_update/article-001
{
  "doc": {
    "view_count": 1,
    "updated_at": "2025-03-02T10:00:00Z"
  }
}

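The _update API merges the provided fields into the existing document, so you only send the fields that change; object fields are merged recursively, while other values (including arrays) are replaced. Under the hood it is still a full rewrite: Elasticsearch reads the current _source, applies the merge, and indexes the result as a new version, and Lucene marks the old copy as deleted.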
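The merge rule can be sketched in a few lines. This is a simplified model of the behavior described above (dicts merged key by key, everything else overwritten); `merge_partial` is a hypothetical name, and the real API additionally reindexes the merged _source as a new document version.

```python
import copy
from typing import Any


def merge_partial(existing: dict[str, Any], partial: dict[str, Any]) -> dict[str, Any]:
    """Return a new _source with `partial` merged into `existing` (hypothetical helper).

    Objects (dicts) are merged recursively; every other value,
    including lists, overwrites what was there.
    """
    merged = copy.deepcopy(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


source = {"view_count": 0, "tags": ["elasticsearch", "search"], "author": {"id": "user-123", "name": "John Doe"}}
update = {"view_count": 1, "tags": ["backend"], "author": {"name": "J. Doe"}}
print(merge_partial(source, update))
# {'view_count': 1, 'tags': ['backend'], 'author': {'id': 'user-123', 'name': 'J. Doe'}}
```

Note the tags array: sending ["backend"] replaces the whole list rather than appending to it.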

Delete a Document

DELETE /articles/_doc/article-001

Bulk Operations

For inserting or updating multiple documents at once, use the Bulk API. It is significantly more efficient than individual requests:

POST /articles/_bulk
{"index":{"_id":"article-002"}}
{"id":"article-002","title":"Understanding Inverted Indexes","slug":"inverted-indexes","body":"An inverted index maps tokens to document IDs...","author_id":"user-123","author_name":"John Doe","tags":["elasticsearch","internals"],"is_published":true,"view_count":0,"published_at":"2025-03-05T10:00:00Z","updated_at":"2025-03-05T10:00:00Z"}
{"index":{"_id":"article-003"}}
{"id":"article-003","title":"Mapping tips for production","slug":"mapping-production","body":"Always use explicit mappings to prevent schema drift...","author_id":"user-456","author_name":"Jane Smith","tags":["elasticsearch","mapping","production"],"is_published":false,"view_count":0,"published_at":null,"updated_at":"2025-03-06T08:00:00Z"}

For index, create, and update actions, each action line is followed by one source line; delete actions have no source line. Each line is a separate JSON object (newline-delimited JSON, not an array), and the body must end with a newline.
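When building the bulk body in code, the two rules that most often trip people up are the one-object-per-line format and the trailing newline. This sketch (the `bulk_body` helper is hypothetical) serializes index actions with the standard library; the result can be sent to POST /articles/_bulk with Content-Type application/x-ndjson.

```python
import json
from typing import Any, Iterable


def bulk_body(docs: Iterable[dict[str, Any]], id_field: str = "id") -> str:
    """Build an NDJSON body of index actions: one action line + one source line per doc.

    bulk_body is a hypothetical helper; json.dumps never emits raw newlines,
    so each object stays on a single line.
    """
    lines: list[str] = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_id": doc[id_field]}}))
        lines.append(json.dumps(doc))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"id": "article-002", "title": "Understanding Inverted Indexes"},
    {"id": "article-003", "title": "Mapping tips for production"},
]
body = bulk_body(docs)
print(body, end="")
```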

Verify the Mapping

After index creation:

GET /articles/_mapping
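The _mapping response nests fields under properties (and multi-fields under fields), which gets hard to scan for large indexes. This sketch flattens it into dotted paths and types. It assumes the response shape {index_name: {"mappings": {"properties": ...}}}; `field_types` is a hypothetical helper, and the sample response is hand-written and abbreviated rather than captured from a cluster.

```python
from typing import Any


def field_types(properties: dict[str, Any], prefix: str = "") -> dict[str, str]:
    """Map dotted field paths to their types, including multi-fields (hypothetical helper)."""
    out: dict[str, str] = {}
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        # Object fields carry "properties" and usually no explicit "type".
        out[path] = spec.get("type", "object")
        for sub_name, sub_spec in spec.get("fields", {}).items():
            out[f"{path}.{sub_name}"] = sub_spec["type"]
        if "properties" in spec:
            out.update(field_types(spec["properties"], prefix=f"{path}."))
    return out


# Hand-written, abbreviated example of a GET /articles/_mapping response.
response = {
    "articles": {
        "mappings": {
            "dynamic": "strict",
            "properties": {
                "title": {
                    "type": "text",
                    "analyzer": "english",
                    "fields": {"keyword": {"type": "keyword"}},
                },
                "view_count": {"type": "integer"},
                "author": {"properties": {"id": {"type": "keyword"}}},
            },
        }
    }
}

for path, ftype in field_types(response["articles"]["mappings"]["properties"]).items():
    print(f"{path}: {ftype}")
# title: text
# title.keyword: keyword
# view_count: integer
# author: object
# author.id: keyword
```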

Adding Fields to an Existing Mapping

You can add new fields to a strict mapping, but you cannot change the type of an existing field:

PUT /articles/_mapping
{
  "properties": {
    "reading_time_minutes": { "type": "integer" }
  }
}

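To change an existing field's type, for example changing view_count from integer to long, you need to reindex. Adding a multi-field to an existing field (such as a keyword sub-field for author_name) is allowed through the same _mapping API, but documents indexed before the change are not searchable through the new sub-field until they are reindexed.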
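Before sending a PUT _mapping, it can help to check locally which changes are additions (allowed) and which alter an existing field's type (reindex required). A minimal sketch, assuming flat top-level properties only; `classify_changes` is hypothetical and ignores mapping parameters other than type, several of which (such as analyzer) also cannot be updated in place.

```python
from typing import Any


def classify_changes(old: dict[str, Any], new: dict[str, Any]) -> dict[str, list[str]]:
    """Split proposed top-level property changes into additions and type conflicts.

    classify_changes is a hypothetical helper; only "type" is compared.
    """
    added: list[str] = []
    conflicts: list[str] = []
    for name, spec in new.items():
        if name not in old:
            added.append(name)
            continue
        old_type = old[name].get("type", "object")
        new_type = spec.get("type", "object")
        if old_type != new_type:
            conflicts.append(f"{name}: {old_type} -> {new_type}")
    return {"added": added, "needs_reindex": conflicts}


old = {"view_count": {"type": "integer"}, "author_name": {"type": "text"}}
new = {
    "view_count": {"type": "long"},
    "reading_time_minutes": {"type": "integer"},
}
print(classify_changes(old, new))
# {'added': ['reading_time_minutes'], 'needs_reindex': ['view_count: integer -> long']}
```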

Reindexing

Reindexing copies documents from one index to another, applying the new mapping as documents are written:

POST _reindex
{
  "source": { "index": "articles" },
  "dest": { "index": "articles_v2" }
}

The common pattern I use in production:

Create articles_v2 with the updated mapping.
Run _reindex from articles to articles_v2.
Update the alias articles_read to point to articles_v2.
Delete the old index.

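Using aliases instead of hardcoded index names from the start makes the switch zero-downtime for readers. Writes that arrive while _reindex is running may not be copied, so pause them or replay them against the new index before swapping the alias.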
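The four steps translate into four requests. The sketch below only builds them as (method, path, body) tuples so they can be reviewed or sent with any HTTP client; `reindex_plan` is a hypothetical helper, it assumes the articles_read alias already points at the old index, and it does not check that the copy succeeded before deleting the old index.

```python
from typing import Any, Optional

# (method, path, JSON body or None); reindex_plan is a hypothetical helper.
Request = tuple[str, str, Optional[dict[str, Any]]]


def reindex_plan(old: str, new: str, alias: str, new_body: dict[str, Any]) -> list[Request]:
    """Ordered requests: create new index, copy docs, swap alias atomically, drop old index."""
    return [
        ("PUT", f"/{new}", new_body),
        ("POST", "/_reindex", {"source": {"index": old}, "dest": {"index": new}}),
        ("POST", "/_aliases", {"actions": [
            {"remove": {"index": old, "alias": alias}},
            {"add": {"index": new, "alias": alias}},
        ]}),
        ("DELETE", f"/{old}", None),
    ]


plan = reindex_plan(
    "articles", "articles_v2", "articles_read",
    {"mappings": {"dynamic": "strict", "properties": {"view_count": {"type": "long"}}}},
)
for method, path, _ in plan:
    print(method, path)
# PUT /articles_v2
# POST /_reindex
# POST /_aliases
# DELETE /articles
```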

Index Aliases

An alias is a named pointer to one or more indexes. Applications query the alias, not the index directly:

POST _aliases
{
  "actions": [
    { "add": { "index": "articles", "alias": "articles_read" } }
  ]
}

At reindex time:

POST _aliases
{
  "actions": [
    { "remove": { "index": "articles", "alias": "articles_read" } },
    { "add": { "index": "articles_v2", "alias": "articles_read" } }
  ]
}

Both actions in a single _aliases request are applied atomically, so clients never see a moment when articles_read points to neither index or to both.

Summary

Use "dynamic": "strict" to prevent unexpected field creation.
Choose between text (full-text search) and keyword (exact match, sorting, aggregation) intentionally.
Use multi-fields (title + title.keyword) when you need both behaviors on the same field.
nested type enables cross-field queries on array-of-object fields, but adds storage overhead.
Use controlled IDs matching your source database primary key.
Always use aliases rather than direct index names — it simplifies reindexing.

Previous: Setting Up Elasticsearch with Docker | Next: Search Queries Deep Dive


