The Elasticsearch API is the REST surface that lets a client index documents, run searches, and aggregate results against an Elasticsearch cluster. Every action against an Elasticsearch cluster is an HTTP call to one of these endpoints, and the client can be a low-level curl, a high-level client library (Python’s elasticsearch, Node’s @elastic/elasticsearch, Java’s official client), or a search framework sitting on top. The API is the part of Elasticsearch that your application touches, so its shape decides much of what the team ends up building.
The reason “elasticsearch api” is its own question and not just “elasticsearch” is that the API has matured into a multi-surface product. The reference documentation groups the search APIs, the document APIs, the index-management APIs, and the cluster APIs separately, and the developer who is picking up Elasticsearch in 2026 is going to touch each surface in turn. The shape of the API is the part the developer should know before they write the first line of code.
Table of contents
- The short version
- The five surfaces the Elasticsearch API is organized into
- The five endpoints a typical app calls in production
- The four search flavors the API exposes
- The five client libraries a real team uses
- The seven costs a working developer should expect on the bill
- The mistakes that quietly burn an Elasticsearch bill
- How this fits the rest of the stack
- FAQ
The short version
An Elasticsearch cluster is a set of nodes that store JSON documents in indices and serve search and aggregation requests against those indices. The REST API is the HTTP surface that the developer uses to talk to the cluster. A typical call is a POST to /<index>/_search with a JSON body that describes the query, and a typical response is a JSON document with the hits, the aggregations, and the took time. The API is well-documented, fast, and a strong fit for applications that need full-text search, faceted search, or large-scale aggregations.
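The typical call described above can be sketched with nothing but the Python standard library. This is a minimal illustration, not a recommended client: the base URL, the index name products, and the field title are assumptions, and the request is built but not sent.

```python
import json
import urllib.request

# Assumptions for illustration: a local, unsecured cluster and an index named "products".
BASE_URL = "http://localhost:9200"


def build_search_request(index, query, size=10):
    """Build (but do not send) a POST /<index>/_search request."""
    body = json.dumps({"query": query, "size": size}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_search",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_search_request("products", {"match": {"title": "wireless headphones"}})
print(request.get_method(), request.full_url)

# Sending it requires a reachable cluster, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
#     print(result["took"], len(result["hits"]["hits"]))
```

A real application would normally go through a client library, which adds authentication, retries, and connection pooling on top of the same HTTP call.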
The five surfaces the Elasticsearch API is organized into
The Elasticsearch API is one base URL, but the endpoints are organized into five distinct surfaces. The surface changes what the endpoint is for, what the request body looks like, and what the response contains.
Document APIs. The surface for reading, writing, updating, and deleting individual documents. The endpoints are /<index>/_doc/<id> (index, fetch, or delete a document by explicit ID), /<index>/_doc (index with an auto-generated ID), /<index>/_update/<id> (partially update a document), and the bulk endpoint /_bulk (batch index, update, delete). The surface is the one the developer touches first, and the one that carries most of a typical application’s write traffic.
Search APIs. The surface for running searches and aggregations. The endpoints are /<index>/_search (the main search endpoint), /<index>/_msearch (multi-search, for batching several queries in one call), /<index>/_search_shards (returns which shards a search will hit, useful for debugging), and the async search endpoint /<index>/_async_search (for long-running searches and aggregations, with results fetched later from /_async_search/<id>). The surface is the one the developer touches second, and the one that turns the application from “a database with a search box” into “a real search product.”
Index management APIs. The surface for creating, updating, and deleting indices, mappings, and aliases. The endpoints are /<index> (PUT to create, DELETE to remove, GET to inspect), /<index>/_mapping (PUT to update the mapping, GET to read it), and /<index>/_alias (for zero-downtime re-indexing). The surface is the one the developer touches third, and the one that decides how the data is structured on disk.
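The zero-downtime re-indexing mentioned above usually comes down to repointing an alias from the old index to the new one in a single request. A minimal sketch of the body for POST /_aliases follows; the alias and index names are hypothetical.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Body for POST /_aliases that moves an alias in one atomic request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: the application always queries "products",
# while the physical indices are versioned behind it.
body = alias_swap_actions("products", "products-v1", "products-v2")
print(json.dumps(body, indent=2))
```

Because both actions travel in one request, searches against the alias never see a moment where it points at neither index.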
Cluster APIs. The surface for cluster health, node stats, shard allocation, and snapshot/restore. The endpoints are /_cluster/health (the health of the cluster), /_nodes/stats (per-node resource usage), /_cat/indices (a tabular view of all indices, useful in scripts), and /_snapshot (the snapshot/restore surface for backups). The surface is the one the developer touches fourth, and the one that decides how the cluster is observed and maintained.
Ingest APIs. The surface for ingest pipelines, reindexing, and update-by-query. The endpoints are /_ingest/pipeline/<id> (define a pipeline of processors that runs on documents before they are indexed), /_reindex (copy documents from one index to another, optionally with a script or a pipeline), and /<index>/_update_by_query (run a script against every document matching a query). The surface is the one the developer touches fifth, and the one that turns the cluster from a passive store into an active transformation engine.
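As a rough sketch of how a pipeline and a reindex fit together, the two request bodies below define a small pipeline and then copy an index through it. The pipeline name normalize-status, the index names, and the field names are all made up for illustration.

```python
import json

# Body for PUT /_ingest/pipeline/normalize-status (pipeline name is hypothetical).
pipeline = {
    "description": "Lowercase the status field and stamp the ingest time",
    "processors": [
        {"lowercase": {"field": "status", "ignore_missing": True}},
        {"set": {"field": "ingested_at", "value": "{{_ingest.timestamp}}"}},
    ],
}

# Body for POST /_reindex: copy orders-v1 into orders-v2, running each
# document through the pipeline on the way in (index names are hypothetical).
reindex = {
    "source": {"index": "orders-v1"},
    "dest": {"index": "orders-v2", "pipeline": "normalize-status"},
}

print(json.dumps(pipeline, indent=2))
print(json.dumps(reindex, indent=2))
```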
The five surfaces are not isolated. A typical request might create an index, define a pipeline, index a document, run a search, and check the cluster health — all in the same workflow. The developer should know which surface each call belongs to, and the right endpoint for each operation.
The five endpoints a typical app calls in production
A short, opinionated list of endpoints that show up in the typical production application. The endpoints are not the only ones, but they are the ones the developer should learn first.
PUT /<index>/_doc/<id>. Index a document with an explicit ID. The body is a JSON object that becomes the document’s source. The response is a JSON object with the result (created or updated), the document ID, the version, and the sequence number. The endpoint is the workhorse of any application that is writing structured data into Elasticsearch.
POST /<index>/_search. Run a search against an index. The body is a JSON object with the query, the size (how many hits to return), the from (offset for pagination), the sort (the sort order), and the aggs (aggregations). The response is a JSON object with the took time (in milliseconds), the hits.total object (a value and a relation, because the count is capped at 10,000 by default unless track_total_hits is set), the hits.hits array (the actual results), and the aggregations object (the aggregated buckets). The endpoint is the workhorse of any application that is reading from Elasticsearch.
POST /_bulk. Index, update, or delete a batch of documents in a single call. The body is newline-delimited JSON (NDJSON), not a JSON array: one action line per operation, followed by a document line for index, create, and update actions (delete takes none), and the body must end with a newline. The response is a JSON object with a top-level errors flag and a per-item result (status, _id, error if any). A 200 response can still contain failed items, so the client has to check the errors flag. The endpoint is the workhorse of any application that is doing high-throughput writes, and the one that makes the difference between an application that scales and one that does not.
POST /<index>/_delete_by_query. Delete every document matching a query. The body is a JSON object with a query (the same query DSL as search). The response is a JSON object with the deleted count, the batches count, the version_conflicts count, and the took time. The endpoint is the workhorse of any application that is doing large-scale cleanup, and the one that replaces a scroll-and-delete loop with a single call.
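The request and response shapes described above can be sketched as two small helpers. The field names (title, price, brand) are assumptions, with brand assumed to be a keyword field, and the sample response is hand-written to show the shape; it is not output from a real cluster.

```python
def build_search_body(text, page=0, page_size=10):
    """Search body with query, pagination, sort, and one aggregation."""
    return {
        "query": {"match": {"title": text}},
        "from": page * page_size,
        "size": page_size,
        "sort": [{"_score": "desc"}, {"price": "asc"}],
        # "brand" is assumed to be mapped as keyword so it can be aggregated.
        "aggs": {"by_brand": {"terms": {"field": "brand"}}},
    }


def summarize_response(response):
    """Pull the commonly used parts out of a search response."""
    hits = response["hits"]
    buckets = response["aggregations"]["by_brand"]["buckets"]
    return {
        "took_ms": response["took"],
        "total": hits["total"]["value"],
        "ids": [hit["_id"] for hit in hits["hits"]],
        "brands": {bucket["key"]: bucket["doc_count"] for bucket in buckets},
    }


# Hand-written sample in the shape of a search response (illustrative only).
sample = {
    "took": 12,
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_id": "1", "_score": 1.3, "_source": {"title": "Desk lamp"}},
            {"_id": "7", "_score": 0.9, "_source": {"title": "Floor lamp"}},
        ],
    },
    "aggregations": {"by_brand": {"buckets": [{"key": "acme", "doc_count": 2}]}},
}

summary = summarize_response(sample)
print(summary)
```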
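The bulk body format is easy to get wrong by hand, so here is a minimal sketch of a builder. The helper name build_bulk_body and the sample operations are hypothetical; the request would be sent with the Content-Type application/x-ndjson.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, document) tuples into a /_bulk body.

    Each operation becomes one action line. index and create actions are
    followed by the document itself, update by a {"doc": ...} wrapper, and
    delete by nothing. The body ends with a newline.
    """
    lines = []
    for action, metadata, document in operations:
        lines.append(json.dumps({action: metadata}))
        if action == "update":
            lines.append(json.dumps({"doc": document}))
        elif action != "delete":
            lines.append(json.dumps(document))
    return "\n".join(lines) + "\n"


# Hypothetical operations against an index named "products".
operations = [
    ("index", {"_index": "products", "_id": "1"}, {"title": "Lamp"}),
    ("update", {"_index": "products", "_id": "2"}, {"price": 19.99}),
    ("delete", {"_index": "products", "_id": "3"}, None),
]

body = build_bulk_body(operations)
print(body, end="")
```

The official clients ship bulk helpers that do this serialization and also chunk large batches, so a builder like this is mostly useful for understanding the wire format.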
GET /_cluster/health. The health of the cluster. The query parameters are level (cluster, indices, shards), wait_for_status (green, yellow, red), and timeout (how long to wait for the desired status). The response is a JSON object with the status (green, yellow, red), the number_of_nodes, the active_shards, and the unassigned_shards. The endpoint is the workhorse of any monitoring setup, and the one that decides whether the cluster is healthy enough to serve traffic.
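A monitoring check built on this endpoint might look roughly like the sketch below. The helper names and the threshold logic are assumptions for illustration; the sample health dictionaries are hand-written and show only the fields the check reads.

```python
from urllib.parse import urlencode

# Order the three statuses so they can be compared.
STATUS_RANK = {"red": 0, "yellow": 1, "green": 2}


def health_path(wait_for_status=None, timeout=None, level=None):
    """Build the path and query string for GET /_cluster/health."""
    candidates = {"wait_for_status": wait_for_status, "timeout": timeout, "level": level}
    params = {key: value for key, value in candidates.items() if value is not None}
    return "/_cluster/health" + ("?" + urlencode(params) if params else "")


def is_healthy_enough(health, minimum="yellow"):
    """True if the cluster reached at least the minimum status without timing out."""
    if health.get("timed_out", False):
        return False
    return STATUS_RANK[health["status"]] >= STATUS_RANK[minimum]


print(health_path(wait_for_status="yellow", timeout="30s"))
print(is_healthy_enough({"status": "yellow", "timed_out": False, "unassigned_shards": 3}))
print(is_healthy_enough({"status": "red", "timed_out": True, "unassigned_shards": 12}))
```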
The four search flavors the API exposes
The search endpoint is the one the developer is going to use the most, and the search endpoint has four distinct flavors. The flavor changes what the developer can ask and what the response contains.
Match query. The basic full-text search. The body is {"query": {"match": {"field": "value"}}}. The response is a list of documents that match the query, ranked by relevance. The flavor is the one the developer reaches for first, and the one that covers 80% of real use cases.
Bool query. The compound query that combines multiple clauses. The body is {"query": {"bool": {"must": [...], "filter": [...], "should": [...], "must_not": [...]}}}. The flavor is the one the developer reaches for second, and the one that turns a search into a real product. The must and should clauses contribute to the relevance score. The filter and must_not clauses do not, which also makes them cacheable.
Aggregation query. The data-analysis query. The body is {"aggs": {"<name>": {"<type>": {...}}}}, where <name> is a label the developer chooses and <type> is terms, stats, date_histogram, or another aggregation type. For example, {"aggs": {"by_status": {"terms": {"field": "status"}}}} counts documents per status. The response carries a list of buckets or a set of metrics under aggregations.<name>. The flavor is the one the developer reaches for third, and the one that turns a search into a dashboard.
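To make the four clause types concrete, here is a small sketch of a bool query builder. The function name and the field names (title, description, category, price, brand) are hypothetical; category and brand are assumed to be keyword fields.

```python
def product_search(text, category, max_price, exclude_brand=None):
    """Bool query: scored full-text clauses plus unscored filters."""
    bool_query = {
        # must: has to match, and contributes to the relevance score.
        "must": [{"match": {"title": text}}],
        # filter: has to match, but does not affect the score.
        "filter": [
            {"term": {"category": category}},
            {"range": {"price": {"lte": max_price}}},
        ],
        # should: optional, boosts the score of documents that match.
        "should": [{"match": {"description": text}}],
    }
    if exclude_brand is not None:
        # must_not: excludes matching documents, with no effect on score.
        bool_query["must_not"] = [{"term": {"brand": exclude_brand}}]
    return {"query": {"bool": bool_query}}


body = product_search("wireless headphones", "audio", 200, exclude_brand="acme")
print(body)
```

Putting exact-match conditions in filter instead of must is usually the first optimization worth making, since the scoring work is skipped for those clauses.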
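A dashboard-style request typically asks for several aggregations at once and no hits. In the sketch below each aggregation is keyed by a name the caller picks, with the aggregation type as the inner key. The names and fields (status, price, created_at) are assumptions for illustration.

```python
import json

dashboard_body = {
    # size 0: return only the aggregations, no individual hits.
    "size": 0,
    "aggs": {
        # Bucket aggregation: document count per status value.
        "by_status": {"terms": {"field": "status", "size": 5}},
        # Metric aggregation: min, max, avg, sum, and count of price.
        "price_stats": {"stats": {"field": "price"}},
        # Bucket aggregation with a nested metric: revenue per day.
        "orders_per_day": {
            "date_histogram": {"field": "created_at", "calendar_interval": "day"},
            "aggs": {"revenue": {"sum": {"field": "price"}}},
        },
    },
}

print(json.dumps(dashboard_body, indent=2))
```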
Vector search (kNN). The semantic search query. The body is {"knn": {"field": "embedding", "query_vector": [...], "k": 10, "num_candidates": 100}}. The response is a list of documents, ranked by vector similarity. The flavor is the one the developer reaches for fourth, and the one that powers the modern AI application. The flavor requires the index to be configured with a dense_vector mapping.
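The two halves of vector search, the mapping and the query, can be sketched as follows. The field name embedding comes from the example above; the title field, the three-dimensional vectors, and the cosine similarity choice are assumptions made to keep the example small.

```python
def vector_index_body(dims):
    """Body for PUT /<index> with a dense_vector field that supports kNN search."""
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "embedding": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": "cosine",
                },
            }
        }
    }


def knn_search_body(query_vector, k=10, num_candidates=100):
    """Body for POST /<index>/_search using the top-level knn option."""
    if num_candidates < k:
        raise ValueError("num_candidates cannot be less than k")
    return {
        "knn": {
            "field": "embedding",
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
        },
        "_source": ["title"],
    }


# Real embeddings have hundreds of dimensions; three keeps the example readable.
print(vector_index_body(3))
print(knn_search_body([0.12, -0.4, 0.88], k=5, num_candidates=50))
```

The query vector has to have the same number of dimensions as the mapping, and it should come from the same embedding model that produced the indexed vectors.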
The four flavors are not exclusive. A typical query is a bool query with match clauses, filter clauses, and an aggregation on the result set. A modern AI application adds vector search to the mix, either with the top-level knn option alongside query or, in recent 8.x releases, with a knn query inside the bool query. The developer should know which flavor they want before they write the body.
The five client libraries a real team uses
The HTTP API is the truth, but the developer usually talks to Elasticsearch through a client library. The library is the one that handles connection pooling, retries, serialization, and the boilerplate that the raw HTTP API forces on the developer. The five libraries that show up in real teams are:
Python: elasticsearch-py. The official Python client. Mature, well-documented, supports the full API surface, and is the right answer for any Python application. The 8.x line aligns with the Elasticsearch 8.x server line, and the developer should keep the client and the server on the same major version.
Node.js: @elastic/elasticsearch. The official Node client. Mature, well-documented, supports the full API surface, and is the right answer for any Node application. The 8.x line aligns with the server line.
Java: co.elastic.clients:elasticsearch-java. The official Java client. Mature, well-documented, supports the full API surface, and is the right answer for any Java application. The client is largely generated from the Elasticsearch API specification, so it tracks the server’s API surface closely.
Go: go-elasticsearch. The official Go client. Mature, well-documented, supports the full API surface, and is the right answer for any Go application. The 8.x line aligns with the server line.
Ruby: elasticsearch-ruby. The official Ruby client. Mature, well-documented, supports the full API surface, and is the right answer for any Ruby application. Rails integrations are well-supported through the elasticsearch-rails gem.
The five libraries are not the only ones (Elastic also publishes official clients for PHP, .NET, and Rust, and there are community clients for other languages), but they are the ones the developer should learn first. The pattern is the same across all of them: connect to a cluster, index a document, run a search, get a response. The developer should pick the client that matches the application’s language, and should keep the client’s version aligned with the server’s major version.
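The connect, index, search, respond pattern looks roughly like this with the 8.x Python client. The function below takes the client as an argument so that it can be exercised without a cluster; the helper name, the index name, and the connection URL in the usage comment are assumptions.

```python
def index_and_search(client, index, doc_id, document, field, text):
    """Index one document, make it searchable, and search for it.

    `client` is expected to behave like elasticsearch.Elasticsearch (8.x).
    """
    client.index(index=index, id=doc_id, document=document)
    # Force a refresh so the document is visible to the search right away.
    # Fine for a demo; avoid doing this on every write in production.
    client.indices.refresh(index=index)
    response = client.search(index=index, query={"match": {field: text}})
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Usage against a real cluster (requires `pip install elasticsearch` and a
# reachable node; the URL is an assumption):
#
# from elasticsearch import Elasticsearch
# client = Elasticsearch("http://localhost:9200")
# results = index_and_search(
#     client, "articles", "1", {"title": "Hello search"}, "title", "hello"
# )
```

The other official clients follow the same three calls with language-appropriate naming, which is why switching languages rarely changes the shape of the integration.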
The seven costs a working developer should expect on the bill
A short, opinionated list of costs that show up on a real Elasticsearch bill. None of them are surprising in isolation, but the developer who has not seen them before is going to be surprised by the total. The seven costs are:
The cluster cost. The hourly cost of the cluster, scaled by the number of nodes, the size of each node, and the storage attached to each node. A small cluster (3 nodes, 8 GB RAM, 100 GB storage) on a managed service is on the order of dollars per day. A large cluster (10+ nodes, 64+ GB RAM, multi-TB storage) is on the order of hundreds of dollars per day.
The storage cost. The cost of the data stored in the cluster, separate from the cluster cost on most managed services. Storage is usually priced per GB per month, and the cost grows with the index size. A team that is indexing more data than they need to is a team that is paying for storage they do not use.
The transfer cost. The cost of data transfer out of the cluster (egress) and between regions. Egress is the dominant cost for an application that is reading a lot of data from the cluster and sending it to clients. A team that is sending full search results to clients is a team that is paying for egress they could avoid by sending only the fields the client needs.
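The snapshot cost. The cost of the snapshot storage, which is usually a separate line item on a managed service. The snapshot is the backup, and the cost grows with the index size and the snapshot retention. A team that is taking snapshots every hour and keeping them for 30 days is holding 24 × 30 = 720 snapshots at any one time. Snapshots are incremental at the segment level, so the bill tracks the amount of changed data retained rather than 720 full copies, but on a write-heavy index that is still a large number.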
The query cost. Some managed services charge per query, especially for the high-level search APIs. A team that is doing 1 million searches per day is a team that is paying per-search, and the per-search cost is usually a small number that adds up.
The managed-service premium. The markup for using a managed service instead of running the cluster yourself. The premium is the cost of not having to operate the cluster, and the premium is usually worth it for teams that do not have a dedicated Elasticsearch operator.
The developer-time cost. The opportunity cost of the time the team spends learning the API, debugging the queries, and tuning the cluster. The cost is hard to measure, but the cost is real, and a team that has not built with Elasticsearch before is going to spend weeks on the learning curve.
The seven costs are not the only ones — there are also costs for security features, for plugin licenses, for cross-region replication, and for premium support — but they are the ones the developer should expect on the first bill. The fix is to model the cost before the cluster is built, and to set alerts for the line items that are going to dominate.
The mistakes that quietly burn an Elasticsearch bill
A short, opinionated list of mistakes that have actually burned real Elasticsearch bills. None of them are dramatic. They are the boring ones.
Indexing the same document in two places. A team that is writing every document to two clusters (a “live” cluster and a “search” cluster) in real time is paying for storage and indexing compute twice. The fix is to keep one copy and query it with cross-cluster search, or, if a second copy is really needed, to reindex from the live cluster to the search cluster on a schedule instead of dual-writing.
Forgetting to set the refresh_interval. An index with the default refresh_interval of 1s makes new documents searchable about once a second while it is being searched, and each refresh writes a new segment, which is expensive under heavy indexing. The fix is to set the interval to a number that matches the application’s latency tolerance. A search index that does not need to be live within 1 second can use 30s or 60s.
Forgetting to delete old indices. A team that is creating a new index every day for time-series data is going to have hundreds of indices in the cluster after a year, and the cluster is going to be paying for the storage of all of them. The fix is to delete whole indices once they age out (DELETE /<index>, or the delete phase of an ILM policy), which is far cheaper than a delete-by-query, and to keep the application pointed at an alias so that retiring an index does not break it.
Querying the wrong field type. A team that is running a match query against a text field to filter on an exact value (a status, a category, a user ID) is paying for analysis and relevance scoring it does not need, and may get partial matches it does not want. The fix is to map the field as keyword, or to add a keyword subfield to the text field, and to query it with a term query in filter context.
Storing more in the _source than the application needs. A team that is indexing 10 KB of source per document is paying for the storage of 10 KB per document, and for the transfer of it on every hit unless the response is filtered. The fix is to drop large fields the application never reads with the _source excludes mapping setting, and to request only the needed fields with source filtering on the search. Disabling _source entirely is rarely worth it, because update, update-by-query, and reindex all depend on it.
Using the default shard count without checking it. An index created with the default of 1 primary shard cannot spread its indexing or search load across nodes once it grows large. An index created with ten times the shards it needs pays per-shard overhead in heap and cluster state for nothing. The fix is to size the shard count to the index size, with each shard in the 10–50 GB range.
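Changing the refresh interval is a one-field settings update. The sketch below builds the body for PUT /<index>/_settings; the helper name and the chosen values are illustrative, not recommendations for any particular workload.

```python
import json


def refresh_settings(interval):
    """Body for PUT /<index>/_settings that changes the refresh interval.

    Pass a duration such as "30s", "-1" to disable refreshes (for example
    during a large bulk load), or None to reset to the default.
    """
    return {"index": {"refresh_interval": interval}}


steady_state = refresh_settings("30s")
during_bulk_load = refresh_settings("-1")
back_to_default = refresh_settings(None)

print(json.dumps(steady_state))
print(json.dumps(during_bulk_load))
print(json.dumps(back_to_default))
```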
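For teams that are not yet on ILM, a retention job can be as simple as working out which daily indices are past their keep window and deleting them whole. This sketch assumes a hypothetical naming scheme of prefix plus YYYY.MM.DD, such as logs-2026.01.20; it only selects the names and leaves the DELETE calls to the caller.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, prefix, today, keep_days):
    """Return the daily indices older than keep_days, sorted by name.

    Names that do not match prefix + YYYY.MM.DD are ignored rather than
    guessed at, so unrelated indices are never selected.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


names = [
    "logs-2026.01.01",
    "logs-2026.01.20",
    "logs-2026.02.01",
    "metrics-2026.01.01",
    "logs-current",
]
to_delete = expired_indices(names, "logs-", today=date(2026, 2, 1), keep_days=14)
print(to_delete)
```

Each returned name would then be removed with a DELETE /<index> call, which drops the index in one operation instead of deleting its documents one by one.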
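Trimming the stored and returned source comes in two parts: a mapping setting that keeps a large field out of _source, and a per-request filter that limits what each hit carries. The field names below (title, price, raw_html) are hypothetical.

```python
import json

# Body for PUT /<index>: raw_html is still analyzed and searchable, but it is
# left out of the stored _source. Trade-off: an excluded field cannot be
# returned, and it is lost if the index is later reindexed from _source.
index_body = {
    "mappings": {
        "_source": {"excludes": ["raw_html"]},
        "properties": {
            "title": {"type": "text"},
            "price": {"type": "float"},
            "raw_html": {"type": "text"},
        },
    }
}

# Body for POST /<index>/_search: return only the fields the UI needs.
search_body = {
    "query": {"match": {"title": "headphones"}},
    "_source": ["title", "price"],
}

print(json.dumps(index_body, indent=2))
print(json.dumps(search_body, indent=2))
```

The request-level filter is the safer of the two to start with, since it cuts response size without changing what is stored.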
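The sizing rule above is simple arithmetic: divide the expected index size by a target shard size and round up. The helper below is a sketch with an assumed default target of 30 GB, which sits inside the 10–50 GB range; the sample sizes are made up.

```python
import math


def primary_shard_count(index_size_gb, target_shard_gb=30):
    """Number of primary shards so that each holds about target_shard_gb."""
    if index_size_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


for size_gb in (12, 200, 600):
    shards = primary_shard_count(size_gb)
    print(f"{size_gb} GB -> {shards} primary shard(s), about {size_gb / shards:.1f} GB each")
```

The primary shard count is fixed at index creation, so for a growing index it makes sense to size for the expected volume, not the current one.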
Forgetting to set up ILM. Index Lifecycle Management is the feature that moves indices from hot to warm to cold to delete on a schedule. A team that is not using ILM is paying for hot storage on indices that are only queried occasionally. The fix is to set up an ILM policy and apply it to every index.
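An ILM policy is a JSON document sent to PUT /_ilm/policy/<name>. The sketch below shows one plausible shape for a logging workload; the ages, the rollover thresholds, and the actions chosen for each phase are assumptions, not tuned values.

```python
import json

ilm_policy = {
    "policy": {
        "phases": {
            # Hot: actively written. Roll over to a new index daily or at 50 GB per shard.
            "hot": {
                "actions": {
                    "rollover": {"max_primary_shard_size": "50gb", "max_age": "1d"}
                }
            },
            # Warm: no longer written. Merge down to one segment per shard.
            "warm": {
                "min_age": "7d",
                "actions": {"forcemerge": {"max_num_segments": 1}},
            },
            # Cold: rarely queried. Lower the recovery priority.
            "cold": {
                "min_age": "30d",
                "actions": {"set_priority": {"priority": 0}},
            },
            # Delete: drop the whole index once it is past retention.
            "delete": {"min_age": "90d", "actions": {"delete": {}}},
        }
    }
}

print(json.dumps(ilm_policy, indent=2))
```

The policy takes effect only on indices that reference it, which is normally done through an index template so that every new index picks it up automatically.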
How this fits the rest of the stack
An Elasticsearch cluster is rarely the whole project. The cluster usually sits behind an application, indexes data from a database, and serves results to a UI. The platform that handles the cluster should make the rest of the stack feel like part of the same conversation.
The services layer is the part of the platform that runs the application that talks to Elasticsearch. The database layer is the part that holds the data the application is indexing. The static layer is the part that hosts the UI the application is serving. The environment variables are the part that holds the credentials the application uses to authenticate against the cluster.
An Elasticsearch cluster that runs on a platform where the application, the database, the storage, the secrets, and the cluster are all in the same place is a cluster the team is going to be able to operate. An Elasticsearch cluster that runs on a platform where each piece is in a different console is a cluster where the team spends the first hour just finding the right tab.
For a team that wants to see the full cost of the project before it commits, the RunxBuild hosting calculator shows the line items together. The API, the database, the storage, the search cluster, the bandwidth — each one is a separate number, and the team’s mental model for the platform is the sum of those numbers.
FAQ
What is the Elasticsearch API?
The Elasticsearch API is the REST surface that lets a client index documents, run searches, and aggregate results against an Elasticsearch cluster. Every action against the cluster is an HTTP call to one of these endpoints. The client can be a low-level curl, a high-level client library, or a search framework sitting on top. The API is well-documented, fast, and a strong fit for applications that need full-text search, faceted search, or large-scale aggregations.
How do I connect to Elasticsearch from my application?
Use the official client library for your language: elasticsearch-py for Python, @elastic/elasticsearch for Node, co.elastic.clients:elasticsearch-java for Java, go-elasticsearch for Go, elasticsearch-ruby for Ruby. The client handles connection pooling, retries, serialization, and the boilerplate that the raw HTTP API forces on the developer. Keep the client’s version aligned with the server’s major version.
What is the difference between a match query and a term query?
A match query runs the input through the field’s analyzer (typically tokenizing and lowercasing, plus stemming if the analyzer is configured for it) and matches documents that contain the resulting terms. A term query does not analyze the input. It matches the exact term as it was indexed. A match query is the right answer for full-text search. A term query is the right answer for exact-match filters (status, category, user_id, etc.) on keyword fields.
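Side by side, the two query types look like this. The field names are assumptions: title is taken to be a text field and status a keyword field.

```python
import json

# Full-text search: the input is analyzed, so "Wireless Headphones" can match
# a document whose title contains "wireless" or "headphones" in any case.
full_text_body = {"query": {"match": {"title": "Wireless Headphones"}}}

# Exact-match filter: the value is compared as-is against the indexed term,
# and running it in filter context skips relevance scoring.
exact_filter_body = {
    "query": {"bool": {"filter": [{"term": {"status": "shipped"}}]}}
}

print(json.dumps(full_text_body))
print(json.dumps(exact_filter_body))
```

A common source of confusion is running a term query against a text field: the indexed terms are usually lowercased tokens, so a term query for a mixed-case or multi-word value finds nothing.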
How do I make Elasticsearch queries faster?
Add a filter context, use keyword fields for exact-match queries, use the constant_score wrapper for filter-only queries, use the terms query for multi-value filters, set a reasonable size and from (deep pagination is slow, and from plus size is capped at 10,000 by default), use the search_after parameter for deep pagination, and profile the query with the Profile API. As a rough rule of thumb, most of the speed gain comes from the right field type, and only a small part from the query shape.
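Two of these tips, the constant_score wrapper and search_after pagination, can be sketched together. The helper names and the fields (status, created_at, id) are hypothetical, with id assumed to be a unique keyword field used as a sort tiebreaker. The sample response is hand-written to show the shape.

```python
def first_page(query, page_size=100):
    """First page of a deep scan: a stable sort with a unique tiebreaker."""
    return {
        "query": query,
        "size": page_size,
        "sort": [{"created_at": "desc"}, {"id": "asc"}],
    }


def next_page(previous_body, previous_response):
    """Next page body, or None when the previous page was empty."""
    hits = previous_response["hits"]["hits"]
    if not hits:
        return None
    body = dict(previous_body)
    # Resume after the sort values of the last hit instead of using "from".
    body["search_after"] = hits[-1]["sort"]
    return body


# Filter-only query: constant_score skips relevance scoring entirely.
query = {"constant_score": {"filter": {"terms": {"status": ["shipped", "delivered"]}}}}
page_one = first_page(query, page_size=2)

# Hand-written sample response (illustrative only).
sample_response = {
    "hits": {
        "hits": [
            {"_id": "a", "sort": [1700000000000, "a"]},
            {"_id": "b", "sort": [1690000000000, "b"]},
        ]
    }
}
page_two = next_page(page_one, sample_response)
print(page_two["search_after"])
```

For long scans over an index that is still being written to, pairing search_after with a point in time keeps the pages consistent with each other.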
How much does Elasticsearch cost to run?
A small managed cluster is on the order of dollars per day. A large managed cluster is on the order of hundreds of dollars per day. The cost is a function of the cluster size, the storage, the egress, the snapshots, and the per-query charges. The developer should model the cost before the cluster is built, and should set alerts for the line items that are going to dominate.
Can I use Elasticsearch for vector search?
Yes. The kNN (k-nearest neighbors) search in Elasticsearch 8.x supports vector similarity search over dense_vector fields. The pattern is to index the document with its embedding, run a kNN query with the query embedding, and return the closest documents by vector similarity. The pattern is the one that powers the modern AI application.
Is Elasticsearch the same as OpenSearch?
No, but the API surfaces are very similar. OpenSearch is the open-source fork of Elasticsearch 7.10.2, started by AWS in 2021 and now governed by the OpenSearch Software Foundation under the Linux Foundation. The two share the same 7.x-era API surface, and the two have diverged since. A team that is starting fresh in 2026 should pick one and stick with it. The two are not interchangeable, and the developer should know which one they are running.