Persistent storage is any storage layer that survives the lifecycle of the container, pod, or process that wrote to it. The data lives on a volume, a managed database, or an object store that the platform owns, and the application reads it back when the next container starts. Ephemeral storage is the opposite — the data lives on the container’s writable layer, and the moment the container is replaced, restarted, or rescheduled, the data is gone.
Most cloud explainers stop at that difference. What they skip is the workflow triggers that tell you the project actually needs persistent storage, the cost of finding out too late, and the line between a managed database, a managed object store, and a managed volume that decides which one the project should pay for. The explainers are written for platform engineers. This post is written for the developer who is shipping the project.
Table of contents
- The short version
- What persistent storage actually does
- The three flavors and what each one is for
- The seven workflow triggers that say you need it
- The cost of finding out at 3 a.m.
- The five things every persistent storage solution has to get right
- The mistakes that quietly cost you
- How this fits the rest of the stack
- FAQ
The short version
Persistent storage is the layer that lets the project’s data survive the project’s processes. The data lives somewhere the platform owns, the application reads and writes it through a connection string, and the platform handles the boring parts — backups, replication, point-in-time recovery, connection pooling. The application treats the storage layer as a service, not as a filesystem. The team’s mental model is “the database is always there, the cache is always there, the object store is always there,” and the platform makes the mental model match the operational truth.
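As a minimal sketch of what "through a connection string" means in practice, the snippet below splits a connection string into the parts a database driver needs. The variable name DATABASE_URL, the host, and the credentials are assumptions made up for illustration; a real platform may use a different variable name and format.

```python
import os
from urllib.parse import urlsplit

def parse_database_url(url: str) -> dict:
    """Split a connection string into the parts a database driver needs."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }

# DATABASE_URL is an assumed variable name; the fallback is a made-up local value.
url = os.environ.get("DATABASE_URL", "postgres://app:secret@db.internal:5432/shop")
config = parse_database_url(url)
```

The application only ever sees the string. Where the database runs, and how it is backed up, stays on the platform's side of that string.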
The mental model is the point. The platform that owns the storage layer is the platform that owns the durability story. The team that owns the storage layer by hand is the team that finds out at 3 a.m. that the snapshot is three days old, the replica is in the wrong region, and the connection pool is exhausted.
What persistent storage actually does
Strip away the marketing and persistent storage is responsible for six jobs. The list is the same whether the storage is a managed database, a managed object store, a managed cache, or a managed volume. The implementation differs. The job does not.
Durability. The data is still there when the application comes back. The container that wrote the data has been replaced, the pod that wrote the data has been rescheduled, the VM that wrote the data has been terminated. The data is still there. The durability is not a property the team has to configure; the durability is a property the platform provides.
Backup. The data has a copy in another place, taken at a point in time, and the platform can restore from the backup. The backup cadence is a setting the team chooses (daily, hourly, every five minutes). The backup target is a location the platform owns (the same region, another region, an object store, a tape archive). The restore is a button the team clicks.
Replication. The data has more than one copy in more than one place, and the platform keeps the copies in sync. Replication is what turns a regional outage from a disaster into a non-event. The replication topology is a choice the team makes (single-region, multi-region, multi-cloud). The platform handles the rest.
Connection pooling. The data is reachable from many application processes at once, without exhausting the database’s connection limit. The connection pool is the layer that sits between the application and the database, accepts connections from the application, and multiplexes them onto a small number of database connections. The application does not see the pool. The database sees a small, predictable number of clients.
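The multiplexing idea can be sketched in a few lines. This is an in-process illustration, not how any particular platform's pooler is built: sqlite3 stands in for a real database, and the class name ConnectionPool and its parameters are hypothetical.

```python
import queue
import sqlite3
from contextlib import contextmanager

class ConnectionPool:
    """Hand out a fixed number of connections; callers wait when all are busy."""

    def __init__(self, size: int, database: str = ":memory:"):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets whichever thread borrows the connection use it.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout: float = 5.0):
        # Raises queue.Empty if no connection frees up within the timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return the connection for the next caller

pool = ConnectionPool(size=2)
results = []
for _ in range(10):  # ten units of work share two connections
    with pool.connection() as conn:
        results.append(conn.execute("SELECT 1").fetchone()[0])
```

The database only ever sees two clients, no matter how many callers pass through. A managed pooler applies the same idea outside the application process.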
Point-in-time recovery. The data can be rolled back to a specific point in time, not just to a specific backup. The platform keeps a log of every write, and the team can replay the log up to the second the data was correct. The retention is a number the team sets (seven days, thirty days, ninety days). The platform owns the log.
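Replaying a log up to a chosen second can be sketched with a toy key-value store. The log format here (timestamp, key, value, with None marking a delete) is an assumption for illustration; real databases replay their own write-ahead logs.

```python
from datetime import datetime

def restore_to(snapshot: dict, write_log: list, target: datetime) -> dict:
    """Rebuild state from a base snapshot plus every logged write at or before target."""
    state = dict(snapshot)
    for timestamp, key, value in sorted(write_log, key=lambda entry: entry[0]):
        if timestamp > target:
            break  # everything after the target time is left out
        if value is None:
            state.pop(key, None)  # None marks a delete in this toy log
        else:
            state[key] = value
    return state

snapshot = {"plan": "free"}
write_log = [
    (datetime(2024, 1, 1, 9, 0, 0), "plan", "pro"),
    (datetime(2024, 1, 1, 9, 5, 0), "email", "a@example.com"),
    (datetime(2024, 1, 1, 9, 7, 30), "plan", None),  # the bad write
]
# Recover to one second before the bad write.
recovered = restore_to(snapshot, write_log, datetime(2024, 1, 1, 9, 7, 29))
```

A plain backup could only return the snapshot. The log is what makes the two good writes recoverable while the bad one is dropped.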
Observability. The platform knows what the data layer is doing — the slow query, the exhausted connection, the failed backup, the lagging replica. The team gets a metric, a log, an alert, and a way to fix the problem. The team does not get a “we will email you when the issue is resolved” message.
A storage layer that does all six jobs well is rare. A storage layer that does three of the six and waves at the rest is the most common kind. The filter below is for sorting one from the other.
The three flavors and what each one is for
The market has settled into three rough flavors of persistent storage. The flavors are not mutually exclusive — a serious project usually uses more than one — but the defaults and the pricing reveal which flavor is the right one for the workload.
The managed database flavor. The platform owns a Postgres, MySQL, MongoDB, or Redis instance. The application reads and writes the data through a connection string. The platform handles the durability, the backup, the replication, the connection pooling, the point-in-time recovery, and the observability. The team does not see the database server. The team does not manage the operating system. The team does not patch the database. The team uses the database.
This is the right flavor for the part of the application that is structured, queryable, and transactional. User accounts, orders, payments, inventory, content, sessions, audit logs — the list is long, and the answer is almost always “managed database.” The bill is dominated by the database size, the connection count, and the read-replica count. The bill is the same number every month, and the number is the number the team can plan around.
The managed object store flavor. The platform owns an S3-compatible bucket. The application reads and writes blobs (files, images, videos, backups, exports, imports) through HTTP. The platform handles the durability, the replication, the access control, and the observability. The team does not see the storage server. The team does not manage the disks. The team does not pay for a filesystem the project is not using.
This is the right flavor for the part of the application that is unstructured, blob-shaped, and read or written as whole objects. User uploads, generated PDFs, video previews, log archives, database backups, machine learning training data: the list is long, and the answer is almost always “managed object store.” The bill is dominated by the storage size and the egress. The bill is the same number every month, and the number is the number the team can plan around.
The managed volume flavor. The platform owns a block or file storage volume. The application reads and writes the volume as a filesystem. The platform handles the durability and the replication, but the application is responsible for the format, the schema, the backup, and the observability. The team sees a disk. The team does not see the storage server.
This is the right flavor for the part of the application that needs a real filesystem — a stateful service that has its own on-disk format, a single-node database that does not fit the managed database flavor, a data pipeline that streams files through a working directory. The bill is dominated by the volume size and the IOPS. The bill is the same number every month, and the number is the number the team can plan around — usually, but not always, a higher number than the managed database or the managed object store.
The three flavors are tools, not identities. Most projects use one primary flavor (the managed database), one secondary flavor (the managed object store), and a tertiary flavor that is rare (the managed volume). The project that uses one flavor for every problem is the project that is paying for a feature it does not need.
The seven workflow triggers that say you need it
A practical list, not an architecture lecture. These are the moments a working developer hits when the project crosses the line from “I can lose this” to “I cannot lose this.” Each trigger is a moment when the answer to “do I need persistent storage” stops being “later” and becomes “now.”
The first user signup. The first time the project has a user, the data the project cannot lose is the user’s account. The trigger is not the moment the data needs to scale. The trigger is the moment the data exists at all.
The first payment. The first time the project takes money, the data the project cannot lose is the payment record. The trigger is not the moment the payment is processed. The trigger is the moment the payment record is the only proof the customer paid.
The first user-generated file. The first time a user uploads a file, the file is data the project cannot lose. The trigger is not the moment the upload feature ships. The trigger is the moment the file is the only copy the user has.
The first cron job. The first time the project runs a scheduled task — a daily report, a weekly digest, a monthly invoice — the task output is data the project cannot lose. The trigger is not the moment the cron job ships. The trigger is the moment the cron job’s output is the only place the data lives.
The first restart of a stateful service. The first time a long-running process restarts and discovers its in-memory state is gone, that state turns out to be data the project cannot lose. The trigger is not the moment the service ships. The trigger is the moment the process learns that memory was the wrong place for the data.
The first scaling event. The first time the project adds a second instance of a service, the data the two instances share is data that has to live in a layer both can reach. The trigger is not the moment the second instance ships. The trigger is the moment the two instances need to agree.
The first time the project asks “what is the backup story?” The moment someone on the team asks, the answer is “we do not have one yet.” The trigger is the question, not the disaster.
The seven triggers are not a checklist for a single project. They are a list of the moments a real project crosses the line. Most projects cross three of them in the first month. The project that crosses all seven without persistent storage is the project that is going to lose data.
The cost of finding out at 3 a.m.
The cost is not the storage layer. The cost is the incident. The incident is the moment the team discovers that the data layer is not durable, not backed up, not replicated, not observable, or not connected. The incident is the moment the team’s mental model of the project diverges from the operational truth.
The incident is the part the cloud explainers skip. The incident is the part that decides whether the project is the kind of project the team can ship to a paying customer, or the kind of project the team ships to a friend. The incident is the part that decides whether the team is the kind of team that has a 3 a.m. page rotation, or the kind of team that sleeps through the night.
The cost of the incident is not the storage bill. The cost of the incident is the lost data, the lost trust, the lost customer, the lost weekend, the lost feature that did not ship because the team was rebuilding the data layer. The cost of the incident is the reason the project budget for persistent storage is not a line item the team is trying to minimize. The cost of the incident is the reason the project budget for persistent storage is a line item the team is trying to get right.
The five things every persistent storage solution has to get right
These are the five things that decide whether the storage layer is the one the project is going to live on for the next three years. None of them is “has a nice dashboard,” because every storage layer has a nice dashboard. The list is the operational truth.
A backup that is actually a backup. The backup lives in another place. The backup is taken at a point in time. The backup can be restored without the application being online. The restore is tested. A backup that has not been tested is a backup that does not exist.
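A restore drill can be sketched with the standard library. sqlite3 stands in for the production database here, and the function name backup_and_verify and the orders table are hypothetical. The point is the shape of the test: take the backup, restore it somewhere else, and check the restored copy.

```python
import sqlite3

def backup_and_verify(source: sqlite3.Connection, table: str) -> bool:
    """Copy the database into a fresh connection and compare it with the source."""
    restored = sqlite3.connect(":memory:")
    source.backup(restored)  # sqlite3's online backup API (Python 3.7+)
    # The table name is trusted here; never interpolate user input into SQL.
    query = f"SELECT COUNT(*) FROM {table}"
    expected = source.execute(query).fetchone()[0]
    actual = restored.execute(query).fetchone()[0]
    integrity = restored.execute("PRAGMA integrity_check").fetchone()[0]
    restored.close()
    return actual == expected and integrity == "ok"

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
live.executemany("INSERT INTO orders (total) VALUES (?)", [(9.5,), (20.0,), (3.25,)])
live.commit()
drill_passed = backup_and_verify(live, "orders")
```

A row count and an integrity check are a low bar. A real drill would also run the application's own queries against the restored copy.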
A replication topology that matches the project’s disaster tolerance. A project that has to survive a regional outage needs multi-region replication. A project that only has to survive a single server failing needs single-region replication. A project that cannot afford either needs to be honest about its disaster tolerance, and pick the storage layer that matches.
A connection pool the application does not have to manage. The storage layer accepts connections from the application and multiplexes them onto a small number of database connections. The application does not see the pool. The team does not write the pool. The platform owns the pool.
A point-in-time recovery window that matches the project’s data criticality. A project that can lose a day’s data can recover from a daily backup. A project that can lose at most five minutes of data needs recovery points no more than five minutes apart. A project that can lose zero data needs a synchronous replica. The storage layer that does not offer the right window is the storage layer the team has to wrap in a custom solution.
An observability story the team can actually read. The team can see the slow query, the exhausted connection, the failed backup, the lagging replica, and the storage bill that is about to exceed the cost of the database server. A storage layer that hides the operational truth in a “we will email you” message is a storage layer the team cannot debug.
These five are the floor. Everything else is a feature.
The mistakes that quietly cost you
A short, opinionated list of mistakes that have actually cost real teams real money on real projects. None of them are dramatic. They are the boring ones.
Treating the in-memory state as a database. A Go service, a Python service, a Node service that holds user sessions in a map — the state is gone the moment the service restarts. The fix is a managed cache (Redis), a managed session store, or a managed database. The mistake is treating the memory as durable.
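The difference shows up the moment a restart is simulated. In this sketch a SQLite file stands in for the managed cache or database; the class names are hypothetical, and in a real container the file would itself have to live on a persistent layer rather than the writable layer.

```python
import os
import sqlite3
import tempfile

class MemorySessions:
    """Sessions in a dict: gone as soon as the process restarts."""
    def __init__(self):
        self._data = {}
    def put(self, sid, user):
        self._data[sid] = user
    def get(self, sid):
        return self._data.get(sid)

class DurableSessions:
    """Sessions in a store that outlives the object that wrote them."""
    def __init__(self, path):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS sessions (sid TEXT PRIMARY KEY, user TEXT)"
        )
    def put(self, sid, user):
        with self._db:  # commits on success, rolls back on error
            self._db.execute("INSERT OR REPLACE INTO sessions VALUES (?, ?)", (sid, user))
    def get(self, sid):
        row = self._db.execute("SELECT user FROM sessions WHERE sid = ?", (sid,)).fetchone()
        return row[0] if row else None
    def close(self):
        self._db.close()

path = os.path.join(tempfile.mkdtemp(), "sessions.db")
memory, durable = MemorySessions(), DurableSessions(path)
memory.put("s1", "ada")
durable.put("s1", "ada")
durable.close()

# Simulate a restart: fresh objects, same storage path.
memory, durable = MemorySessions(), DurableSessions(path)
after_restart = (memory.get("s1"), durable.get("s1"))
```

After the simulated restart the in-memory lookup returns nothing and the durable lookup still returns the user.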
Treating the container’s writable layer as a disk. A service that writes a file to /tmp and expects the file to be there on the next deploy is a service that is going to learn the difference between ephemeral and persistent the hard way. The fix is a managed volume, a managed object store, or a managed database. The mistake is treating the container as a server.
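One hedged way to avoid the /tmp habit is to make the storage location explicit and fail loudly when it is missing. DATA_DIR is an assumed environment variable pointing at a mounted volume, and upload_path is a hypothetical helper, not a platform convention.

```python
import os
from pathlib import Path

def upload_path(filename: str) -> Path:
    """Resolve where an uploaded file should be written on the persistent volume."""
    data_dir = os.environ.get("DATA_DIR")  # assumed name for the mounted volume path
    if not data_dir:
        # Failing here is deliberate: a silent fallback to /tmp hides the problem.
        raise RuntimeError("DATA_DIR is not set; refusing to fall back to ephemeral storage")
    # Keep only the final path component so a filename cannot escape the data directory.
    return Path(data_dir) / Path(filename).name
```

The design choice is that a missing volume becomes a startup error instead of a data loss discovered after the next deploy.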
Letting the backup window drift. A team that sets up a daily backup, then forgets to check that the backup is actually being taken, is a team that is going to discover the gap during the incident. The fix is an alert that fires when the newest successful backup is older than the schedule allows, so a backup that silently stops running is caught as quickly as one that fails loudly.
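A freshness check is small enough to sketch. The function name and the 26-hour threshold (a daily schedule plus some slack) are assumptions for illustration; the timestamp of the last successful backup would come from whatever the platform reports.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def backup_is_stale(last_success: Optional[datetime], max_age: timedelta, now: datetime) -> bool:
    """True when there is no successful backup, or the newest one is older than max_age."""
    if last_success is None:
        return True
    return now - last_success > max_age

now = datetime(2024, 3, 10, 3, 0, tzinfo=timezone.utc)
three_days_old = now - timedelta(days=3)
should_alert = backup_is_stale(three_days_old, max_age=timedelta(hours=26), now=now)
```

Checking the age of the last success, rather than waiting for a failure event, means the check also fires when the backup job has stopped running altogether.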
Picking the storage layer for the dashboard. A team that picks a storage layer because the dashboard is the prettiest is a team that is going to be unhappy with the backup story, the replication topology, the connection pool, the point-in-time recovery, and the observability. The dashboard is the part of the storage layer the team looks at. The bill is the part of the storage layer the team pays for.
Picking the storage layer for the per-gigabyte price. A storage layer that is cheap per gigabyte but does not have a backup, a replica, or an alert is a storage layer the team is going to pay for in incident response. The cheap per-gigabyte price is the line item the team is going to regret in month six.
How this fits the rest of the stack
Persistent storage is not the whole stack. The stack is the application, the database, the cache, the object store, the queue, the worker, the static site, the domain, the TLS certificate, and the way all of those pieces are observed. The persistent storage layer is the part of the stack that owns the data the application cannot lose.
The database layer is the part of the platform that handles the managed Postgres, the backups, the connection pooling, and the point-in-time recovery. The services layer is the part of the platform that handles the application, the build, the deploy, the health check, and the rollback. The static layer is the part of the platform that handles the static site, the CDN, and the custom domain. The environment variables are the part of the platform that holds the secrets the application reads at runtime.
The shape of the stack matters. A stack that splits the database from the object store from the cache from the worker into four different consoles is a stack that punishes the team for having a real project. A stack that lets the team see the whole picture — the application, the database, the storage, the domain, the logs — is a stack that respects the team’s time.
For a team that wants to see the full cost of the data layer before it commits, the RunxBuild hosting calculator shows the line items together. The service, the database, the storage, the build minutes, the bandwidth — each one is a separate number, and the team’s mental model for the platform is the sum of those numbers. The data layer is the line item the team is going to think about most, and the calculator is the tool that makes the line item honest.
FAQ
What is the difference between persistent storage and ephemeral storage?
Persistent storage survives the lifecycle of the container, pod, or process that wrote to it. Ephemeral storage is gone the moment the container is replaced, restarted, or rescheduled. The difference matters when the data is something the project cannot lose — user accounts, payment records, uploaded files, scheduled task output, session state, anything that has to outlive the process that wrote it.
What is the difference between persistent storage and a database?
A database is a kind of persistent storage. So are an object store, a block volume, a file volume, and a key-value cache. The word “database” usually means a structured query layer (Postgres, MySQL, MongoDB) that the application talks to through SQL or a similar query language. The word “persistent storage” is the broader category that includes the database, the object store, the volume, and the cache. The team picks the right kind of persistent storage for the workload.
What is the difference between persistent storage and a volume?
A volume is a kind of persistent storage. The word “volume” usually means a block or file storage layer that the application mounts as a filesystem. The word “persistent storage” is the broader category that includes the volume, the database, the object store, and the cache. The team picks the right kind of persistent storage for the workload.
How do I know if I need persistent storage?
The first time the project has a user, takes a payment, accepts an upload, runs a scheduled task, restarts a stateful service, scales a service past a single instance, or asks “what is the backup story?” — the answer is yes. The seven triggers are not a checklist for a single project. They are the moments a real project crosses the line from “I can lose this” to “I cannot lose this.”
What is the cheapest persistent storage?
The cheapest per-gigabyte persistent storage is rarely the cheapest persistent storage in the operational sense. A storage layer that is cheap per gigabyte but does not have a backup, a replica, or an alert is a storage layer the team is going to pay for in incident response. The cheap per-gigabyte price is the line item the team is going to regret in month six. The right storage layer is the one whose total cost — storage plus backup plus replica plus observability plus incident response — is the number the team can plan around.
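The total-cost comparison is plain addition. Every figure below is invented purely to show the arithmetic; none of them is a real price or a measured incident cost.

```python
def monthly_total(storage, backup, replica, observability, incident_response):
    """Sum the line items that make up the operational cost of a storage layer."""
    return storage + backup + replica + observability + incident_response

# Hypothetical numbers: a cheap layer with no safety net versus a fuller managed layer.
cheap_layer = monthly_total(storage=5, backup=0, replica=0, observability=0, incident_response=400)
managed_layer = monthly_total(storage=25, backup=5, replica=25, observability=10, incident_response=0)
cheaper_per_gigabyte_wins = cheap_layer < managed_layer
```

With these made-up inputs the cheap layer totals 405 and the managed layer totals 65. The comparison flips as soon as the incident line item is counted.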
What is the difference between persistent storage and a backup?
Persistent storage is the layer the application reads and writes to in production. A backup is a copy of the persistent storage taken at a point in time, stored in another place, and used to restore the data when the primary layer is lost. A project needs both. A project that has persistent storage without a backup has a single point of failure. A project that has a backup without persistent storage is restoring from the backup on every restart. The team that has both is the team that survives the incident.