"Connection Reset by Peer" Is the Loudest Error in Networking, and Here Is Where It Actually Comes From

“Connection reset by peer” is one of the most common network errors in production and in debugging sessions, and it is the one a developer is most likely to see without knowing who actually sent the RST. The naive answer is “the server closed the connection.” The working answer is the five places the RST actually comes from, the one command that identifies the sender in about 10 seconds, and the four application patterns (retries with backoff, idempotency keys, TCP keep-alives, half-close on shutdown) that turn a flaky connection into a reliable one.

“Connection reset by peer” deserves its own treatment, separate from “TCP error,” because it shows up after the TCP connection was established. The SYN, the SYN-ACK, and the ACK all succeeded, data may have flowed, and then a RST appeared. It is not “connection refused” (the SYN was answered with a RST instead of a SYN-ACK), and it is not “connection timed out” (no RST, just silence). It is an established connection that was forcibly terminated, and it means the other side, or something pretending to be the other side, sent a TCP RST segment.

The short version
The five places the RST actually comes from
The one command that identifies the sender in 10 seconds
The four application patterns that turn a flaky connection into a reliable one
The seven mistakes that quietly turn a working connection into a RST
How this fits the rest of the deploy
FAQ

The short version

A TCP connection has two ways to close: a graceful FIN (each side says “I’m done sending”) and a forceful RST (one side says “I’m closing this now, drop any in-flight data”). The RST path is what surfaces as “connection reset by peer.” The RST can be sent by: (1) the remote application aborting the connection or closing the socket with unread data, (2) a proxy or load balancer with a shorter idle timeout than the client, (3) a stateful firewall whose state-table entry for the connection has expired, (4) an application that enabled SO_LINGER with a timeout of 0 (which forces a RST on close), or (5) the peer’s kernel generating a RST because it no longer has a socket for the 4-tuple. Each cause has a different fix, and a developer who treats the error as a single bug is going to spend hours guessing.

The five places the RST actually comes from

1. The remote application closed the socket with unread data. A server that calls close() while its receive buffer still holds unread data from the client sends a RST instead of a FIN, because a RST is how the kernel signals “this data was never read.” The same thing happens when a server crashes mid-request or kills a child process that still has open connections: the kernel closes the sockets on the process’s behalf, and any socket with unread data is reset. That is how “the server crashed” becomes “the server RST’d the client.” The fix is to drain the socket before closing (shutdown(SHUT_WR), then read until EOF, then close()), which turns the RST into a graceful FIN.

2. A proxy or load balancer with a shorter idle timeout than the client. A reverse proxy or load balancer (nginx, HAProxy, ALB, Cloudflare) that is configured to close idle connections after 60 seconds will drop a connection the client is still holding open at second 61. Depending on the product, the client sees the RST either at the moment of the timeout or on its next write to a connection the proxy has already discarded. This is the usual suspect for long-lived HTTP keep-alive connections and WebSockets that fail after a quiet period. The fix is to (a) shorten the client’s idle timeout so it is below the proxy’s, (b) send keep-alives so the connection is never considered idle, or (c) increase the proxy’s idle timeout to match the client’s expectation.

3. A firewall stateful-table expiry. A stateful firewall (iptables with conntrack, AWS security groups, Cloudflare’s edge) tracks every TCP connection in a state table. When the table fills up, or when the entry for an idle connection ages out, later packets on that connection no longer match any entry and are treated as INVALID. Whether the firewall then sends a RST or silently drops the packet depends on its policy; a silent drop shows up as a timeout rather than a reset. This is the usual suspect for a long-lived connection that has been idle for hours behind a stateful firewall. The fix is to (a) enable TCP keep-alives (so the state entry is refreshed), (b) increase the state-table size, or (c) shorten the connection’s idle period.

4. An application that set SO_LINGER to 0. SO_LINGER is a socket option; when it is enabled with a linger time of 0, close() discards any unsent data and sends a RST instead of a FIN. Some servers and load-testing tools set it deliberately to reclaim resources quickly and skip TIME_WAIT, trading a graceful close for an immediate one. An abrupt server stop has a similar effect on clients: Go’s http.Server.Close(), for example, closes active connections immediately without waiting for in-flight requests. The fix is to remove the SO_LINGER(0) setting, or to do a proper graceful shutdown (in Node.js: server.close() and wait for in-flight requests; in Go: http.Server.Shutdown(ctx)).

5. The kernel generated a RST because the local socket no longer exists. A packet that arrives for a 4-tuple (src IP, src port, dst IP, dst port) with no matching socket triggers a RST from the receiving kernel. This is what happens when a server restarted and lost its socket state, or when a client’s connection pool hands out a stale connection that the server has long since closed. The fix is to validate the connection before reuse (the pool should drop a connection that fails a health check) and to handle the RST gracefully in the application (reconnect, and retry if the operation is idempotent).

The five are the floor. There is also the “load balancer health check sent a RST to drain a connection” case (the right answer is to use a graceful drain, not a RST), the “kernel OOM killer killed the process” case (the right answer is to fix the memory leak, not to retry the connection), and the “the network in between sent a RST” case (the right answer is to investigate the middlebox).
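The difference between a FIN close and a RST close is easy to reproduce on the loopback interface. The following is a minimal sketch, assuming Linux or macOS (the SO_LINGER struct is packed as two C ints there; Windows uses a different layout). The helper name observe_close is hypothetical and exists only for this illustration.

```python
import socket
import struct
import threading


def observe_close(abortive: bool) -> str:
    """Open a loopback connection, let the server close it, and report what
    the client sees: "reset" for a RST, "eof" for a graceful FIN."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)

    def serve() -> None:
        conn, _ = listener.accept()
        if abortive:
            # l_onoff=1, l_linger=0: close() drops the connection and
            # sends a RST instead of a FIN.
            conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                            struct.pack("ii", 1, 0))
        conn.close()

    server = threading.Thread(target=serve)
    server.start()

    client = socket.create_connection(listener.getsockname(), timeout=5)
    try:
        server.join()  # make sure the server has closed before the client reads
        data = client.recv(1)
        return "eof" if data == b"" else "data"
    except ConnectionResetError:
        return "reset"
    finally:
        client.close()
        listener.close()
```

Calling observe_close(False) exercises the normal close path, and observe_close(True) exercises the SO_LINGER(0) path from cause (4). Running tcpdump on the loopback interface at the same time shows the FIN and the RST respectively.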

The one command that identifies the sender in 10 seconds

The most underrated debugging tool for “connection reset by peer” is tcpdump. It captures every TCP segment on an interface, and a filter can narrow the capture to RST segments only. That replaces guessing about who sent the RST with looking at the actual packets.

# Capture on eth0, keep only RST segments, write them to a pcap
# (-U flushes each packet to the file, so it can be read while the capture runs)
sudo tcpdump -i eth0 -n -U 'tcp[tcpflags] & tcp-rst != 0' -w /tmp/rst.pcap

# Then in another terminal, look at the RSTs
sudo tcpdump -r /tmp/rst.pcap -n 'tcp[tcpflags] & tcp-rst != 0'

# Or summarize: who sent the RSTs? ($3 is the source address and port)
sudo tcpdump -r /tmp/rst.pcap -n 'tcp[tcpflags] & tcp-rst != 0' | \
  awk '{print $3}' | sort | uniq -c | sort -rn

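The output is a list of the source addresses (with ports) that sent the RSTs, sorted by frequency. This is the single most useful piece of information in a “connection reset” debugging session: the source address of the RST points at the entity that decided to close the connection, and that entity is the one the developer needs to investigate. One caveat: a firewall or other middlebox that injects a RST usually forges the peer’s address, so if the address alone is inconclusive, capture on both ends and check whether the supposed sender ever emitted the segment.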
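The awk summary counts each address-and-port pair separately, which scatters a busy client across its ephemeral ports. A minimal sketch of the same tally grouped by IP only, assuming the input is the text that tcpdump -n prints for an already RST-filtered capture; rst_senders is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable


def rst_senders(lines: Iterable[str]) -> Counter:
    """Count segments per source IP from `tcpdump -n` text output."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        # Expected shape: <time> IP <src>.<port> > <dst>.<port>: Flags [R.], ...
        if len(fields) < 5 or fields[1] not in ("IP", "IP6") or fields[3] != ">":
            continue  # skip anything that is not a packet summary line
        src_ip, _, _port = fields[2].rpartition(".")  # drop the trailing port
        counts[src_ip] += 1
    return counts


sample = [
    "12:00:01.000001 IP 10.0.0.5.443 > 10.0.0.9.51514: Flags [R.], seq 1, ack 2, win 0, length 0",
    "12:00:02.000001 IP 10.0.0.5.443 > 10.0.0.9.51520: Flags [R], seq 7, win 0, length 0",
    "12:00:03.000001 IP 10.0.0.9.51530 > 10.0.0.5.443: Flags [R], seq 9, win 0, length 0",
]
print(rst_senders(sample).most_common())
# -> [('10.0.0.5', 2), ('10.0.0.9', 1)]
```

The addresses in sample are made up for the illustration. In practice the function would read the piped output of the tcpdump -r command.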

If the RST comes from the remote server’s IP, the cause is one of (1), (4), or (5). If the RST comes from the load balancer’s IP, the cause is (2). If the RST carries the server’s IP but the server’s own capture shows it never sent one, a firewall or other middlebox (CDN, WAF, corporate proxy) in the path injected it: the cause is (3), or “the middlebox decided the connection was bad,” and the developer needs to check the firewall or middlebox logs.

The one command is the floor. There is also ss -tan to see the local socket state, ss -tan 'sport = :443' to see the local sockets on a specific port, and strace -e trace=network to see the application’s socket calls.

The four application patterns that turn a flaky connection into a reliable one

A short, opinionated list of the four patterns that turn “connection reset by peer” from a production-stopping bug into a handled error. They are the ones real applications use, and a developer can implement all four in a day.

1. Retries with exponential backoff (the most important one). A client that retries a failed request with exponential backoff (1s, 2s, 4s, 8s, with jitter) and a maximum retry count (3-5) survives a transient RST: the first attempt is reset, the second succeeds. Retries are only safe for idempotent operations (GET, PUT, DELETE, or anything carrying an idempotency key or request id); retrying a non-idempotent request blindly can apply it twice. A client without retries breaks on the first network hiccup, and the retry logic is some of the most under-appreciated 20 lines of code in any codebase.

2. Idempotency keys (the second most important one). A server that accepts an Idempotency-Key header and returns the same response for the same key (within a TTL) makes retries safe. This matters for POST requests that are not naturally idempotent (a “create charge” request, a “send email” request, a “transfer money” request) and for any other request the client is going to retry: it turns “I retried and the customer was charged twice” into “I retried and the customer was charged once.” Stripe popularized the pattern, and it is now table stakes for any API that handles money or state.

3. TCP keep-alives (the third most important one). A connection with TCP keep-alives enabled periodically sends a probe (a segment with no new data, just the ACK flag) while it is idle. Every firewall, NAT, and load balancer on the path sees traffic and keeps its state entry fresh, and a peer that has disappeared is detected when the probes go unanswered. This is the right tool for a long-lived connection that is going to sit idle behind a stateful firewall or a load balancer with an idle timeout: it turns “I was disconnected after 60 seconds” into “I am still connected after 60 seconds.” Keep-alives are enabled with SO_KEEPALIVE (in Python: socket.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)) and tuned with TCP_KEEPIDLE (the idle time before the first probe), TCP_KEEPINTVL (the interval between probes), and TCP_KEEPCNT (the number of probes before giving up). The tuning matters: the Linux default idle time is 7200 seconds, far longer than a typical proxy or firewall timeout.

4. Half-close on shutdown (the fourth most important one). A server that does a half-close on shutdown (shutdown(SHUT_WR), or socket.shutdownOutput() in Java, then read until EOF, then close()) sends a graceful FIN instead of a RST. This is the right shape for a server that is shutting down with in-flight requests and wants to drain connections cleanly: the client gets a clean close, not a reset. It belongs in every graceful-shutdown path (Kubernetes pods that receive a SIGTERM, systemd services that receive a stop signal, blue-green deploys that drain a load balancer).
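The half-close sequence (shutdown the write side, read until EOF, then close) can be sketched in a few lines. This is an illustration rather than a drop-in implementation: graceful_close and its drain_timeout parameter are hypothetical names, and a real server would usually finish writing its response before calling it.

```python
import socket


def graceful_close(conn: socket.socket, drain_timeout: float = 5.0) -> int:
    """Half-close, drain whatever the peer still sends, then close.
    Returns the number of bytes discarded while draining."""
    drained = 0
    try:
        conn.shutdown(socket.SHUT_WR)  # sends FIN: "I'm done sending"
        conn.settimeout(drain_timeout)  # do not wait forever for a slow peer
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # b"" means the peer has sent its FIN as well
                break
            drained += len(chunk)
    except OSError:
        # Timeout or an already-broken connection: stop draining.
        pass
    finally:
        conn.close()  # the receive buffer is empty on the normal path: no RST
    return drained
```

The drain timeout is the design choice worth noting: without it, a peer that never closes its side would hold the shutdown path open indefinitely.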

The four are the floor. There is also connection pooling (a pool that validates connections before reuse, drops stale ones, and reconnects on demand), circuit breakers (a client that stops calling a failing service and retries after a cooldown), and graceful drain on deploy (a load balancer that stops sending new requests to a server that is about to be replaced).
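The retry pattern from item 1 fits in roughly the 20 lines the text promises. A minimal sketch, assuming the wrapped operation is idempotent and surfaces a reset as ConnectionResetError; retry_on_reset and its parameters are hypothetical names, and the sleep argument exists only so the delay can be replaced in tests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_reset(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on ConnectionResetError with exponential
    backoff and full jitter. Only safe when `operation` is idempotent."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return operation()
        except ConnectionResetError:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the error to the caller
            # The ceiling doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))  # full jitter: anywhere below it
            attempt += 1
```

Full jitter (a random delay between zero and the exponential ceiling) is one of several jitter strategies. It is used here because it spreads out clients that were all reset at the same moment.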

The seven mistakes that quietly turn a working connection into a RST

A short, opinionated list of mistakes that have actually turned working connections into “connection reset by peer” errors. None of them are dramatic. They are the boring ones.

Setting SO_LINGER to 0 in production. A web server that closes connections with a RST gives every client a “connection reset” error instead of a clean close. The fix is to remove the SO_LINGER(0) setting (let the kernel do a normal close) or to implement a proper graceful shutdown, so that every client gets a FIN instead of a RST.

Closing the socket from a different thread than the one that read from it. A multithreaded server that closes a socket from one thread while another thread is still in the middle of reading may send a RST, because the close can land while unread data is still sitting in the receive buffer. The fix is to coordinate the close (e.g. via a shutdown flag, a sync.WaitGroup in Go, or a single owner for each socket) so that the socket is closed only after its reader has finished.

Reusing a connection that the server has already closed. A connection pool that hands out a stale connection (one that the server has already closed) sets the caller up for a reset: the first write is answered with a RST, and the read that follows fails with “connection reset by peer.” The fix is to validate the connection before reuse (a non-blocking read() that checks for EOF, or a PING for Redis-style protocols) so the pool drops the dead connection before the application sees it.
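A pool-side validity check can be sketched with a zero-timeout poll and a peek, so that nothing is consumed from a healthy connection. The name looks_stale is hypothetical. The check also cannot see a close that arrives a moment later, so the caller still needs the retry path.

```python
import select
import socket


def looks_stale(conn: socket.socket) -> bool:
    """Return True if an idle pooled connection should be discarded."""
    try:
        readable, _, _ = select.select([conn], [], [], 0)  # poll, never block
        if not readable:
            return False  # nothing pending: no FIN or RST has arrived yet
        # Readable while idle: peek to tell a close from unsolicited data.
        data = conn.recv(1, socket.MSG_PEEK)
    except (OSError, ValueError):
        # OSError covers ECONNRESET; ValueError covers an already-closed socket.
        return True
    return data == b""  # b"" is EOF: the peer closed its side
```

For strict request/response protocols, a pool might also treat any pending data on an idle connection as a reason to discard it. This sketch only discards on EOF or error.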

Running a kernel or proxy that does not respect TCP keep-alives. A stateful firewall that does not see the keep-alives (because they are blocked, or because the firewall ages out state faster than the probes arrive) is going to reset or drop the connection anyway. The fix is to (a) increase the firewall’s state timeout, (b) send application-level keep-alives (HTTP/2 PING, WebSocket ping), or (c) shorten the connection’s expected lifetime so the firewall never has to age it out.

Sending data on a connection that has already been closed. After a client calls shutdown(SHUT_WR) (signaling “I’m done sending”), a further write fails locally with EPIPE and never reaches the server. The RST appears in the mirror-image case: the peer has fully closed its socket, the client keeps writing, and the peer’s kernel answers the unexpected data with a RST. The fix is to (a) not write after either side has closed, (b) open a new connection for the additional data, or (c) use multiplexed protocols (HTTP/2 streams, gRPC) that handle end-of-stream at the application layer.

Using a 60-second idle timeout in the client and a 30-second idle timeout in the server. A client that keeps a connection idle for up to 60 seconds, talking to a server that closes idle connections after 30 seconds, is going to get a RST when it reuses the connection. The fix is to (a) shorten the client’s idle timeout so it is below the server’s, (b) send keep-alives, or (c) lengthen the server’s idle timeout so it exceeds the client’s.

Allowing a deployment to kill a pod with in-flight requests. A Kubernetes pod that is replaced without a graceful-shutdown period is going to RST every client connected to it. Kubernetes sends SIGTERM first and SIGKILL only after terminationGracePeriodSeconds has elapsed, so the fix has two halves: handle SIGTERM in the application’s graceful-shutdown path, and set terminationGracePeriodSeconds high enough for in-flight requests to complete. That turns “the pod was killed mid-request” into “the pod was drained and then killed.”

How this fits the rest of the deploy

A network connection rarely lives in isolation. The connection is usually part of a stack (a client, a server, a load balancer, a firewall, a database) that runs the network calls the application makes. The platform that handles the connection should make the rest of the stack feel like part of the same conversation.

The services layer is the part of the platform that runs the long-lived API the client connects to. The database layer is the part that holds the data the application reads and writes. The error logs are the part that captures the RST when it happens. The connection is the wire; the platform is what runs on both ends of it.

An application on a platform where the service, the database, the storage, the logs, and the metrics are all in the same place is an application the team is going to be able to debug. An application on a platform where each piece is in a different console is an application the team is going to spend the first hour just opening the right tab.

For a team that wants to see the full cost of the project before it commits, the RunxBuild hosting calculator shows the line items together. The service, the database, the storage, the worker, the bandwidth — each one is a separate number, and the team’s mental model for the platform is the sum of those numbers.

FAQ

What does “connection reset by peer” mean?

It means the other side of the TCP connection sent a RST segment: a forceful close that discards any in-flight data. The RST can come from the remote application, the remote kernel, or a proxy, load balancer, or firewall in between. It is one of the most common network errors in production, and the one a developer is most likely to see without knowing who sent the RST.

Is “connection reset by peer” the same as “ECONNRESET”?

Yes. ECONNRESET is the POSIX error code for the same condition; “connection reset by peer” is the human-readable form. Node.js emits an error whose code is ECONNRESET, Python raises a ConnectionResetError, Go returns a *net.OpError that wraps syscall.ECONNRESET (test it with errors.Is(err, syscall.ECONNRESET)), and Java throws a SocketException with the message “Connection reset.” The underlying TCP condition is the same in every language.

How do I find out who sent the RST?

Run tcpdump on the local interface and filter for RST segments: sudo tcpdump -i eth0 -n 'tcp[tcpflags] & tcp-rst != 0'. The source address of the RST points at the entity that decided to close the connection. If the source is the remote server, the cause is application-level. If the source is a load balancer or proxy, the cause is a timeout or policy. If the RST carries the server’s address but the server never sent it, a firewall or other middlebox injected it, and the cause is in that device’s rules.

How do I fix a “connection reset by peer” error in production?

First, identify the sender with tcpdump. Then match the sender to one of the five causes (application close, proxy timeout, firewall expiry, SO_LINGER(0), kernel RST). Then apply the right fix: drain the socket before close, increase the proxy’s idle timeout, enable TCP keep-alives, remove the SO_LINGER(0), or validate connections in the pool.

How do I prevent “connection reset by peer” in my application?

Use TCP keep-alives on long-lived connections, use retries with exponential backoff for transient failures, use idempotency keys for non-idempotent operations, and use a graceful shutdown (half-close, drain, then close) on the server side. The four patterns do not eliminate every RST; they make sure the ones that remain are handled gracefully.
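The idempotency-key half of that answer lives on the server. A minimal in-memory sketch of the idea follows, assuming a single process and no concurrent duplicates in flight. IdempotencyCache, its ttl, and the injectable clock are hypothetical names for this illustration; a production service would typically keep the keys in a shared store.

```python
import time
from typing import Callable, Dict, Tuple


class IdempotencyCache:
    """Remember the response for each idempotency key for `ttl` seconds."""

    def __init__(self, ttl: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._responses: Dict[str, Tuple[float, object]] = {}

    def run(self, key: str, handler: Callable[[], object]) -> object:
        """Execute `handler` once per key; replay the stored response after that."""
        now = self._clock()
        cached = self._responses.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # retry of a known request: do not run it again
        response = handler()
        self._responses[key] = (now, response)
        return response
```

A request handler would call cache.run(request_key, do_charge), where request_key comes from the Idempotency-Key header. A client that was reset after the server did the work then gets the original response back on its retry.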

What’s the difference between “connection reset” and “connection refused”?

“Connection refused” means no application is listening on the target port, so the kernel answered the SYN with a RST. “Connection reset” means the connection was established and then forcibly closed (the RST came after the handshake, usually mid-stream). “Connection timed out” means the SYN got no response at all.
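In Python the three conditions arrive as three different exception types, which makes it straightforward to log them separately. A small sketch; classify_failure is a hypothetical helper name.

```python
import errno
import socket


def classify_failure(exc: BaseException) -> str:
    """Map a socket exception to the three failure modes described above."""
    if isinstance(exc, ConnectionRefusedError):  # ECONNREFUSED: RST in reply to the SYN
        return "refused"
    if isinstance(exc, ConnectionResetError):  # ECONNRESET: RST on an established connection
        return "reset"
    if isinstance(exc, (TimeoutError, socket.timeout)):  # silence: no reply at all
        return "timeout"
    return "other"


print(classify_failure(OSError(errno.ECONNRESET, "Connection reset by peer")))
# -> reset
```

The last line works because Python maps errno values to the matching OSError subclass when the exception is constructed, so an OSError built with ECONNRESET is already a ConnectionResetError.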

Should I use TCP keep-alives or application-level keep-alives?

Both. TCP keep-alives (the OS-level mechanism) keep the state tables of firewalls, NATs, and other middleboxes on the path fresh, and let the kernel detect a dead peer. Application-level keep-alives (HTTP/2 PING, WebSocket ping, Redis PING) keep the application layer informed about the connection’s health, and they also count as traffic for proxies that measure idleness by application data. The two are complementary: TCP keep-alives answer “is the connection still alive at the network level,” application-level keep-alives answer “is the connection still alive at the application level.”
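The TCP half can be sketched as follows. The values (30-second idle, 10-second interval, 3 probes) are illustrative assumptions, and enable_keepalive is a hypothetical helper name. TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT are the Linux option names, so each one is guarded in case the platform does not expose it.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 30,
                     interval: int = 10, count: int = 3) -> None:
    """Turn on TCP keep-alives and tune the probe timing where supported."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Without the options below, the OS defaults apply (2 hours idle on Linux).
    if hasattr(socket, "TCP_KEEPIDLE"):  # seconds of idle before the first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):  # seconds between probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):  # unanswered probes before giving up
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

With these values the first probe goes out after 30 seconds of silence, which stays under a 60-second middlebox timeout, and a dead peer is detected after roughly 30 + 3 × 10 = 60 seconds.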

How do I handle “connection reset by peer” in Node.js?

Wrap the network call in a retry function that catches ECONNRESET errors and retries with exponential backoff. Use an idempotency key for non-idempotent operations. Enable TCP keep-alives on the socket: socket.setKeepAlive(true, 30000). Use a connection pool that discards broken connections instead of handing them out again (e.g. pg Pool, or http.Agent with keepAlive: true), and keep its idle timeout below the server’s. Together these turn “I am going to crash on a RST” into “I am going to survive a RST.”