TCP Protocol
Transmission Control Protocol provides reliable, ordered, byte-stream delivery over unreliable IP. It handles connection setup, flow control, congestion control, and retransmission — so applications don’t have to.
Why It Matters
HTTP/1.1 and HTTP/2, SSH, most database protocols, and most other internet traffic run over TCP. Understanding TCP explains why connections take time to establish, why throughput ramps up slowly, why TIME_WAIT sockets accumulate, and how to tune performance for high-throughput or low-latency workloads.
Three-Way Handshake
```
Client                               Server
  │                                     │
  │── SYN (seq=1000) ─────────────────→ │   "I want to connect"
  │                                     │
  │←── SYN-ACK (seq=3000, ack=1001) ────│   "OK, here's my seq"
  │                                     │
  │── ACK (ack=3001) ─────────────────→ │   "Got it, connected"
  │                                     │
  │←───────── Data exchange ──────────→ │
```
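Each side picks a random initial sequence number (ISN). The ack field carries the next sequence number the sender expects, meaning “I’ve received every byte before this number.” A SYN consumes one sequence number even though it carries no data, which is why the server acknowledges the client’s SYN (seq=1000) with ack=1001.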
Connection Teardown
```
Client                               Server
  │── FIN ────────────────────────────→ │   "I'm done sending"
  │←── ACK ─────────────────────────────│
  │←── FIN ─────────────────────────────│   "I'm done too"
  │── ACK ────────────────────────────→ │
  │                                     │
  │ TIME_WAIT (2×MSL; 60s on Linux)     │   ← prevents stale segments
```
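TIME_WAIT is held by the side that closes first (the active closer). It ensures delayed segments from the old connection are not mistaken for data on a new connection with the same addresses and ports. It also lets the final ACK be resent if the peer retransmits its FIN. This is why restarting a server sometimes fails with “Address already in use”: set SO_REUSEADDR on the listening socket before bind() so it can rebind the port while old connections linger in TIME_WAIT.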
TCP State Diagram (Key States)
```
CLOSED → SYN_SENT → ESTABLISHED → FIN_WAIT_1 → FIN_WAIT_2 → TIME_WAIT → CLOSED   (closes first)
                    ESTABLISHED → CLOSE_WAIT → LAST_ACK → CLOSED                 (other side)
```
```bash
ss -tan | awk 'NR > 1 {print $1}' | sort | uniq -c | sort -rn   # count sockets per state (skip header)
# Many TIME_WAIT  → normal for high-throughput servers
# Many CLOSE_WAIT → peer closed, but the app never called close() (bug)
```
Flow Control (Sliding Window)
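The receiver advertises its available buffer space as the receive window (rwnd) in every segment it sends. The sender never has more than rwnd bytes sent but unacknowledged. (The actual limit is min(rwnd, cwnd), where cwnd is the congestion window covered below.)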
```
Sender: [sent+ack'd][sent, unack'd][can send][can't send yet]
                    ←──────── rwnd ─────────→
```
If the receiving application reads slowly, its buffer fills and rwnd shrinks to 0 (the window closes). The sender then pauses and periodically sends a 1-byte “window probe” until the receiver advertises space again.
Congestion Control
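TCP also adjusts its sending rate to avoid overwhelming the network (not just the receiver). It keeps a congestion window (cwnd), a sender-side cap on unacknowledged data, and a slow-start threshold (ssthresh) that decides which growth phase applies: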
| Phase | Behavior | When |
|---|---|---|
| Slow start | Double cwnd each RTT | Connection start, after timeout |
| Congestion avoidance | Increase cwnd by 1 MSS per RTT (AIMD) | After cwnd reaches ssthresh |
| Fast retransmit | Retransmit on 3 duplicate ACKs (don’t wait for timeout) | Suspected single packet loss |
| Fast recovery | Halve cwnd, don’t reset to 1 | After fast retransmit |
```
cwnd
  ↑
  │        /|    /|    /|    ← multiplicative decrease on loss (cwnd halved)
  │       / |   / |   / |
  │      /  |  /  |  /  |    additive increase between losses
  │     /   | /   | /   |
  │    /    |/    |/    |
  │   /
  │__/  exponential (slow start)
  └────────────────────────────→ time
```
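Modern variants: CUBIC (the Linux default) grows cwnd as a cubic function of time since the last loss, so it regains bandwidth quickly on high bandwidth-delay paths. BBR (from Google) paces sending from its estimates of bottleneck bandwidth and round-trip time, rather than treating packet loss as the main congestion signal.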
Nagle’s Algorithm and Delayed ACKs
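Nagle’s algorithm: while previously sent data is still unacknowledged, the sender holds back small writes (less than a full segment) and coalesces them, which reduces tiny packets. Delayed ACK: the receiver waits briefly, hoping to piggyback the ACK on a data response (typically 40 ms on Linux; RFC 1122 requires the delay to stay under 500 ms).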
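Together they can stall request/response protocols. Suppose a client sends a request as two small writes. Nagle holds the second write until the first is ACKed, while the server delays that ACK because it has no response to send until the full request arrives. Neither side moves until the delayed-ACK timer fires, adding roughly 40 ms or more of latency. Fix: set the TCP_NODELAY socket option, which disables Nagle.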
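```c
#include <netinet/in.h>   // IPPROTO_TCP
#include <netinet/tcp.h>  // TCP_NODELAY
int flag = 1;             // 1 = disable Nagle on this socket
setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
```
TCP Echo Server (Minimal)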
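```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void) {
    signal(SIGPIPE, SIG_IGN);  // writing to a closed peer returns EPIPE instead of killing us
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    if (srv < 0) { perror("socket"); return EXIT_FAILURE; }
    int opt = 1;  // allow rebinding :8080 while old connections sit in TIME_WAIT
    setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(8080),
        .sin_addr.s_addr = htonl(INADDR_ANY),
    };
    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0 || listen(srv, 128) < 0) {
        perror("bind/listen");
        return EXIT_FAILURE;
    }
    printf("Listening on :8080\n");
    while (1) {
        int cli = accept(srv, NULL, NULL);  // returns an already-established connection
        if (cli < 0) { perror("accept"); continue; }
        char buf[4096];
        ssize_t n;
        while ((n = read(cli, buf, sizeof(buf))) > 0) {
            ssize_t off = 0;
            while (off < n) {  // write() may send only part of the buffer
                ssize_t w = write(cli, buf + off, (size_t)(n - off));
                if (w < 0) break;
                off += w;
            }
            if (off < n) break;  // peer reset or went away
        }
        // read() == 0 means the peer sent FIN; skipping close() would leave this socket in CLOSE_WAIT
        close(cli);
    }
}
```
Related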
- UDP Protocol — connectionless alternative
- Socket Programming — multiplexing many connections with epoll
- IP and Routing — TCP runs over IP
- TLS and Encryption — TLS handshake happens after TCP handshake
- OSI and TCP IP Model — TCP lives at the transport layer