TCP Protocol

Transmission Control Protocol provides reliable, ordered, byte-stream delivery over unreliable IP. It handles connection setup, flow control, congestion control, and retransmission — so applications don’t have to.

Why It Matters

HTTP, SSH, database protocols, and most other internet traffic run over TCP. Understanding TCP explains why connections take time to establish, why throughput ramps up slowly, why TIME_WAIT sockets accumulate, and how to tune performance for high-throughput or low-latency workloads.

Three-Way Handshake

Client                              Server
  │                                   │
  │── SYN (seq=1000) ──────────────→ │  "I want to connect"
  │                                   │
  │←── SYN-ACK (seq=3000, ack=1001)──│  "OK, here's my seq"
  │                                   │
  │── ACK (ack=3001) ──────────────→ │  "Got it, connected"
  │                                   │
  │←─────── Data exchange ──────────→│

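Each side picks a random initial sequence number (ISN). The SYN itself consumes one sequence number, which is why the SYN-ACK acknowledges 1001 rather than 1000. The ack field is cumulative: it names the next byte the receiver expects, meaning “I’ve received every byte before this number.”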
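
The numbers in the handshake diagram can be reproduced with a little arithmetic. The sketch below is illustrative only: seg_len and ack_for are hypothetical helper names, not part of any socket API, and the code assumes the segment arrived in order.

```c
#include <stdint.h>

/* Sequence space consumed by one segment: its payload bytes, plus one
 * each for the SYN and FIN flags. */
uint32_t seg_len(uint32_t payload_len, int syn, int fin) {
    return payload_len + (syn ? 1u : 0u) + (fin ? 1u : 0u);
}

/* Ack number a receiver sends after accepting an in-order segment.
 * uint32_t arithmetic wraps modulo 2^32, as TCP sequence numbers do. */
uint32_t ack_for(uint32_t seq, uint32_t payload_len, int syn, int fin) {
    return seq + seg_len(payload_len, syn, fin);
}
```

For the diagram above, ack_for(1000, 0, 1, 0) gives 1001 for the client's SYN, and ack_for(3000, 0, 1, 0) gives 3001 for the server's SYN-ACK.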

Connection Teardown

Client                              Server
  │── FIN ───────────────────────→ │  "I'm done sending"
  │←── ACK ──────────────────────  │
  │←── FIN ──────────────────────  │  "I'm done too"
  │── ACK ───────────────────────→ │
  │                                 │
  │ TIME_WAIT (2×MSL ≈ 60s)       │  ← prevents stale segments

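TIME_WAIT is held by the side that closes first. It ensures delayed packets from the old connection don’t contaminate a new one on the same address and port pair. This is why restarting a server sometimes fails with “Address already in use”: sockets from the previous run are still in TIME_WAIT. Setting SO_REUSEADDR on the listening socket before bind() avoids the error.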
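
At the socket API level, the first FIN in the diagram roughly corresponds to shutdown(fd, SHUT_WR): the caller stops sending but can still receive. The sketch below tries this over a loopback connection inside a single process. loopback_pair and half_close_demo are hypothetical helper names, and the code assumes a POSIX system where loopback TCP is available.

```c
#include <arpa/inet.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: opens a loopback TCP connection inside one process.
 * On success stores the two connected ends and returns 0; returns -1 on error. */
int loopback_pair(int *client, int *server) {
    int lst = socket(AF_INET, SOCK_STREAM, 0);
    if (lst < 0) return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the kernel pick a free port */
    socklen_t len = sizeof(addr);

    if (bind(lst, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lst, 1) < 0 ||
        getsockname(lst, (struct sockaddr *)&addr, &len) < 0) {
        close(lst);
        return -1;
    }

    *client = socket(AF_INET, SOCK_STREAM, 0);
    if (*client < 0) { close(lst); return -1; }
    if (connect(*client, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(*client); close(lst); return -1;
    }
    *server = accept(lst, NULL, NULL);
    close(lst);
    if (*server < 0) { close(*client); return -1; }
    return 0;
}

/* Sends FIN from the client side, then checks the server can still reply.
 * Returns 0 if the half-close behaved as described, -1 otherwise. */
int half_close_demo(void) {
    int cli, srv;
    char buf[16];
    if (loopback_pair(&cli, &srv) < 0) return -1;

    shutdown(cli, SHUT_WR);                     /* client: "I'm done sending" */
    int ok = read(srv, buf, sizeof(buf)) == 0;  /* server sees end-of-stream */

    ok = ok && write(srv, "bye", 3) == 3;       /* server-to-client still open */
    ok = ok && read(cli, buf, sizeof(buf)) == 3 && memcmp(buf, "bye", 3) == 0;

    close(srv);
    close(cli);
    return ok ? 0 : -1;
}
```

In this sketch the client closes first, so its end of the connection is the one that would pass through TIME_WAIT once both sides have closed.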

TCP State Diagram (Key States)

CLOSED → SYN_SENT → ESTABLISHED → FIN_WAIT_1 → FIN_WAIT_2 → TIME_WAIT → CLOSED   (active closer)
                    ESTABLISHED → CLOSE_WAIT → LAST_ACK → CLOSED                 (passive closer)

ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c | sort -rn  # count sockets per state
# Many TIME_WAIT → normal for servers that close many short-lived connections
# Many CLOSE_WAIT → app isn't closing sockets (bug)

Flow Control (Sliding Window)

The receiver advertises its available buffer space (rwnd — receive window). The sender won’t send more than rwnd unacknowledged bytes.

Sender:    [sent+ack'd][sent, unack'd][can send][can't send yet]
                        ←── rwnd ────→

If the receiver is slow, rwnd shrinks to 0 (window closes). The sender waits and periodically probes with a 1-byte “window probe.”
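
The sender-side bookkeeping in the diagram can be sketched as a single function. usable_window is a hypothetical name, snd_una and snd_nxt are borrowed from the variable names commonly used in TCP specifications, and the sketch ignores congestion control entirely.

```c
#include <stdint.h>

/* Bytes the sender may still transmit under flow control alone.
 * snd_una: oldest unacknowledged sequence number
 * snd_nxt: next sequence number to send
 * rwnd:    window most recently advertised by the receiver
 * Unsigned subtraction keeps the result correct across 32-bit wraparound. */
uint32_t usable_window(uint32_t snd_una, uint32_t snd_nxt, uint32_t rwnd) {
    uint32_t in_flight = snd_nxt - snd_una;   /* sent, not yet ACK'd */
    return in_flight >= rwnd ? 0 : rwnd - in_flight;
}
```

With 400 bytes in flight and rwnd = 1000, the sender may send 600 more bytes. Once rwnd drops to 0 the function returns 0, which is the point where a real stack would fall back to window probes.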

Congestion Control

TCP adjusts sending rate to avoid overwhelming the network (not just the receiver):

Phase                 Behavior                                                 When
Slow start            Double cwnd each RTT                                     Connection start, after timeout
Congestion avoidance  Increase cwnd by 1 MSS per RTT (AIMD)                    After cwnd reaches ssthresh
Fast retransmit       Retransmit on 3 duplicate ACKs (don’t wait for timeout)  Suspected single packet loss
Fast recovery         Halve cwnd, don’t reset to 1                             After fast retransmit

cwnd
  ↑
  │     /\      /\
  │    /  \    /  \     ← multiplicative decrease on loss
  │   /    \  /    \
  │  /      \/      \
  │ / exponential     \
  │/  (slow start)     \
  └──────────────────────→ time

Modern variants: CUBIC (Linux default, better for high-bandwidth), BBR (Google, models bandwidth and RTT).
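
The phases in the table can be approximated with a small state machine. This is a simplified Reno-style sketch, not how CUBIC or BBR work: it counts cwnd in whole MSS units, updates once per RTT, and uses hypothetical names (cc_state, cc_on_rtt_acked, cc_on_triple_dupack, cc_on_timeout).

```c
#include <stdint.h>

/* Simplified Reno-style state, measured in MSS units (an assumption made
 * for readability; real stacks count bytes and track much more). */
struct cc_state {
    uint32_t cwnd;      /* congestion window */
    uint32_t ssthresh;  /* slow-start threshold */
};

/* Half of cwnd, but never below 2 MSS. */
static uint32_t half_cwnd(uint32_t cwnd) {
    uint32_t h = cwnd / 2;
    return h < 2 ? 2 : h;
}

/* One RTT passed with every segment acknowledged. */
void cc_on_rtt_acked(struct cc_state *s) {
    if (s->cwnd < s->ssthresh) {
        uint32_t doubled = s->cwnd * 2;              /* slow start */
        s->cwnd = doubled > s->ssthresh ? s->ssthresh : doubled;
    } else {
        s->cwnd += 1;                                /* congestion avoidance */
    }
}

/* Three duplicate ACKs: fast retransmit, then fast recovery. */
void cc_on_triple_dupack(struct cc_state *s) {
    s->ssthresh = half_cwnd(s->cwnd);
    s->cwnd = s->ssthresh;                           /* halve, do not reset to 1 */
}

/* Retransmission timeout: fall back to slow start from 1 MSS. */
void cc_on_timeout(struct cc_state *s) {
    s->ssthresh = half_cwnd(s->cwnd);
    s->cwnd = 1;
}
```

Starting from cwnd = 1 and ssthresh = 8, five loss-free RTTs give cwnd values of 2, 4, 8, 9, 10. A triple duplicate ACK then drops cwnd to 5, while a timeout would reset it to 1.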

Nagle’s Algorithm and Delayed ACKs

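Nagle: buffer small writes until previous data is ACK’d (reduces tiny packets). Delayed ACK: the receiver holds back the ACK briefly (about 40 ms on Linux, up to 200 ms on some other stacks), hoping to piggyback it on a data response.

Together they can stall each other: the sender holds a small write while waiting for an ACK, and the receiver delays that ACK while waiting for data to piggyback on. The result is latency spikes of roughly 40 ms on interactive protocols. Fix: the TCP_NODELAY socket option disables Nagle.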

/* Requires <netinet/tcp.h>; sock is an already-created TCP socket. */
int flag = 1;
setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));

TCP Echo Server (Minimal)

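#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <arpa/inet.h>

int main(void) {
    /* A client that disconnects mid-write must not kill the server. */
    signal(SIGPIPE, SIG_IGN);

    int srv = socket(AF_INET, SOCK_STREAM, 0);
    if (srv < 0) { perror("socket"); return EXIT_FAILURE; }

    /* Allow rebinding while old connections sit in TIME_WAIT. */
    int opt = 1;
    setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(8080),
        .sin_addr.s_addr = htonl(INADDR_ANY),
    };
    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return EXIT_FAILURE;
    }
    if (listen(srv, 128) < 0) {
        perror("listen");
        return EXIT_FAILURE;
    }
    printf("Listening on :8080\n");

    while (1) {
        /* Serves one client at a time; others wait in the accept queue. */
        int cli = accept(srv, NULL, NULL);
        if (cli < 0) { perror("accept"); continue; }

        char buf[4096];
        ssize_t n;
        while ((n = read(cli, buf, sizeof(buf))) > 0) {
            /* write() may send fewer bytes than asked; loop until all are echoed. */
            ssize_t off = 0;
            while (off < n) {
                ssize_t w = write(cli, buf + off, (size_t)(n - off));
                if (w <= 0) break;
                off += w;
            }
            if (off < n) break;  /* write failed; drop this client */
        }
        close(cli);
    }
}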