summaryrefslogtreecommitdiff
path: root/include/linux
diff options
context:
space:
mode:
authorDavid S. Miller <davem@davemloft.net>2015-02-08 01:03:26 -0800
committerDavid S. Miller <davem@davemloft.net>2015-02-08 01:03:26 -0800
commitf06535c599354816cfbc653ce8965804c7385c61 (patch)
treec01ae32365fd340ddec2a12682a43abd585aed97 /include/linux
parentca539345f8767cca221b5aa77bf4329c725d0d7e (diff)
parent4fb17a6091674f469e8ac85dc770fbf9a9ba7cc8 (diff)
Merge branch 'tcp_ack_loops'
Neal Cardwellsays: ==================== tcp: mitigate TCP ACK loops due to out-of-window validation dupacks This patch series mitigates "ack loop" DoS scenarios by rate-limiting outgoing duplicate ACKs sent in response to incoming "out of window" segments. Background ----------- There are several cases in which the TCP RFCs specify that a TCP endpoint should send a pure duplicate ACK in response to a pure duplicate ACK that appears to be invalid due to being "out of window": (1) RFC 793 (section 3.9, page 69) specifies that endpoints should send a duplicate ACK in response to an ACK when the incoming sequence number is invalid due to being outside the receive window: "If an incoming segment is not acceptable, an acknowledgment should be sent in reply". (2) RFC 793 (section 3.9, page 72) says: "If the ACK acknowledges something not yet sent (SEG.ACK > SND.NXT) then send an ACK". (3) RFC 1323 (section 4.2.1, page 18) specifies that endpoints should send a duplicate ACK in response to an ACK when the PAWS check for the incoming timestamp value fails: "If .... SEG.TSval < TS.Recent and if TS.Recent is valid ... Send an acknowledgement in reply" The problem ------------ Normally, this is not a problem. However, a buggy middlebox or malicious man-in-the-middle can inject a few packets into the conversation that advance each endpoint's notion of the current window (sequence, ACK, or timestamp), without either side noticing. In this case, from then on each side can think the other is sending invalid segments. Thus an infinite feedback loop of duplicate ACKs can ensue, as each endpoint receives a duplicate ACK, decides that it is invalid (due to sequence number, ACK number, or timestamp), and then sends a dupack in reply, which the other side decides is invalid, responding with a dupack... ad infinitum. This ping-pong feedback loop can happen at a very high rate. This phenomenon can and does happen in practice. It has been seen in datacenter and Internet contexts at Google, and has been documented by Anil Agarwal in the Nov 2013 tcpm thread "TCP mismatched sequence numbers issue", and Avery Fay in the Feb 2015 Linux netdev thread "Invalid timestamp? causing tight ack loop (hundreds of thousands of packets / sec)". This patch series ------------------ This patch series mitigates such ack loops by rate-limiting outgoing duplicate ACKs sent in response to incoming TCP packets that are for an existing connection but that are invalid due to any of the reasons mentioned above: sequence number (1), ACK field (2), or timestamp value (3). The rate limit for such duplicate ACKs is specified by a new sysctl, tcp_invalid_ratelimit, which specifies the minimal space between such outbound duplicate ACKs, in milliseconds. The default is 500 (500ms), and 0 disables the mechanism. We rate-limit these duplicate ACK responses rather than blocking them entirely or resetting the connection, because legitimate connections can rely on dupacks in response to some out-of-window segments. For example, zero window probes are typically sent with a sequence number that is below the current window, and ZWPs thus expect to thus elicit a dupack in response. Testing: this approach has been in use at Google for a while. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'include/linux')
-rw-r--r--include/linux/tcp.h6
1 files changed, 6 insertions, 0 deletions
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 67309ece0772..1a7adb411647 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -115,6 +115,7 @@ struct tcp_request_sock {
u32 rcv_isn;
u32 snt_isn;
u32 snt_synack; /* synack sent time */
+ u32 last_oow_ack_time; /* last SYNACK */
u32 rcv_nxt; /* the ack # by SYNACK. For
* FastOpen it's the seq#
* after data-in-SYN.
@@ -152,6 +153,7 @@ struct tcp_sock {
u32 snd_sml; /* Last byte of the most recently transmitted small packet */
u32 rcv_tstamp; /* timestamp of last received ACK (for keepalives) */
u32 lsndtime; /* timestamp of last sent data packet (for restart window) */
+ u32 last_oow_ack_time; /* timestamp of last out-of-window ACK */
u32 tsoffset; /* timestamp offset */
@@ -340,6 +342,10 @@ struct tcp_timewait_sock {
u32 tw_rcv_wnd;
u32 tw_ts_offset;
u32 tw_ts_recent;
+
+ /* The time we sent the last out-of-window ACK: */
+ u32 tw_last_oow_ack_time;
+
long tw_ts_recent_stamp;
#ifdef CONFIG_TCP_MD5SIG
struct tcp_md5sig_key *tw_md5_key;