lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20230509192734.41099-1-kuniyu@amazon.com>
Date: Tue, 9 May 2023 12:27:34 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <edumazet@...gle.com>
CC: <davem@...emloft.net>, <dsahern@...nel.org>, <kuba@...nel.org>,
	<kuni1840@...il.com>, <kuniyu@...zon.com>, <mubashirq@...gle.com>,
	<ncardwell@...gle.com>, <netdev@...r.kernel.org>, <pabeni@...hat.com>,
	<zob@...zon.com>, <nyoshif@...zon.com>
Subject: Re: [PATCH v1 net-next] tcp: Add net.ipv4.tcp_reset_challenge.

From: Eric Dumazet <edumazet@...gle.com>
Date: Tue, 9 May 2023 15:42:45 +0200
> On Tue, May 9, 2023 at 3:34 PM Paolo Abeni <pabeni@...hat.com> wrote:
> >
> > On Mon, 2023-05-08 at 15:27 -0700, Kuniyuki Iwashima wrote:
> > > Our Network Load Balancer (NLB) [0] consists of multiple nodes with unique
> > > IP addresses.  These nodes forward TCP flows from clients to backend
> > > targets by modifying the destination IP address.  NLB offers an option [1]
> > > to preserve the client's source IP address and port when routing packets
> > > to backend targets.
> > >
> > > When a client connects to two different NLB nodes, they may select the same
> > > backend target.  If the client uses the same source IP and port, the two
> > > flows at the backend side will have the same 4-tuple.
> > >
> > >                          +---------------+
> > >             1st flow     |  NLB Node #1  |   src: 10.0.0.215:60000
> > >          +------------>  |   10.0.3.4    |  +------------+
> > >          |               |    :10000     |               |
> > >          +               +---------------+               v
> > >   +------------+                                   +------------+
> > >   |   Client   |                                   |   Target   |
> > >   | 10.0.0.215 |                                   | 10.0.3.249 |
> > >   |   :60000   |                                   |   :10000   |
> > >   +------------+                                   +------------+
> > >          +               +---------------+               ^
> > >          |               |  NLB Node #2  |               |
> > >          +------------>  |   10.0.4.62   |  +------------+
> > >             2nd flow     |    :10000     |   src: 10.0.0.215:60000
> > >                          +---------------+
> > >
> > > The kernel responds to the SYN of the 2nd flow with Challenge ACK.  In this
> > > situation, there are multiple valid reply paths, but the flows behind NLB
> > > are tracked to ensure symmetric routing [2].  So, the Challenge ACK is
> > > routed back to the 2nd NLB node.
> > >
> > > The 2nd NLB node forwards the Challenge ACK to the client, but the client
> > > sees it as an invalid response to SYN in tcp_rcv_synsent_state_process()
> > > and finally sends RST in tcp_v[46]_do_rcv() based on the sequence number
> > > by tcp_v[46]_send_reset().  The RST effectively closes the first connection
> > > on the target, and a retransmitted SYN successfully establishes the 2nd
> > > connection.
> > >
> > >   On client:
> > >   10.0.0.215.60000 > 10.0.3.4.10000: Flags [S], seq 772948343  ... via NLB Node #1
> > >   10.0.3.4.10000 > 10.0.0.215.60000: Flags [S.], seq 3739044674, ack 772948344
> > >   10.0.0.215.60000 > 10.0.3.4.10000: Flags [.], ack 3739044675
> > >
> > >   10.0.0.215.60000 > 10.0.4.62.10000: Flags [S], seq 248180743 ... via NLB Node #2
> > >   10.0.4.62.10000 > 10.0.0.215.60000: Flags [.], ack 772948344 ... Invalid Challenge ACK
> > >   10.0.0.215.60000 > 10.0.4.62.10000: Flags [R], seq 772948344 ... RST w/ correct seq #
> > >   10.0.0.215.60000 > 10.0.4.62.10000: Flags [S], seq 248180743
> > >   10.0.4.62.10000 > 10.0.0.215.60000: Flags [S.], seq 4160908213, ack 248180744
> > >   10.0.0.215.60000 > 10.0.4.62.10000: Flags [.], ack 4160908214
> > >
> > >   On target:
> > >   10.0.0.215.60000 > 10.0.3.249.10000: Flags [S], seq 772948343 ... via NLB Node #1
> > >   10.0.3.249.10000 > 10.0.0.215.60000: Flags [S.], seq 3739044674, ack 772948344
> > >   10.0.0.215.60000 > 10.0.3.249.10000: Flags [.], ack 3739044675
> > >
> > >   10.0.0.215.60000 > 10.0.3.249.10000: Flags [S], seq 248180743 ... via NLB Node #2
> > >   10.0.3.249.10000 > 10.0.0.215.60000: Flags [.], ack 772948344 ... Forwarded to 2nd flow
> > >   10.0.0.215.60000 > 10.0.3.249.10000: Flags [R], seq 772948344 ... Close the 1st connection
> > >   10.0.0.215.60000 > 10.0.3.249.10000: Flags [S], seq 248180743
> > >   10.0.3.249.10000 > 10.0.0.215.60000: Flags [S.], seq 4160908213, ack 248180744
> > >   10.0.0.215.60000 > 10.0.3.249.10000: Flags [.], ack 4160908214
> > >
> > > The first connection is still alive from the client's point of view.  When
> > > the client sends data over the first connection, the target responds with
> > > Challenge ACK.  The Challenge ACK is routed back to the 1st connection, and
> > > the client responds with Dup ACK, and the target responds to the Dup ACK
> > > with Challenge ACK, and this continues.
> > >
> > >   On client:
> > >   10.0.0.215.60000 > 10.0.3.4.10000: Flags [P.], seq 772948344:772948349, ack 3739044675, length 5
> > >   10.0.3.4.10000 > 10.0.0.215.60000: Flags [.], ack 248180744, length 0  ... Challenge ACK
> > >   10.0.0.215.60000 > 10.0.3.4.10000: Flags [.], ack 3739044675, length 0 ... Dup ACK
> > >   10.0.3.4.10000 > 10.0.0.215.60000: Flags [.], ack 248180744, length 0  ... Challenge ACK
> > >   ...
> > >
> > > In RFC 5961, Challenge ACK assumes that it will be routed back via an
> > > asymmetric path to the peer of the established connection.  However, in
> > > a situation where multiple valid reply paths are tracked, Challenge ACK
> > > gives a hint to snipe another connection and also triggers the Challenge
> > > ACK Dup ACK war on the connection.
> > >
> > > A new sysctl knob, net.ipv4.tcp_reset_challenge, allows us to respond to
> > > invalid packets described in RFC 5961 with RST and keep the established
> > > socket open.
> >
> > I did not double check with the RFC, but the above looks like a knob to
> > enable a protocol violation.
> >
> > I'm wondering if the same results could be obtained with a BPF program
> > instead?

XDP could.  But implementing TCP stack like prog could be hard, and more
than anything, this is actually a corner case (we quite often observe
this though), so applying such prog for every packet will cause a large
performance drop.


> >
> > IMHO we should avoid adding system wide knobs for such specific use-
> > case, especially when the controlled behaviour is against the spec.
> >
> 
> Agreed, this patch looks quite suspect to me.
> 
> We will then add many more knobs for other similar situations.

I think there is a tacit understanding in RFC that can be applied to
most situations.  Challenge ACK assumes a asymmetric routing between
peer vs attacker, but the assumption is not always true, and we have
a situation where RFC 5961 is rather harmful.

There is some knobs to switch on/off RFC behaviour.  We just want to
disable RFC 5961 here.


> Network Load Balancers can be tricky to implement right.
> 
> We should not have to tweak many TCP stacks just because of one implementation.
> 
> Maglev (one of the load balancers used at Google) never asked for a
> modification of a TCP stack.

I'm reading this Maglev paper ( https://research.google/pubs/pub44824/ )
and the notable difference is that our Network Load Balancer does not
support DSR and ensure symmetric routing so that the response can be
filtered based on customer's configuration (of Security Group).

This issue can be caused generally, but I'll leave the explanation to
Jon.


> 
> I think such a change would need IETF discussion and approval first.

CCing Yoshi from TCPM WG, I'm discussing this topic with him.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ