lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoBV2AYZwjVFEEfPahKnFKJ-tZmkUCKF0k0hy6iFnCQWAQ@mail.gmail.com>
Date: Tue, 5 Nov 2024 17:56:55 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Kuniyuki Iwashima <kuniyu@...zon.com>, davem@...emloft.net, dsahern@...nel.org, 
	horms@...nel.org, kernelxing@...cent.com, kuba@...nel.org, 
	netdev@...r.kernel.org, pabeni@...hat.com
Subject: Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to
 failure in tcp_timewait_state_process

On Tue, Nov 5, 2024 at 5:50 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, Nov 5, 2024 at 10:42 AM Jason Xing <kerneljasonxing@...il.com> wrote:
> >
> > On Tue, Nov 5, 2024 at 5:35 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > On Tue, Nov 5, 2024 at 10:08 AM Jason Xing <kerneljasonxing@...il.com> wrote:
> > > >
> > > > On Tue, Nov 5, 2024 at 3:49 PM Kuniyuki Iwashima <kuniyu@...zon.com> wrote:
> > > > >
> > > > > From: Jason Xing <kerneljasonxing@...il.com>
> > > > > Date: Tue,  5 Nov 2024 10:55:11 +0800
> > > > > > From: Jason Xing <kernelxing@...cent.com>
> > > > > >
> > > > > > We found there are rare chances that some RST packets appear during
> > > > > > the shakehands because the timewait socket cannot accept the SYN and
> > > > >
> > > > > s/shakehands/handshake/
> > > > >
> > > > > same in the subject.
> > > > >
> > > > > > doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> > > > > >
> > > > > > Here is how things happen in production:
> > > > > > Time        Client(A)        Server(B)
> > > > > > 0s          SYN-->
> > > > > > ...
> > > > > > 132s                         <-- FIN
> > > > > > ...
> > > > > > 169s        FIN-->
> > > > > > 169s                         <-- ACK
> > > > > > 169s        SYN-->
> > > > > > 169s                         <-- ACK
> > > > > > 169s        RST-->
> > > > > > As above picture shows, the two flows have a start time difference
> > > > > > of 169 seconds. B starts to send FIN so it will finally enter into
> > > > > > TIMEWAIT state. Nearly at the same time A launches a new connection
> > > > > > that soon is reset by itself due to receiving a ACK.
> > > > > >
> > > > > > There are two key checks in tcp_timewait_state_process() when timewait
> > > > > > socket in B receives the SYN packet:
> > > > > > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > > > > > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> > > > > >
> > > > > > Regarding the first rule, it fails as expected because in the first
> > > > > > connection the seq of SYN sent from A is 1892994276, then 169s have
> > > > > > passed, the second SYN has 239034613 (caused by overflow of s32).
> > > > > >
> > > > > > Then how about the second rule?
> > > > > > It fails again!
> > > > > > Let's take a look at how the tsval comes out:
> > > > > > __tcp_transmit_skb()
> > > > > >     -> tcp_syn_options()
> > > > > >         -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > > > > > The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > > > > > other is tp->tsoffset. The latter value is fixed, so we don't need
> > > > > > to care about it. If both operations (sending FIN and then starting
> > > > > > sending SYN) from A happen in 1ms, then the tsval would be the same.
> > > > > > It can be clearly seen in the tcpdump log. Notice that the tsval is
> > > > > > with millisecond precision.
> > > > > >
> > > > > > Based on the above analysis, I decided to make a small change to
> > > > > > the check in tcp_timewait_state_process() so that the second flow
> > > > > > would not fail.
> > > > > >
> > > > > > Signed-off-by: Jason Xing <kernelxing@...cent.com>
> > > > > > ---
> > > > > >  net/ipv4/tcp_minisocks.c | 2 +-
> > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> > > > > > index bb1fe1ba867a..2b29d1bf5ca0 100644
> > > > > > --- a/net/ipv4/tcp_minisocks.c
> > > > > > +++ b/net/ipv4/tcp_minisocks.c
> > > > > > @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
> > > > > >       if (th->syn && !th->rst && !th->ack && !paws_reject &&
> > > > > >           (after(TCP_SKB_CB(skb)->seq, rcv_nxt) ||
> > > > > >            (tmp_opt.saw_tstamp &&
> > > > > > -           (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) {
> > > > >
> > > > > I think this follows RFC 6191 and such a change requires a formal
> > > > > discussion at IETF.
> > > > >
> > > > > https://datatracker.ietf.org/doc/html/rfc6191#section-2
> > > > >
> > > > > ---8<---
> > > > >       *  If TCP Timestamps would be enabled for the new incarnation of
> > > > >          the connection, and the timestamp contained in the incoming SYN
> > > > >          segment is greater than the last timestamp seen on the previous
> > > >
> > > > The true thing is that the timestamp of the SYN packet is greater than
> > > > that of the last packet, but the kernel implementation uses ms
> > > > precision (please see tcp_skb_timestamp_ts()). That function makes
> > > > those two timestamps the same.
> > > >
> > > > This case happens as expected, so the second connection should be
> > > > established. My confusion just popped out of my mind: what rules
> > > > should we follow to stop the second flow?
> > >
> > > Note that linux TCP stack can use usec resolution for TCP TS values.
> > >
> > > You might adopt it, and no longer care about this ms granularity.
> >
> > Right, I noticed this feature. I wonder if we can change the check in
> > tcp_timewait_state_process() like this patch if it has no side
> > effects? I'm worried that some programs don't use this feature. It's
> > the reason why I try to propose this patch to you.
>
> Breaking RFC ? I do not think so.

Oh right, I just can't figure it out why since we've already lost the
fine-grained timestamp in each skb. I spent a few days investigating
the bad cases this patch may bring, but I failed.

> Instead, use a usec clock and voila, the problem is solved.

Sure, it can work :)

Thanks,
Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ