netdev - Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAL+tcoCBvbR_=c_SxKmAyAE4U5wTgr=hp08yNxm0QUSWdgge5Q@mail.gmail.com>
Date: Tue, 5 Nov 2024 17:41:38 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Kuniyuki Iwashima <kuniyu@...zon.com>, davem@...emloft.net, dsahern@...nel.org, 
	horms@...nel.org, kernelxing@...cent.com, kuba@...nel.org, 
	netdev@...r.kernel.org, pabeni@...hat.com
Subject: Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to
 failure in tcp_timewait_state_process

On Tue, Nov 5, 2024 at 5:35 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, Nov 5, 2024 at 10:08 AM Jason Xing <kerneljasonxing@...il.com> wrote:
> >
> > On Tue, Nov 5, 2024 at 3:49 PM Kuniyuki Iwashima <kuniyu@...zon.com> wrote:
> > >
> > > From: Jason Xing <kerneljasonxing@...il.com>
> > > Date: Tue,  5 Nov 2024 10:55:11 +0800
> > > > From: Jason Xing <kernelxing@...cent.com>
> > > >
> > > > We found there are rare chances that some RST packets appear during
> > > > the shakehands because the timewait socket cannot accept the SYN and
> > >
> > > s/shakehands/handshake/
> > >
> > > same in the subject.
> > >
> > > > doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> > > >
> > > > Here is how things happen in production:
> > > > Time        Client(A)        Server(B)
> > > > 0s          SYN-->
> > > > ...
> > > > 132s                         <-- FIN
> > > > ...
> > > > 169s        FIN-->
> > > > 169s                         <-- ACK
> > > > 169s        SYN-->
> > > > 169s                         <-- ACK
> > > > 169s        RST-->
> > > > As above picture shows, the two flows have a start time difference
> > > > of 169 seconds. B starts to send FIN so it will finally enter into
> > > > TIMEWAIT state. Nearly at the same time A launches a new connection
> > > > that soon is reset by itself due to receiving a ACK.
> > > >
> > > > There are two key checks in tcp_timewait_state_process() when timewait
> > > > socket in B receives the SYN packet:
> > > > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > > > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> > > >
> > > > Regarding the first rule, it fails as expected because in the first
> > > > connection the seq of SYN sent from A is 1892994276, then 169s have
> > > > passed, the second SYN has 239034613 (caused by overflow of s32).
> > > >
> > > > Then how about the second rule?
> > > > It fails again!
> > > > Let's take a look at how the tsval comes out:
> > > > __tcp_transmit_skb()
> > > >     -> tcp_syn_options()
> > > >         -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > > > The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > > > other is tp->tsoffset. The latter value is fixed, so we don't need
> > > > to care about it. If both operations (sending FIN and then starting
> > > > sending SYN) from A happen in 1ms, then the tsval would be the same.
> > > > It can be clearly seen in the tcpdump log. Notice that the tsval is
> > > > with millisecond precision.
> > > >
> > > > Based on the above analysis, I decided to make a small change to
> > > > the check in tcp_timewait_state_process() so that the second flow
> > > > would not fail.
> > > >
> > > > Signed-off-by: Jason Xing <kernelxing@...cent.com>
> > > > ---
> > > >  net/ipv4/tcp_minisocks.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> > > > index bb1fe1ba867a..2b29d1bf5ca0 100644
> > > > --- a/net/ipv4/tcp_minisocks.c
> > > > +++ b/net/ipv4/tcp_minisocks.c
> > > > @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
> > > >       if (th->syn && !th->rst && !th->ack && !paws_reject &&
> > > >           (after(TCP_SKB_CB(skb)->seq, rcv_nxt) ||
> > > >            (tmp_opt.saw_tstamp &&
> > > > -           (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) {
> > >
> > > I think this follows RFC 6191 and such a change requires a formal
> > > discussion at IETF.
> > >
> > > https://datatracker.ietf.org/doc/html/rfc6191#section-2
> > >
> > > ---8<---
> > >       *  If TCP Timestamps would be enabled for the new incarnation of
> > >          the connection, and the timestamp contained in the incoming SYN
> > >          segment is greater than the last timestamp seen on the previous
> >
> > The true thing is that the timestamp of the SYN packet is greater than
> > that of the last packet, but the kernel implementation uses ms
> > precision (please see tcp_skb_timestamp_ts()). That function makes
> > those two timestamps the same.
> >
> > This case happens as expected, so the second connection should be
> > established. My confusion just popped out of my mind: what rules
> > should we follow to stop the second flow?
>
> Note that linux TCP stack can use usec resolution for TCP TS values.
>
> You might adopt it, and no longer care about this ms granularity.

Right, I noticed this feature. I wonder if we can change the check in
tcp_timewait_state_process() like this patch if it has no side
effects? I'm worried that some programs don't use this feature. It's
the reason why I try to propose this patch to you.

Thanks,
Jason