[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iL7uBQQmQcZnbmX-23eyQn5GOecyp2ddcb-oqpJxK6G0w@mail.gmail.com>
Date: Thu, 7 Nov 2024 10:45:08 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Jason Xing <kerneljasonxing@...il.com>
Cc: Philo Lu <lulie@...ux.alibaba.com>, davem@...emloft.net, kuba@...nel.org, 
	pabeni@...hat.com, dsahern@...nel.org, horms@...nel.org, 
	netdev@...r.kernel.org, Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to
 failure in tcp_timewait_state_process
On Thu, Nov 7, 2024 at 10:30 AM Jason Xing <kerneljasonxing@...il.com> wrote:
>
> On Thu, Nov 7, 2024 at 5:00 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> >
> > On Thu, Nov 7, 2024 at 4:37 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > On Thu, Nov 7, 2024 at 9:27 AM Jason Xing <kerneljasonxing@...il.com> wrote:
> > > >
> > > > On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@...ux.alibaba.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > On 2024/11/7 16:01, Jason Xing wrote:
> > > > > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@...ux.alibaba.com> wrote:
> > > > > >>
> > > > > >> Hi Jason,
> > > > > >>
> > > > > >> On 2024/11/5 10:55, Jason Xing wrote:
> > > > > >>> From: Jason Xing <kernelxing@...cent.com>
> > > > > >>>
> > > > > >>> We found there are rare chances that some RST packets appear during
> > > > > >>> the shakehands because the timewait socket cannot accept the SYN and
> > > > > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> > > > > >>>
> > > > > >>> Here is how things happen in production:
> > > > > >>> Time        Client(A)        Server(B)
> > > > > >>> 0s          SYN-->
> > > > > >>> ...
> > > > > >>> 132s                         <-- FIN
> > > > > >>> ...
> > > > > >>> 169s        FIN-->
> > > > > >>> 169s                         <-- ACK
> > > > > >>> 169s        SYN-->
> > > > > >>> 169s                         <-- ACK
> > > > > >>> 169s        RST-->
> > > > > >>> As above picture shows, the two flows have a start time difference
> > > > > >>> of 169 seconds. B starts to send FIN so it will finally enter into
> > > > > >>> TIMEWAIT state. Nearly at the same time A launches a new connection
> > > > > >>> that soon is reset by itself due to receiving a ACK.
> > > > > >>>
> > > > > >>> There are two key checks in tcp_timewait_state_process() when timewait
> > > > > >>> socket in B receives the SYN packet:
> > > > > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > > > > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> > > > > >>>
> > > > > >>> Regarding the first rule, it fails as expected because in the first
> > > > > >>> connection the seq of SYN sent from A is 1892994276, then 169s have
> > > > > >>> passed, the second SYN has 239034613 (caused by overflow of s32).
> > > > > >>>
> > > > > >>> Then how about the second rule?
> > > > > >>> It fails again!
> > > > > >>> Let's take a look at how the tsval comes out:
> > > > > >>> __tcp_transmit_skb()
> > > > > >>>       -> tcp_syn_options()
> > > > > >>>           -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > > > > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > > > > >>> other is tp->tsoffset. The latter value is fixed, so we don't need
> > > > > >>> to care about it. If both operations (sending FIN and then starting
> > > > > >>> sending SYN) from A happen in 1ms, then the tsval would be the same.
> > > > > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is
> > > > > >>> with millisecond precision.
> > > > > >>>
> > > > > >>> Based on the above analysis, I decided to make a small change to
> > > > > >>> the check in tcp_timewait_state_process() so that the second flow
> > > > > >>> would not fail.
> > > > > >>>
> > > > > >>
> > > > > >> I wonder what a bad result the RST causes. As far as I know, the client
> > > > > >> will not close the connect and return. Instead, it re-sends an SYN in
> > > > > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in
> > > > > >> tcp_rcv_synsent_state_process). So the second connection could still be
> > > > > >> established successfully, at the cost of a bit more delay. Like:
> > > > > >>
> > > > > >>    Time        Client(A)        Server(B)
> > > > > >>    0s          SYN-->
> > > > > >>    ...
> > > > > >>    132s                         <-- FIN
> > > > > >>    ...
> > > > > >>    169s        FIN-->
> > > > > >>    169s                         <-- ACK
> > > > > >>    169s        SYN-->
> > > > > >>    169s                         <-- ACK
> > > > > >>    169s        RST-->
> > > > > >> ~2jiffies    SYN-->
> >
> > It doesn't happen. I dare to say the SYN will not be retransmitted
> > soon which I can tell from many tcpdump logs I captured.
>
> My mind went blank probably because of getting sick. Sorry. Why the
> tcpdump doesn't show the retransmitted SYN packet is our private
> kernel missed this following commit:
>
> commit 9603d47bad4642118fa19fd1562569663d9235f6
> Author: SeongJae Park <sjpark@...zon.de>
> Date:   Sun Feb 2 03:38:26 2020 +0000
>
>     tcp: Reduce SYN resend delay if a suspicous ACK is received
>
>     When closing a connection, the two acks that required to change closing
>     socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
>     reverse order.  This is possible in RSS disabled environments such as a
>     connection inside a host.
Not having a patch from linux-5.6 does not really give a good impression,
please next time make sure your patches are needed for upstream trees,
net or net-next.
This also means you can write a packetdrill test to show the benefit
of this patch,
and send it as a new selftests ;)
Back to packetdrill, and everyone will be happy.
Powered by blists - more mailing lists
 
