lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAL+tcoCzbLKV7xVRvQsYFAoZp1OrXQ1+-xn0rCgHfcTttT=-Uw@mail.gmail.com>
Date: Thu, 7 Nov 2024 17:57:10 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Philo Lu <lulie@...ux.alibaba.com>, davem@...emloft.net, kuba@...nel.org, 
	pabeni@...hat.com, dsahern@...nel.org, horms@...nel.org, 
	netdev@...r.kernel.org, Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to
 failure in tcp_timewait_state_process

On Thu, Nov 7, 2024 at 5:45 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Thu, Nov 7, 2024 at 10:30 AM Jason Xing <kerneljasonxing@...il.com> wrote:
> >
> > On Thu, Nov 7, 2024 at 5:00 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> > >
> > > On Thu, Nov 7, 2024 at 4:37 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > > >
> > > > On Thu, Nov 7, 2024 at 9:27 AM Jason Xing <kerneljasonxing@...il.com> wrote:
> > > > >
> > > > > On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@...ux.alibaba.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 2024/11/7 16:01, Jason Xing wrote:
> > > > > > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@...ux.alibaba.com> wrote:
> > > > > > >>
> > > > > > >> Hi Jason,
> > > > > > >>
> > > > > > >> On 2024/11/5 10:55, Jason Xing wrote:
> > > > > > >>> From: Jason Xing <kernelxing@...cent.com>
> > > > > > >>>
> > > > > > >>> We found there are rare chances that some RST packets appear during
> > > > > > >>> the shakehands because the timewait socket cannot accept the SYN and
> > > > > > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> > > > > > >>>
> > > > > > >>> Here is how things happen in production:
> > > > > > >>> Time        Client(A)        Server(B)
> > > > > > >>> 0s          SYN-->
> > > > > > >>> ...
> > > > > > >>> 132s                         <-- FIN
> > > > > > >>> ...
> > > > > > >>> 169s        FIN-->
> > > > > > >>> 169s                         <-- ACK
> > > > > > >>> 169s        SYN-->
> > > > > > >>> 169s                         <-- ACK
> > > > > > >>> 169s        RST-->
> > > > > > >>> As above picture shows, the two flows have a start time difference
> > > > > > >>> of 169 seconds. B starts to send FIN so it will finally enter into
> > > > > > >>> TIMEWAIT state. Nearly at the same time A launches a new connection
> > > > > > >>> that soon is reset by itself due to receiving a ACK.
> > > > > > >>>
> > > > > > >>> There are two key checks in tcp_timewait_state_process() when timewait
> > > > > > >>> socket in B receives the SYN packet:
> > > > > > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > > > > > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> > > > > > >>>
> > > > > > >>> Regarding the first rule, it fails as expected because in the first
> > > > > > >>> connection the seq of SYN sent from A is 1892994276, then 169s have
> > > > > > >>> passed, the second SYN has 239034613 (caused by overflow of s32).
> > > > > > >>>
> > > > > > >>> Then how about the second rule?
> > > > > > >>> It fails again!
> > > > > > >>> Let's take a look at how the tsval comes out:
> > > > > > >>> __tcp_transmit_skb()
> > > > > > >>>       -> tcp_syn_options()
> > > > > > >>>           -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > > > > > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > > > > > >>> other is tp->tsoffset. The latter value is fixed, so we don't need
> > > > > > >>> to care about it. If both operations (sending FIN and then starting
> > > > > > >>> sending SYN) from A happen in 1ms, then the tsval would be the same.
> > > > > > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is
> > > > > > >>> with millisecond precision.
> > > > > > >>>
> > > > > > >>> Based on the above analysis, I decided to make a small change to
> > > > > > >>> the check in tcp_timewait_state_process() so that the second flow
> > > > > > >>> would not fail.
> > > > > > >>>
> > > > > > >>
> > > > > > >> I wonder what a bad result the RST causes. As far as I know, the client
> > > > > > >> will not close the connect and return. Instead, it re-sends an SYN in
> > > > > > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in
> > > > > > >> tcp_rcv_synsent_state_process). So the second connection could still be
> > > > > > >> established successfully, at the cost of a bit more delay. Like:
> > > > > > >>
> > > > > > >>    Time        Client(A)        Server(B)
> > > > > > >>    0s          SYN-->
> > > > > > >>    ...
> > > > > > >>    132s                         <-- FIN
> > > > > > >>    ...
> > > > > > >>    169s        FIN-->
> > > > > > >>    169s                         <-- ACK
> > > > > > >>    169s        SYN-->
> > > > > > >>    169s                         <-- ACK
> > > > > > >>    169s        RST-->
> > > > > > >> ~2jiffies    SYN-->
> > >
> > > It doesn't happen. I dare to say the SYN will not be retransmitted
> > > soon which I can tell from many tcpdump logs I captured.
> >
> > My mind went blank probably because of getting sick. Sorry. Why the
> > tcpdump doesn't show the retransmitted SYN packet is our private
> > kernel missed this following commit:
> >
> > commit 9603d47bad4642118fa19fd1562569663d9235f6
> > Author: SeongJae Park <sjpark@...zon.de>
> > Date:   Sun Feb 2 03:38:26 2020 +0000
> >
> >     tcp: Reduce SYN resend delay if a suspicous ACK is received
> >
> >     When closing a connection, the two acks that required to change closing
> >     socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
> >     reverse order.  This is possible in RSS disabled environments such as a
> >     connection inside a host.
>
> Not having a patch from linux-5.6 does not really give a good impression,
> please next time make sure your patches are needed for upstream trees,
> net or net-next.

Sorry about this. Normally, I found an issue on those older kernels
and then I will test it on the net-next tree. The issue this patch
tried to fix can be reproduced, as we all see. But today regarding the
topic about re-sending an SYN quickly, I didn't conduct the test, only
analyzing the code :( Sorry for the noise.

>
> This also means you can write a packetdrill test to show the benefit
> of this patch,
> and send it as a new selftests ;)
>
> Back to packetdrill, and everyone will be happy.

Thanks for reminding me. It's really necessary for me to get started
to learn how to write various packetdrill tests.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ