netdev - Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <75708260-7eb4-42fe-9d9b-605f8eef488b@linux.alibaba.com>
Date: Thu, 7 Nov 2024 16:22:25 +0800
From: Philo Lu <lulie@...ux.alibaba.com>
To: Jason Xing <kerneljasonxing@...il.com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
 pabeni@...hat.com, dsahern@...nel.org, horms@...nel.org,
 netdev@...r.kernel.org, Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to
 failure in tcp_timewait_state_process



On 2024/11/7 16:01, Jason Xing wrote:
> On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@...ux.alibaba.com> wrote:
>>
>> Hi Jason,
>>
>> On 2024/11/5 10:55, Jason Xing wrote:
>>> From: Jason Xing <kernelxing@...cent.com>
>>>
>>> We found there are rare chances that some RST packets appear during
>>> the shakehands because the timewait socket cannot accept the SYN and
>>> doesn't return TCP_TW_SYN in tcp_timewait_state_process().
>>>
>>> Here is how things happen in production:
>>> Time        Client(A)        Server(B)
>>> 0s          SYN-->
>>> ...
>>> 132s                         <-- FIN
>>> ...
>>> 169s        FIN-->
>>> 169s                         <-- ACK
>>> 169s        SYN-->
>>> 169s                         <-- ACK
>>> 169s        RST-->
>>> As above picture shows, the two flows have a start time difference
>>> of 169 seconds. B starts to send FIN so it will finally enter into
>>> TIMEWAIT state. Nearly at the same time A launches a new connection
>>> that soon is reset by itself due to receiving a ACK.
>>>
>>> There are two key checks in tcp_timewait_state_process() when timewait
>>> socket in B receives the SYN packet:
>>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
>>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
>>>
>>> Regarding the first rule, it fails as expected because in the first
>>> connection the seq of SYN sent from A is 1892994276, then 169s have
>>> passed, the second SYN has 239034613 (caused by overflow of s32).
>>>
>>> Then how about the second rule?
>>> It fails again!
>>> Let's take a look at how the tsval comes out:
>>> __tcp_transmit_skb()
>>>       -> tcp_syn_options()
>>>           -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
>>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the
>>> other is tp->tsoffset. The latter value is fixed, so we don't need
>>> to care about it. If both operations (sending FIN and then starting
>>> sending SYN) from A happen in 1ms, then the tsval would be the same.
>>> It can be clearly seen in the tcpdump log. Notice that the tsval is
>>> with millisecond precision.
>>>
>>> Based on the above analysis, I decided to make a small change to
>>> the check in tcp_timewait_state_process() so that the second flow
>>> would not fail.
>>>
>>
>> I wonder what a bad result the RST causes. As far as I know, the client
>> will not close the connect and return. Instead, it re-sends an SYN in
>> TCP_TIMEOUT_MIN(2) jiffies (implemented in
>> tcp_rcv_synsent_state_process). So the second connection could still be
>> established successfully, at the cost of a bit more delay. Like:
>>
>>    Time        Client(A)        Server(B)
>>    0s          SYN-->
>>    ...
>>    132s                         <-- FIN
>>    ...
>>    169s        FIN-->
>>    169s                         <-- ACK
>>    169s        SYN-->
>>    169s                         <-- ACK
>>    169s        RST-->
>> ~2jiffies    SYN-->
>>                                 <-- SYN,ACK
> 
> That's exactly what I meant here :) Originally I didn't expect the
> application to relaunch a connection in this case.

s/application/kernel/, right? Because the retry is transparent to user 
applications except the additional latency. I think all of these are 
finished in a single connect() :)

-- 
Philo