lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoB9a7eKzU9sz8AaY0sqeKn9fkK9ejDJkfh9EpdcG17k-w@mail.gmail.com>
Date: Thu, 7 Nov 2024 11:16:04 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, 
	pabeni@...hat.com, dsahern@...nel.org, horms@...nel.org
Cc: netdev@...r.kernel.org, Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to
 failure in tcp_timewait_state_process

Hello Eric,

On Tue, Nov 5, 2024 at 10:55 AM Jason Xing <kerneljasonxing@...il.com> wrote:
>
> From: Jason Xing <kernelxing@...cent.com>
>
> We found there are rare chances that some RST packets appear during
> the shakehands because the timewait socket cannot accept the SYN and
> doesn't return TCP_TW_SYN in tcp_timewait_state_process().
>
> Here is how things happen in production:
> Time        Client(A)        Server(B)
> 0s          SYN-->
> ...
> 132s                         <-- FIN
> ...
> 169s        FIN-->
> 169s                         <-- ACK
> 169s        SYN-->
> 169s                         <-- ACK

I noticed the above ACK doesn't adhere to RFC 6191. It says:
"If the previous incarnation of the connection used Timestamps, then:
     if ...
     ...
     * Otherwise, silently drop the incoming SYN segment, thus leaving
         the previous incarnation of the connection in the TIME-WAIT
         state.
"
But the timewait socket sends an ACK because of this code snippet:
tcp_timewait_state_process()
    -> // the checks of SYN packet failed.
    -> if (!th->rst) {
        -> return TCP_TW_ACK; // this line can be traced back to 2005

I think the following patch follows the RFC:
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index bb1fe1ba867a..cc22f0412f98 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -231,15 +231,17 @@ tcp_timewait_state_process(struct
inet_timewait_sock *tw, struct sk_buff *skb,
           but not fatal yet.
         */

-       if (th->syn && !th->rst && !th->ack && !paws_reject &&
-           (after(TCP_SKB_CB(skb)->seq, rcv_nxt) ||
-            (tmp_opt.saw_tstamp &&
-             (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) {
-               u32 isn = tcptw->tw_snd_nxt + 65535 + 2;
-               if (isn == 0)
-                       isn++;
-               *tw_isn = isn;
-               return TCP_TW_SYN;
+       if (th->syn && !th->rst && !th->ack && !paws_reject) {
+               if (after(TCP_SKB_CB(skb)->seq, rcv_nxt) ||
+                   (tmp_opt.saw_tstamp &&
+                    (s32)(READ_ONCE(tcptw->tw_ts_recent) -
tmp_opt.rcv_tsval) < 0)) {
+                       u32 isn = tcptw->tw_snd_nxt + 65535 + 2;
+                       if (isn == 0)
+                               isn++;
+                       *tw_isn = isn;
+                       return TCP_TW_SYN;
+               }
+               return TCP_TW_SUCCESS;
        }

        if (paws_reject)

Could you help me review this, Eric? Thanks in advance!

Thanks,
Jason

> 169s        RST-->
> As above picture shows, the two flows have a start time difference
> of 169 seconds. B starts to send FIN so it will finally enter into
> TIMEWAIT state. Nearly at the same time A launches a new connection
> that soon is reset by itself due to receiving a ACK.
>
> There are two key checks in tcp_timewait_state_process() when timewait
> socket in B receives the SYN packet:
> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
>
> Regarding the first rule, it fails as expected because in the first
> connection the seq of SYN sent from A is 1892994276, then 169s have
> passed, the second SYN has 239034613 (caused by overflow of s32).
>
> Then how about the second rule?
> It fails again!
> Let's take a look at how the tsval comes out:
> __tcp_transmit_skb()
>     -> tcp_syn_options()
>         -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> other is tp->tsoffset. The latter value is fixed, so we don't need
> to care about it. If both operations (sending FIN and then starting
> sending SYN) from A happen in 1ms, then the tsval would be the same.
> It can be clearly seen in the tcpdump log. Notice that the tsval is
> with millisecond precision.
>
> Based on the above analysis, I decided to make a small change to
> the check in tcp_timewait_state_process() so that the second flow
> would not fail.
>
> Signed-off-by: Jason Xing <kernelxing@...cent.com>
> ---
>  net/ipv4/tcp_minisocks.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index bb1fe1ba867a..2b29d1bf5ca0 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
>         if (th->syn && !th->rst && !th->ack && !paws_reject &&
>             (after(TCP_SKB_CB(skb)->seq, rcv_nxt) ||
>              (tmp_opt.saw_tstamp &&
> -             (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) {
> +             (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) <= 0))) {
>                 u32 isn = tcptw->tw_snd_nxt + 65535 + 2;
>                 if (isn == 0)
>                         isn++;
> --
> 2.37.3
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ