Date:   Mon, 11 Jan 2021 09:58:33 -0500
From:   Neal Cardwell <ncardwell@...gle.com>
To:     Enke Chen <enkechen2020@...il.com>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        "David S. Miller" <davem@...emloft.net>,
        Alexey Kuznetsov <kuznet@....inr.ac.ru>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        Jakub Kicinski <kuba@...nel.org>,
        Soheil Hassas Yeganeh <soheil@...gle.com>,
        Yuchung Cheng <ycheng@...gle.com>,
        Netdev <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Jonathan Maxwell <jmaxwell37@...il.com>,
        William McCall <william.mccall@...il.com>, enchen2020@...il.com
Subject: Re: [PATCH] Revert "tcp: simplify window probe aborting on USER_TIMEOUT"

On Fri, Jan 8, 2021 at 11:38 PM Enke Chen <enkechen2020@...il.com> wrote:
>
> From: Enke Chen <enchen@...oaltonetworks.com>
>
> This reverts commit 9721e709fa68ef9b860c322b474cfbd1f8285b0f.
>
> With the commit 9721e709fa68 ("tcp: simplify window probe aborting
> on USER_TIMEOUT"), the TCP session does not terminate with
> TCP_USER_TIMEOUT when data remain untransmitted due to zero window.
>
> The number of unanswered zero-window probes (tcp_probes_out) is
> reset to zero with incoming acks irrespective of the window size,
> as described in tcp_probe_timer():
>
>     RFC 1122 4.2.2.17 requires the sender to stay open indefinitely
>     as long as the receiver continues to respond probes. We support
>     this by default and reset icsk_probes_out with incoming ACKs.
>
> This counter, however, is the wrong one to be used in calculating the
> duration that the window remains closed and data remain untransmitted.
> Thanks to Jonathan Maxwell <jmaxwell37@...il.com> for diagnosing the
> actual issue.
>
> Cc: stable@...r.kernel.org
> Fixes: 9721e709fa68 ("tcp: simplify window probe aborting on USER_TIMEOUT")
> Reported-by: William McCall <william.mccall@...il.com>
> Signed-off-by: Enke Chen <enchen@...oaltonetworks.com>
> ---

I ran this revert commit through our packetdrill TCP tests, and it's
causing failures in a ZWP/USER_TIMEOUT test due to interactions with
this Jan 2019 patch:

    7f12422c4873e9b274bc151ea59cb0cdf9415cf1
    tcp: always timestamp on every skb transmission

The issue seems to be that after 7f12422c4873 the skb->skb_mstamp_ns
is set on every transmit attempt, so even skbs that are not
successfully transmitted have a non-zero skb_mstamp_ns. If ZWPs are
repeatedly failing to be sent due to severe local qdisc congestion,
then at this point in the code the start_ts is always only 500ms in
the past (from TCP_RESOURCE_PROBE_INTERVAL = 500ms). As a result,
under severe local qdisc congestion a USER_TIMEOUT above 500ms is a
NOP, and the socket can live far past the USER_TIMEOUT.
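
To make the interaction above concrete, here is a toy user-space model
of the timing. It is not kernel code: abort_time_ms(),
stamp_on_failed_attempt and PROBE_RETRY_INTERVAL_MS are hypothetical
names made up for this sketch, and the strict "greater than" comparison
is an assumption. It only shows that if the timestamp is refreshed on
every failed attempt, the measured elapsed time never grows past one
retry interval.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror TCP_RESOURCE_PROBE_INTERVAL (500ms); hypothetical name. */
#define PROBE_RETRY_INTERVAL_MS 500u

/*
 * Toy model of a probe timer firing every PROBE_RETRY_INTERVAL_MS while
 * every probe transmission fails locally (e.g. qdisc congestion).
 *
 * Returns the time in ms at which the USER_TIMEOUT check would abort the
 * connection, or 0 if it never aborts within horizon_ms.
 */
static uint32_t abort_time_ms(uint32_t user_timeout_ms,
                              uint32_t horizon_ms,
                              bool stamp_on_failed_attempt)
{
    uint32_t start_ts = 0; /* timestamp recorded on the pending skb */
    uint32_t now;

    for (now = PROBE_RETRY_INTERVAL_MS; now <= horizon_ms;
         now += PROBE_RETRY_INTERVAL_MS) {
        uint32_t elapsed = now - start_ts;

        if (elapsed > user_timeout_ms)
            return now; /* USER_TIMEOUT exceeded: abort */

        /*
         * The transmit attempt fails. If every attempt stamps the skb,
         * start_ts moves forward and elapsed restarts from zero.
         */
        if (stamp_on_failed_attempt)
            start_ts = now;
    }
    return 0; /* never aborted within the horizon */
}
```

In this model, abort_time_ms(2000, 60000, false) aborts at 2500ms,
while abort_time_ms(2000, 60000, true) never aborts within the 60s
horizon, which matches the "lives far past the USER_TIMEOUT" behavior
described above.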

It seems we need a slightly different approach than the revert in this commit.

neal
