linux-kernel - Re: [PATCH] tcp: fix TCP_USER

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20210114000949.GC3738@localhost.localdomain>
Date:   Wed, 13 Jan 2021 16:09:49 -0800
From:   Enke Chen <enkechen2020@...il.com>
To:     Yuchung Cheng <ycheng@...gle.com>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        "David S. Miller" <davem@...emloft.net>,
        Alexey Kuznetsov <kuznet@....inr.ac.ru>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Jonathan Maxwell <jmaxwell37@...il.com>,
        William McCall <william.mccall@...il.com>
Subject: Re: [PATCH] tcp: fix TCP_USER_TIMEOUT with zero window

Yes, I am convinced :-) Thanks to Eric, Neal and Yuchung for their help.

-- Enke

On Wed, Jan 13, 2021 at 01:20:55PM -0800, Yuchung Cheng wrote:
> On Wed, Jan 13, 2021 at 12:49 PM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Wed, Jan 13, 2021 at 9:12 PM Enke Chen <enkechen2020@...il.com> wrote:
> > >
> > > From: Enke Chen <enchen@...oaltonetworks.com>
> > >
> > > The TCP session does not terminate with TCP_USER_TIMEOUT when data
> > > remain untransmitted due to zero window.
> > >
> > > The number of unanswered zero-window probes (tcp_probes_out) is
> > > reset to zero with incoming acks irrespective of the window size,
> > > as described in tcp_probe_timer():
> > >
> > >     RFC 1122 4.2.2.17 requires the sender to stay open indefinitely
> > >     as long as the receiver continues to respond probes. We support
> > >     this by default and reset icsk_probes_out with incoming ACKs.
> > >
> > > This counter, however, is the wrong one to be used in calculating the
> > > duration that the window remains closed and data remain untransmitted.
> > > Thanks to Jonathan Maxwell <jmaxwell37@...il.com> for diagnosing the
> > > actual issue.
> > >
> > > In this patch a separate counter is introduced to track the number of
> > > zero-window probes that are not answered with any non-zero window ack.
> > > This new counter is used in determining when to abort the session with
> > > TCP_USER_TIMEOUT.
> > >
> >
> > I think one possible issue would be that local congestion (full qdisc)
> > would abort early,
> > because tcp_model_timeout() assumes linear backoff.
> Yes exactly. if ZWPs are dropped due to local congestion, the
> model_timeout computes incorrectly. Therefore having a starting
> timestamp is the surest way b/c it does not assume any specific
> backoff behavior.
> 
> >
> > Neal or Yuchung can further comment on that, it is late for me in France.
> >
> > packetdrill test would be :
> >
> >    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> >    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> >    +0 bind(3, ..., ...) = 0
> >    +0 listen(3, 1) = 0
> >
> >
> >    +0 < S 0:0(0) win 0 <mss 1460>
> >    +0 > S. 0:0(0) ack 1 <mss 1460>
> >
> >   +.1 < . 1:1(0) ack 1 win 65530
> >    +0 accept(3, ..., ...) = 4
> >
> >    +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
> >    +0 write(4, ..., 24) = 24
> >    +0 > P. 1:25(24) ack 1
> >    +.1 < . 1:1(0) ack 25 win 65530
> >    +0 %{ assert tcpi_probes == 0, tcpi_probes; \
> >          assert tcpi_backoff == 0, tcpi_backoff }%
> >
> > // install a qdisc dropping all packets
> >    +0 `tc qdisc delete dev tun0 root 2>/dev/null ; tc qdisc add dev
> > tun0 root pfifo limit 0`
> >    +0 write(4, ..., 24) = 24
> >    // When qdisc is congested we retry every 500ms therefore in theory
> >    // we'd retry 6 times before hitting 3s timeout. However, since we
> >    // estimate the elapsed time based on exp backoff of actual RTO (300ms),
> >    // we'd bail earlier with only 3 probes.
> >    +2.1 write(4, ..., 24) = -1
> >    +0 %{ assert tcpi_probes == 3, tcpi_probes; \
> >          assert tcpi_backoff == 0, tcpi_backoff }%
> >    +0 close(4) = 0
> >