netdev - Re: [PATCH net] tcp: tcp_acceptable_seq select SND.UNA when SND.WND is 0

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANn89iKTTapH58UFpF-Ui7JAUOCt1_xin2e0ugMWEgy8vpdgMg@mail.gmail.com>
Date: Wed, 16 Apr 2025 15:37:34 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Luiz Carlos Mourão Paes de Carvalho <luizcmpc@...il.com>
Cc: Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org, 
	Neal Cardwell <ncardwell@...gle.com>
Subject: Re: [PATCH net] tcp: tcp_acceptable_seq select SND.UNA when SND.WND
 is 0

On Wed, Apr 16, 2025 at 3:32 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Wed, Apr 16, 2025 at 3:30 PM Luiz Carlos Mourão Paes de Carvalho
> <luizcmpc@...il.com> wrote:
> >
> > On Wed, Apr 16, 2025 at 6:40 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > On Wed, Apr 16, 2025 at 1:52 PM Luiz Carlos Mourão Paes de Carvalho
> > > <luizcmpc@...il.com> wrote:
> > > >
> > > > Hi Paolo,
> > > >
> > > > The dropped ack is a response to data sent by the peer.
> > > >
> > > > Peer sends a chunk of data, we ACK with an incorrect SEQ (SND.NXT) that gets dropped
> > > > by the peer's tcp_sequence function. Connection only advances when we send a RTO.
> > > >
> > > > Let me know if the following describes the scenario you expected. I'll add a packetdrill with
> > > > the expected interaction to the patch if it makes sense.
> > > >
> > > > // Tests the invalid SEQs sent by the listener
> > > > // which are then dropped by the peer.
> > > >
> > > > `./common/defaults.sh
> > > > ./common/set_sysctls.py /proc/sys/net/ipv4/tcp_shrink_window=0`
> > > >
> > > >     0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> > > >    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> > > >    +0 bind(3, ..., ...) = 0
> > > >    +0 listen(3, 1) = 0
> > > >
> > > >    +0 < S 0:0(0) win 8 <mss 1000,sackOK,nop,nop,nop,wscale 7>
> > > >    +0 > S. 0:0(0) ack 1 <...>
> > > >   +.1 < . 1:1(0) ack 1 win 8
> > > >    +0 accept(3, ..., ...) = 4
> > > >
> > > >    +0 write(4, ..., 990) = 990
> > > >    +0 > P. 1:991(990) ack 1
> > > >    +0 < .  1:1(0) ack 991 win 8           // win=8 despite buffer being almost full, shrink_window=0
> > > >
> > > >    +0 write(4, ..., 100) = 100
> > > >    +0 > P. 991:1091(100) ack 1            // SND.NXT=1091
> > > >    +0 < .  1:1(0) ack 991 win 0           // failed to queue rx data, RCV.NXT=991, RCV.WND=0
> > > >
> > > >  +0.1 < P. 1:1001(1000) ack 901 win 0
> > >
> > > This 'ack 901' does not seem right ?
> >
> > It's indeed incorrect, the bug still occurs if it were 991. Sorry for that.
> >
> > >
> > > Also your fix would not work if 'win 0' was 'win 1' , and/or if the
> > > initial wscale was 6 instead of 7 ?
> >
> > It indeed does not work if win=1, but that's unlikely to happen unless
> > you enable shrink_window, and probably
> > suggests the mentioned loss of precision.
> >
> > Now, regarding the scale, it does happen with wscale=6 if your second
> > write sends < 64 bytes.
> > This is true with any other scale. Would happen if it were wscale=1
> > and the second write sent 2 bytes, etc.
> >
> > Happens as far as SND.NXT - (SND.UNA + SND.WND) < 1 << wscale.
> >
> > >
> > > >    +0 > .  1091:1091(0) ack 1001          // dropped on tcp_sequence, note that SEQ=1091, while (RCV.NXT + RCV.WND)=991:
> > > >                                           // if (after(seq, tp->rcv_nxt + tcp_receive_window(tp)))
> > > >                                           //     return SKB_DROP_REASON_TCP_INVALID_SEQUENCE;
> > >
> > > I assume that your patch would change the 1091:1091(0) to 991:991(0) ?
> >
> > Precisely.
> >
> > >
> > > It is not clear if there is a bug here... window reneging is outside
> > > RFC specs unfortunately,
> > > as hinted in the tcp_acceptable_seq() comments.
> >
> > Yeah, that got me thinking as well, but although it isn't covered by
> > the RFC, the behavior did change since
> > 8c670bdfa58e ("tcp: correct handling of extreme memory squeeze"),
> > which is a relatively recent patch (Jan 2025).
> > Currently, the connection could stall indefinitely, which seems
> > unwanted. I would be happy to search for other
> > solutions if you have anything come to mind, though.
> >
> > The way I see it, the stack shouldn't be sending invalid ACKs that are
> > known to be incorrect.
>
> These are not ACK, but sequence numbers. They were correct when initially sent.

You might try to fix the issue on the other side of the connection,
the one doing reneging...