lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHx7jf_brd2KYyPnMS7pdTUzqm+x8WVToTAo=xRB3fMVGHf1TQ@mail.gmail.com>
Date: Wed, 16 Apr 2025 19:49:30 -0300
From: Luiz Carlos Mourão Paes de Carvalho <luizcmpc@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org, 
	Neal Cardwell <ncardwell@...gle.com>
Subject: Re: [PATCH net] tcp: tcp_acceptable_seq select SND.UNA when SND.WND
 is 0

On Wed, Apr 16, 2025 at 7:37 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Wed, Apr 16, 2025 at 3:32 PM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Wed, Apr 16, 2025 at 3:30 PM Luiz Carlos Mourão Paes de Carvalho
> > <luizcmpc@...il.com> wrote:
> > >
> > > On Wed, Apr 16, 2025 at 6:40 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > > >
> > > > On Wed, Apr 16, 2025 at 1:52 PM Luiz Carlos Mourão Paes de Carvalho
> > > > <luizcmpc@...il.com> wrote:
> > > > >
> > > > > Hi Paolo,
> > > > >
> > > > > The dropped ack is a response to data sent by the peer.
> > > > >
> > > > > Peer sends a chunk of data, we ACK with an incorrect SEQ (SND.NXT) that gets dropped
> > > > > by the peer's tcp_sequence function. Connection only advances when we send a RTO.
> > > > >
> > > > > Let me know if the following describes the scenario you expected. I'll add a packetdrill with
> > > > > the expected interaction to the patch if it makes sense.
> > > > >
> > > > > // Tests the invalid SEQs sent by the listener
> > > > > // which are then dropped by the peer.
> > > > >
> > > > > `./common/defaults.sh
> > > > > ./common/set_sysctls.py /proc/sys/net/ipv4/tcp_shrink_window=0`
> > > > >
> > > > >     0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> > > > >    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> > > > >    +0 bind(3, ..., ...) = 0
> > > > >    +0 listen(3, 1) = 0
> > > > >
> > > > >    +0 < S 0:0(0) win 8 <mss 1000,sackOK,nop,nop,nop,wscale 7>
> > > > >    +0 > S. 0:0(0) ack 1 <...>
> > > > >   +.1 < . 1:1(0) ack 1 win 8
> > > > >    +0 accept(3, ..., ...) = 4
> > > > >
> > > > >    +0 write(4, ..., 990) = 990
> > > > >    +0 > P. 1:991(990) ack 1
> > > > >    +0 < .  1:1(0) ack 991 win 8           // win=8 despite buffer being almost full, shrink_window=0
> > > > >
> > > > >    +0 write(4, ..., 100) = 100
> > > > >    +0 > P. 991:1091(100) ack 1            // SND.NXT=1091
> > > > >    +0 < .  1:1(0) ack 991 win 0           // failed to queue rx data, RCV.NXT=991, RCV.WND=0
> > > > >
> > > > >  +0.1 < P. 1:1001(1000) ack 901 win 0
> > > >
> > > > This 'ack 901' does not seem right ?
> > >
> > > It's indeed incorrect, the bug still occurs if it were 991. Sorry for that.
> > >
> > > >
> > > > Also your fix would not work if 'win 0' was 'win 1' , and/or if the
> > > > initial wscale was 6 instead of 7 ?
> > >
> > > It indeed does not work if win=1, but that's unlikely to happen unless
> > > you enable shrink_window, and probably
> > > suggests the mentioned loss of precision.
> > >
> > > Now, regarding the scale, it does happen with wscale=6 if your second
> > > write sends < 64 bytes.
> > > This is true with any other scale. Would happen if it were wscale=1
> > > and the second write sent 2 bytes, etc.
> > >
> > > Happens as far as SND.NXT - (SND.UNA + SND.WND) < 1 << wscale.
> > >
> > > >
> > > > >    +0 > .  1091:1091(0) ack 1001          // dropped on tcp_sequence, note that SEQ=1091, while (RCV.NXT + RCV.WND)=991:
> > > > >                                           // if (after(seq, tp->rcv_nxt + tcp_receive_window(tp)))
> > > > >                                           //     return SKB_DROP_REASON_TCP_INVALID_SEQUENCE;
> > > >
> > > > I assume that your patch would change the 1091:1091(0) to 991:991(0) ?
> > >
> > > Precisely.
> > >
> > > >
> > > > It is not clear if there is a bug here... window reneging is outside
> > > > RFC specs unfortunately,
> > > > as hinted in the tcp_acceptable_seq() comments.
> > >
> > > Yeah, that got me thinking as well, but although it isn't covered by
> > > the RFC, the behavior did change since
> > > 8c670bdfa58e ("tcp: correct handling of extreme memory squeeze"),
> > > which is a relatively recent patch (Jan 2025).
> > > Currently, the connection could stall indefinitely, which seems
> > > unwanted. I would be happy to search for other
> > > solutions if you have anything come to mind, though.
> > >
> > > The way I see it, the stack shouldn't be sending invalid ACKs that are
> > > known to be incorrect.
> >
> > These are not ACK, but sequence numbers. They were correct when initially sent.

Yes, I meant invalid ACK packets, with "incorrect" sequence numbers
(more advanced than they should have been for this specific ZeroWindow
scenario). The server has enough knowledge to know what the other peer
expects (the RFC 793 quote in the original message), thus the "known
to be incorrect". I am, however, new to the spec and stack.

>
> You might try to fix the issue on the other side of the connection,
> the one doing reneging...

Would that be adjusting tcp_sequence as per RFC 793, page 69?

        If the RCV.WND is zero, no segments will be acceptable, but
        special allowance should be made to accept valid ACKs, URGs and
        RSTs.

My initial idea was to change tcp_sequence slightly to pass the test
if RCV.WND is 0. My assessment was that it could go against the
mentioned test (SEG.SEQ = RCV.NXT), and would also require some bigger
changes to tcp_validate_incoming as tcp_rcv_established would still
need to drop the SKB but process the ACK. I'd be happy to give it a
shot anyway.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ