[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CANn89iL=S+pKz5GDfHR7x6BoFd_-2G_txV4ZXR7AWKeLNAR1HA@mail.gmail.com>
Date: Wed, 16 Apr 2025 16:00:47 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Luiz Carlos Mourão Paes de Carvalho <luizcmpc@...il.com>
Cc: Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
Neal Cardwell <ncardwell@...gle.com>
Subject: Re: [PATCH net] tcp: tcp_acceptable_seq select SND.UNA when SND.WND
is 0
On Wed, Apr 16, 2025 at 3:49 PM Luiz Carlos Mourão Paes de Carvalho
<luizcmpc@...il.com> wrote:
>
> On Wed, Apr 16, 2025 at 7:37 PM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Wed, Apr 16, 2025 at 3:32 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > On Wed, Apr 16, 2025 at 3:30 PM Luiz Carlos Mourão Paes de Carvalho
> > > <luizcmpc@...il.com> wrote:
> > > >
> > > > On Wed, Apr 16, 2025 at 6:40 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > > > >
> > > > > On Wed, Apr 16, 2025 at 1:52 PM Luiz Carlos Mourão Paes de Carvalho
> > > > > <luizcmpc@...il.com> wrote:
> > > > > >
> > > > > > Hi Paolo,
> > > > > >
> > > > > > The dropped ack is a response to data sent by the peer.
> > > > > >
> > > > > > Peer sends a chunk of data, we ACK with an incorrect SEQ (SND.NXT) that gets dropped
> > > > > > by the peer's tcp_sequence function. Connection only advances when we send a RTO.
> > > > > >
> > > > > > Let me know if the following describes the scenario you expected. I'll add a packetdrill with
> > > > > > the expected interaction to the patch if it makes sense.
> > > > > >
> > > > > > // Tests the invalid SEQs sent by the listener
> > > > > > // which are then dropped by the peer.
> > > > > >
> > > > > > `./common/defaults.sh
> > > > > > ./common/set_sysctls.py /proc/sys/net/ipv4/tcp_shrink_window=0`
> > > > > >
> > > > > > 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> > > > > > +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> > > > > > +0 bind(3, ..., ...) = 0
> > > > > > +0 listen(3, 1) = 0
> > > > > >
> > > > > > +0 < S 0:0(0) win 8 <mss 1000,sackOK,nop,nop,nop,wscale 7>
> > > > > > +0 > S. 0:0(0) ack 1 <...>
> > > > > > +.1 < . 1:1(0) ack 1 win 8
> > > > > > +0 accept(3, ..., ...) = 4
> > > > > >
> > > > > > +0 write(4, ..., 990) = 990
> > > > > > +0 > P. 1:991(990) ack 1
> > > > > > +0 < . 1:1(0) ack 991 win 8 // win=8 despite buffer being almost full, shrink_window=0
> > > > > >
> > > > > > +0 write(4, ..., 100) = 100
> > > > > > +0 > P. 991:1091(100) ack 1 // SND.NXT=1091
> > > > > > +0 < . 1:1(0) ack 991 win 0 // failed to queue rx data, RCV.NXT=991, RCV.WND=0
> > > > > >
> > > > > > +0.1 < P. 1:1001(1000) ack 901 win 0
> > > > >
> > > > > This 'ack 901' does not seem right ?
> > > >
> > > > It's indeed incorrect, the bug still occurs if it were 991. Sorry for that.
> > > >
> > > > >
> > > > > Also your fix would not work if 'win 0' was 'win 1' , and/or if the
> > > > > initial wscale was 6 instead of 7 ?
> > > >
> > > > It indeed does not work if win=1, but that's unlikely to happen unless
> > > > you enable shrink_window, and probably
> > > > suggests the mentioned loss of precision.
> > > >
> > > > Now, regarding the scale, it does happen with wscale=6 if your second
> > > > write sends < 64 bytes.
> > > > This is true with any other scale. Would happen if it were wscale=1
> > > > and the second write sent 2 bytes, etc.
> > > >
> > > > Happens as far as SND.NXT - (SND.UNA + SND.WND) < 1 << wscale.
> > > >
> > > > >
> > > > > > +0 > . 1091:1091(0) ack 1001 // dropped on tcp_sequence, note that SEQ=1091, while (RCV.NXT + RCV.WND)=991:
> > > > > > // if (after(seq, tp->rcv_nxt + tcp_receive_window(tp)))
> > > > > > // return SKB_DROP_REASON_TCP_INVALID_SEQUENCE;
> > > > >
> > > > > I assume that your patch would change the 1091:1091(0) to 991:991(0) ?
> > > >
> > > > Precisely.
> > > >
> > > > >
> > > > > It is not clear if there is a bug here... window reneging is outside
> > > > > RFC specs unfortunately,
> > > > > as hinted in the tcp_acceptable_seq() comments.
> > > >
> > > > Yeah, that got me thinking as well, but although it isn't covered by
> > > > the RFC, the behavior did change since
> > > > 8c670bdfa58e ("tcp: correct handling of extreme memory squeeze"),
> > > > which is a relatively recent patch (Jan 2025).
> > > > Currently, the connection could stall indefinitely, which seems
> > > > unwanted. I would be happy to search for other
> > > > solutions if you have anything come to mind, though.
> > > >
> > > > The way I see it, the stack shouldn't be sending invalid ACKs that are
> > > > known to be incorrect.
> > >
> > > These are not ACK, but sequence numbers. They were correct when initially sent.
>
> Yes, I meant invalid ACK packets, with "incorrect" sequence numbers
> (more advanced than they should have been for this specific ZeroWindow
> scenario). The server has enough knowledge to know what the other peer
> expects (the RFC 793 quote in the original message), thus the "known
> to be incorrect". I am, however, new to the spec and stack.
>
> >
> > You might try to fix the issue on the other side of the connection,
> > the one doing reneging...
>
> Would that be adjusting tcp_sequence as per RFC 793, page 69?
>
> If the RCV.WND is zero, no segments will be acceptable, but
> special allowance should be made to accept valid ACKs, URGs and
> RSTs.
>
> My initial idea was to change tcp_sequence slightly to pass the test
> if RCV.WND is 0. My assessment was that it could go against the
> mentioned test (SEG.SEQ = RCV.NXT), and would also require some bigger
> changes to tcp_validate_incoming as tcp_rcv_established would still
> need to drop the SKB but process the ACK. I'd be happy to give it a
> shot anyway.
Please do not focus on RWIN 0, but more generally.
If a peer sent a ACK @seq WIN (@X << wscale), at some point in the past,
then it should accept any SEQ _before_ @seq + (@X << wscale),
especially from a pure ACK packet.
Reneging (ie decrease RWIN) makes sense as a way to deal with memory stress,
but should allow pure acks to be accepted if there sequence is not too
far in the future.
tcp_receive_window(const struct tcp_sock *tp)
...
s32 win = tp->rcv_wup + tp->rcv_wnd - tp->rcv_nxt;
...
if (after(seq, tp->rcv_nxt + tcp_receive_window(tp)))
return SKB_DROP_REASON_TCP_INVALID_SEQUENCE;
-->
if (after(seq, max_admissible_seq(tp))
return SKB_DROP_REASON_TCP_INVALID_SEQUENCE;
Reneging would no retract max_admissible_seq() as far as
tcp_sequence() is concerned.
Powered by blists - more mailing lists