netdev - Re: [net,v2] tcp: correct handling of extreme memory squeeze

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CADVnQymiwUG3uYBGMc1ZEV9vAUQzEOD4ymdN7Rcqi7yAK9ZB5A@mail.gmail.com>
Date: Sat, 18 Jan 2025 15:04:20 -0500
From: Neal Cardwell <ncardwell@...gle.com>
To: jmaloy@...hat.com
Cc: netdev@...r.kernel.org, davem@...emloft.net, kuba@...nel.org, 
	passt-dev@...st.top, sbrivio@...hat.com, lvivier@...hat.com, 
	dgibson@...hat.com, imagedong@...cent.com, eric.dumazet@...il.com, 
	edumazet@...gle.com
Subject: Re: [net,v2] tcp: correct handling of extreme memory squeeze

On Fri, Jan 17, 2025 at 4:41 PM <jmaloy@...hat.com> wrote:
>
> From: Jon Maloy <jmaloy@...hat.com>
>
> Testing with iperf3 using the "pasta" protocol splicer has revealed
> a bug in the way tcp handles window advertising in extreme memory
> squeeze situations.
>
> Under memory pressure, a socket endpoint may temporarily advertise
> a zero-sized window, but this is not stored as part of the socket data.
> The reasoning behind this is that it is considered a temporary setting
> which shouldn't influence any further calculations.
>
> However, if we happen to stall at an unfortunate value of the current
> window size, the algorithm selecting a new value will consistently fail
> to advertise a non-zero window once we have freed up enough memory.

The "if we happen to stall at an unfortunate value of the current
window size" phrase is a little vague... :-) Do you have a sense of
what might count as "unfortunate" here? That might help in crafting a
packetdrill test to reproduce this and have an automated regression
test.

> This means that this side's notion of the current window size is
> different from the one last advertised to the peer, causing the latter
> to not send any data to resolve the sitution.

Since the peer last saw a zero receive window at the time of the
memory-pressure drop, shouldn't the peer be sending repeated zero
window probes, and shouldn't the local host respond to a ZWP with an
ACK with the correct non-zero window?

Do you happen to have a tcpdump .pcap of one of these cases that you can share?

> The problem occurs on the iperf3 server side, and the socket in question
> is a completely regular socket with the default settings for the
> fedora40 kernel. We do not use SO_PEEK or SO_RCVBUF on the socket.
>
> The following excerpt of a logging session, with own comments added,
> shows more in detail what is happening:
>
> //              tcp_v4_rcv(->)
> //                tcp_rcv_established(->)
> [5201<->39222]:     ==== Activating log @ net/ipv4/tcp_input.c/tcp_data_queue()/5257 ====
> [5201<->39222]:     tcp_data_queue(->)
> [5201<->39222]:        DROPPING skb [265600160..265665640], reason: SKB_DROP_REASON_PROTO_MEM
>                        [rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184]

What is "win_now"? That doesn't seem to correspond to any variable
name in the Linux source tree.  Can this be renamed to the
tcp_select_window() variable it is printing, like "cur_win" or
"effective_win" or "new_win", etc?

Or perhaps you can attach your debugging patch in some email thread? I
agree with Eric that these debug dumps are a little hard to parse
without seeing the patch that allows us to understand what some of
these fields are...

I agree with Eric that probably tp->pred_flags should be cleared, and
a packetdrill test for this would be super-helpful.

thanks,
neal