Message-ID: <20250127150303.46c9d9f5@elisabeth>
Date: Mon, 27 Jan 2025 15:03:03 +0100
From: Stefano Brivio <sbrivio@...hat.com>
To: Menglong Dong <menglong8.dong@...il.com>
Cc: Jason Xing <kerneljasonxing@...il.com>, Jon Maloy <jmaloy@...hat.com>,
 Eric Dumazet <edumazet@...gle.com>, Neal Cardwell <ncardwell@...gle.com>,
 netdev@...r.kernel.org, davem@...emloft.net, kuba@...nel.org,
 passt-dev@...st.top, lvivier@...hat.com, dgibson@...hat.com,
 eric.dumazet@...il.com
Subject: Re: [net,v2] tcp: correct handling of extreme memory squeeze

On Mon, 27 Jan 2025 21:37:23 +0800
Menglong Dong <menglong8.dong@...il.com> wrote:

> On Mon, Jan 27, 2025 at 6:32 PM Stefano Brivio <sbrivio@...hat.com> wrote:
> >
> > On Mon, 27 Jan 2025 18:17:28 +0800
> > Jason Xing <kerneljasonxing@...il.com> wrote:
> >  
> > > I'm not that sure if it's a bug belonging to the Linux kernel.  
> >
> > It is, because for at least 20-25 years (before that it's a bit hard to
> > understand from history) a non-zero window would be announced, as
> > obviously expected, once there's again space in the receive window.  
> 
> Sorry for the late reply. I think the key of this problem is
> what should we do when we receive a tcp packet and we are
> out of memory.
> 
> The RFC doesn't define such a thing,

Why not? RFC 9293, 3.8.6:

  There is an assumption that this is related to the data buffer space
  currently available for this connection.

That is, out-of-memory -> zero window.

> so in the commit
> e2142825c120 ("net: tcp: send zero-window ACK when no memory"),
> I reply with a zero-window ACK to the peer.

Your patch is fundamentally correct; nobody is disputing that. The
problem is that it introduces a side effect: it gets the notion of
"current window" out of sync by sending a one-off packet with a zero
window, without recording that.

> And the peer will keep
> probing the window by retransmitting the packet that we dropped if
> the peer is a LINUX SYSTEM.
> 
> As I said, the RFC doesn't define such a case, so the behavior of
> the peer is undefined if it is not a LINUX SYSTEM. If the peer doesn't
> keep retransmitting the packet, it will hang the connection, just like
> the problem described in this commit log.

It's not undefined. RFC 9293 3.8.6.1 (just like RFC 1122 4.2.2.17,
RFC 793 3.7) requires zero-window probes.

But keeping the window closed indefinitely if there's no zero-window
probe is a regression anyway:

- a retransmission timeout must elapse (RFC 9293 3.8.1) before the
  zero-window probe is sent, so relying on zero-window probes means
  introducing an unnecessary delay

- if the peer (as was the case here) fails to send a zero-window
  probe for whatever reason, things break. This is a userspace
  breakage, regardless of the fact that the peer should send a
  zero-window probe
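
To put a rough number on the delay in the first point: RFC 9293 3.8.6.1
says the first zero-window probe should go out once the zero window has
existed for one retransmission timeout, with exponentially increasing
intervals after that. A minimal sketch of that schedule follows. The
function name, the doubling factor and the `max_ms` cap are assumptions for
illustration, not taken from any particular stack.

```c
#include <stdint.h>

/* Time, in ms since the window closed, at which the n-th probe (n >= 1)
 * is sent. Assumes the first probe fires after one RTO and each later
 * interval doubles, capped at max_ms. Returns 0 for n == 0. */
static uint64_t toy_probe_time_ms(uint32_t rto_ms, uint32_t max_ms, unsigned n)
{
    uint64_t t = 0;
    uint64_t interval = rto_ms;

    for (unsigned i = 0; i < n; i++) {
        if (interval > max_ms)
            interval = max_ms;
        t += interval;
        interval *= 2;
    }
    return t;
}
```

With a 200 ms RTO, for example, the peer learns about a reopened window no
earlier than 200 ms after it closed, even if space became available right
away. An explicit window update from the receiver avoids that wait.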

> However, we can make some optimization to make it more
> adaptable. We can send an ACK with the right window to the
> peer when the memory is available, and __tcp_cleanup_rbuf()
> is a good choice.
> 
> Generally speaking, I think this patch makes sense. However,
> I'm not sure if there is any other influence if we make
> "tp->rcv_wnd=0", but it can trigger a ACK in __tcp_cleanup_rbuf().

I don't understand what your concern is with the patch that was proposed
(and tested quite thoroughly, by the way).

> Following is the code that I thought before to optimize this
> case (the code is totally not tested):
>
> [...]

-- 
Stefano

