netdev - Re: [PATCH net-next 1/2] tcp: do not set a zero size receive buffer

Message-ID: <CANn89i+eLqKvv1mF6N8-5DrQZZfRJrfopps0w9HRMANn_w=1QA@mail.gmail.com>
Date: Mon, 21 Jul 2025 05:41:48 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, Neal Cardwell <ncardwell@...gle.com>, 
	Kuniyuki Iwashima <kuniyu@...gle.com>, "David S. Miller" <davem@...emloft.net>, 
	David Ahern <dsahern@...nel.org>, Jakub Kicinski <kuba@...nel.org>, Simon Horman <horms@...nel.org>, 
	Matthieu Baerts <matttbe@...nel.org>
Subject: Re: [PATCH net-next 1/2] tcp: do not set a zero size receive buffer

On Mon, Jul 21, 2025 at 5:30 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Mon, Jul 21, 2025 at 3:50 AM Paolo Abeni <pabeni@...hat.com> wrote:
> >
> > On 7/21/25 10:04 AM, Eric Dumazet wrote:
> > > On Fri, Jul 18, 2025 at 10:25 AM Paolo Abeni <pabeni@...hat.com> wrote:
> > >>
> > >> The nipa CI is reporting frequent failures in the mptcp_connect
> > >> self-tests.
> > >>
> > >> In the failing scenarios (TCP -> MPTCP) the involved sockets are
> > >> actually plain TCP ones, as fallback for passive socket at 2whs
> > >> time cause the MPTCP listener to actually create a TCP socket.
> > >>
> > >> The transfer is stuck due to the receiver buffer being zero.
> > >> With the stronger check in place, tcp_clamp_window() can be invoked
> > >> while the TCP socket has sk_rmem_alloc == 0, and the receive buffer
> > >> will be zeroed, too.
> > >>
> > >> Pass to tcp_clamp_window() even the current skb truesize, so that
> > >> such helper could compute and use the actual limit enforced by
> > >> the stack.
> > >>
> > >> Fixes: 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks")
> > >> Signed-off-by: Paolo Abeni <pabeni@...hat.com>
> > >> ---
> > >>  net/ipv4/tcp_input.c | 12 ++++++------
> > >>  1 file changed, 6 insertions(+), 6 deletions(-)
> > >>
> > >> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > >> index 672cbfbdcec1..c98de02a3c57 100644
> > >> --- a/net/ipv4/tcp_input.c
> > >> +++ b/net/ipv4/tcp_input.c
> > >> @@ -610,24 +610,24 @@ static void tcp_init_buffer_space(struct sock *sk)
> > >>  }
> > >>
> > >>  /* 4. Recalculate window clamp after socket hit its memory bounds. */
> > >> -static void tcp_clamp_window(struct sock *sk)
> > >> +static void tcp_clamp_window(struct sock *sk, int truesize)
> > >
> > >
> > > I am unsure about this one. truesize can be 1MB here, do we want that
> > > in general ?
> >
> > I'm unsure either. But I can't think of a different approach?!? If the
> > incoming truesize is 1M the socket should allow for at least 1M rcvbuf
> > size to accept it, right?
>
> What I meant was :
>
> This is the generic point, accepting skb->truesize as additional input
> here would make us more vulnerable, or we could risk other
> regressions.
>
> The question is : why does MPTCP end up here in the first place.
> Perhaps an older issue with an incorrectly sized sk_rcvbuf ?
>
> Or maybe the test about the receive queue being empty (currently done
> in tcp_data_queue()) should be moved to a more strategic place.

If the MPTCP part cannot be easily resolved, perhaps:

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 672cbfbdcec1de22a5b1494d365863303271d222..81b6d37708120632d16a50892442ea04779cc3a4 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5549,6 +5549,10 @@ static int tcp_prune_queue(struct sock *sk, const struct sk_buff *in_skb)
 {
        struct tcp_sock *tp = tcp_sk(sk);

+       /* Do nothing if our queues are empty. */
+       if (!atomic_read(&sk->sk_rmem_alloc))
+               return -1;
+
        NET_INC_STATS(sock_net(sk), LINUX_MIB_PRUNECALLED);

        if (!tcp_can_ingest(sk, in_skb))
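
As a rough user-space illustration of what the early return above changes, here is a deliberately simplified model. It assumes the clamp does nothing more than shrink the receive buffer to the memory currently charged to the socket, which is the behaviour described earlier in this thread when sk_rmem_alloc == 0. The names toy_sock, toy_clamp_window and toy_prune_queue are hypothetical and are not kernel symbols; the real functions do considerably more.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the two socket fields discussed in this thread.
 * This is NOT struct sock; it only carries what the sketch needs.
 */
struct toy_sock {
	int rmem_alloc;	/* bytes currently charged to the receive queues */
	int rcvbuf;	/* receive buffer limit */
};

/* Assumption: model the clamp as rcvbuf = min(rmem_alloc, rmem_max).
 * With rmem_alloc == 0 this zeroes the receive buffer.
 */
void toy_clamp_window(struct toy_sock *sk, int rmem_max)
{
	assert(sk != NULL);
	sk->rcvbuf = sk->rmem_alloc < rmem_max ? sk->rmem_alloc : rmem_max;
}

/* Prune path with an optional "do nothing if our queues are empty" check.
 * Returns -1 when the early return is taken, 0 after clamping.
 */
int toy_prune_queue(struct toy_sock *sk, int rmem_max, int skip_if_empty)
{
	assert(sk != NULL);

	if (skip_if_empty && sk->rmem_alloc == 0)
		return -1;

	toy_clamp_window(sk, rmem_max);
	return 0;
}
```

In this model, calling toy_prune_queue() on a socket with rmem_alloc == 0 and skip_if_empty == 0 leaves rcvbuf at zero, while skip_if_empty == 1 returns -1 and leaves rcvbuf untouched.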