netdev - Re: [PATCH net-next 1/2] tcp: do not set a zero size receive buffer

Message-ID: <CANn89i+KCsw+LH1X1yzmgr1wg5Vxm47AbAEOeOnY5gqq4ngH4w@mail.gmail.com>
Date: Mon, 21 Jul 2025 01:04:12 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, Neal Cardwell <ncardwell@...gle.com>, 
	Kuniyuki Iwashima <kuniyu@...gle.com>, "David S. Miller" <davem@...emloft.net>, 
	David Ahern <dsahern@...nel.org>, Jakub Kicinski <kuba@...nel.org>, Simon Horman <horms@...nel.org>, 
	Matthieu Baerts <matttbe@...nel.org>
Subject: Re: [PATCH net-next 1/2] tcp: do not set a zero size receive buffer

On Fri, Jul 18, 2025 at 10:25 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> The nipa CI is reporting frequent failures in the mptcp_connect
> self-tests.
>
> In the failing scenarios (TCP -> MPTCP) the involved sockets are
> actually plain TCP ones, as fallback for the passive socket at 2whs
> time causes the MPTCP listener to actually create a TCP socket.
>
> The transfer is stuck due to the receiver buffer being zero.
> With the stronger check in place, tcp_clamp_window() can be invoked
> while the TCP socket has sk_rmem_alloc == 0, and the receive buffer
> will be zeroed, too.
>
> Also pass the current skb truesize to tcp_clamp_window(), so that
> the helper can compute and use the actual limit enforced by
> the stack.
>
> Fixes: 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks")
> Signed-off-by: Paolo Abeni <pabeni@...hat.com>
> ---
>  net/ipv4/tcp_input.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 672cbfbdcec1..c98de02a3c57 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -610,24 +610,24 @@ static void tcp_init_buffer_space(struct sock *sk)
>  }
>
>  /* 4. Recalculate window clamp after socket hit its memory bounds. */
> -static void tcp_clamp_window(struct sock *sk)
> +static void tcp_clamp_window(struct sock *sk, int truesize)

I am unsure about this one. truesize can be 1MB here; do we want that
in general?

I am unsure why MPTCP ends up on this path.

LINUX_MIB_PRUNECALLED being incremented in normal MPTCP operations
seems strange to me.

>  {
>         struct tcp_sock *tp = tcp_sk(sk);
>         struct inet_connection_sock *icsk = inet_csk(sk);
>         struct net *net = sock_net(sk);
> -       int rmem2;
> +       int rmem2, needed;
>
>         icsk->icsk_ack.quick = 0;
>         rmem2 = READ_ONCE(net->ipv4.sysctl_tcp_rmem[2]);
> +       needed = atomic_read(&sk->sk_rmem_alloc) + truesize;
>
>         if (sk->sk_rcvbuf < rmem2 &&
>             !(sk->sk_userlocks & SOCK_RCVBUF_LOCK) &&
>             !tcp_under_memory_pressure(sk) &&
>             sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0)) {
> -               WRITE_ONCE(sk->sk_rcvbuf,
> -                          min(atomic_read(&sk->sk_rmem_alloc), rmem2));
> +               WRITE_ONCE(sk->sk_rcvbuf, min(needed, rmem2));
>         }
> -       if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
> +       if (needed > sk->sk_rcvbuf)
>                 tp->rcv_ssthresh = min(tp->window_clamp, 2U * tp->advmss);
>  }
>
> @@ -5552,7 +5552,7 @@ static int tcp_prune_queue(struct sock *sk, const struct sk_buff *in_skb)
>         NET_INC_STATS(sock_net(sk), LINUX_MIB_PRUNECALLED);
>
>         if (!tcp_can_ingest(sk, in_skb))
> -               tcp_clamp_window(sk);
> +               tcp_clamp_window(sk, in_skb->truesize);
>         else if (tcp_under_memory_pressure(sk))
>                 tcp_adjust_rcv_ssthresh(sk);
>
> --
> 2.50.0
>
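
For reference, the arithmetic being discussed can be sketched in userspace C. This is only a model under stated assumptions, not kernel code: plain ints stand in for sk->sk_rcvbuf, sk->sk_rmem_alloc and sysctl_tcp_rmem[2], the function names are hypothetical, and the other guards in tcp_clamp_window() (SOCK_RCVBUF_LOCK, memory pressure, protocol memory limits) are assumed to permit the update.

```c
#include <assert.h>

/* Userspace model only. All names here are hypothetical; the ints stand in
 * for sk->sk_rcvbuf, sk->sk_rmem_alloc and net->ipv4.sysctl_tcp_rmem[2].
 * The lock and memory-pressure guards are assumed to allow the update.
 */

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Before the patch: rcvbuf becomes min(sk_rmem_alloc, rmem2). */
int model_clamp_before(int rcvbuf, int rmem_alloc, int rmem2)
{
	assert(rcvbuf >= 0 && rmem_alloc >= 0 && rmem2 >= 0);

	if (rcvbuf < rmem2)
		rcvbuf = min_int(rmem_alloc, rmem2);
	return rcvbuf;
}

/* With the patch: rcvbuf becomes min(sk_rmem_alloc + truesize, rmem2). */
int model_clamp_after(int rcvbuf, int rmem_alloc, int truesize, int rmem2)
{
	int needed;

	assert(rcvbuf >= 0 && rmem_alloc >= 0 && truesize >= 0 && rmem2 >= 0);

	needed = rmem_alloc + truesize;
	if (rcvbuf < rmem2)
		rcvbuf = min_int(needed, rmem2);
	return rcvbuf;
}
```

With sk_rmem_alloc == 0, an illustrative rcvbuf of 131072 and an illustrative rmem2 of 6291456, the first function returns 0, which is the stuck-transfer case from the commit message. The second returns the skb truesize instead, so a 1MB truesize yields a 1MB receive buffer, which is the case questioned above.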