[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAAVpQUA8cN3-+sK4eUPknHHkepyTjb_vDWJA6PPXkz8=+Q7UoA@mail.gmail.com>
Date: Mon, 28 Jul 2025 09:29:58 -0700
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: Fedor Pchelkin <pchelkin@...ras.ru>
Cc: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>, Simon Horman <horms@...nel.org>,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
lvc-project@...uxtesting.org, stable@...r.kernel.org
Subject: Re: [PATCH net] netlink: avoid infinite retry looping in netlink_unicast()
On Mon, Jul 28, 2025 at 1:07 AM Fedor Pchelkin <pchelkin@...ras.ru> wrote:
>
> netlink_attachskb() checks for the socket's read memory allocation
> constraints. Firstly, it has:
>
> rmem < READ_ONCE(sk->sk_rcvbuf)
>
> to check if the just increased rmem value fits into the socket's receive
> buffer. If not, it proceeds and tries to wait for the memory under:
>
> rmem + skb->truesize > READ_ONCE(sk->sk_rcvbuf)
>
> The checks don't cover the case when skb->truesize + sk->sk_rmem_alloc is
> equal to sk->sk_rcvbuf. Thus the function neither successfully accepts
> these conditions, nor manages to reschedule the task - and is called in
> retry loop for indefinite time which is caught as:
>
> rcu: INFO: rcu_sched self-detected stall on CPU
> rcu: 0-....: (25999 ticks this GP) idle=ef2/1/0x4000000000000000 softirq=262269/262269 fqs=6212
> (t=26000 jiffies g=230833 q=259957)
> NMI backtrace for cpu 0
> CPU: 0 PID: 22 Comm: kauditd Not tainted 5.10.240 #68
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc42 04/01/2014
> Call Trace:
> <IRQ>
> dump_stack lib/dump_stack.c:120
> nmi_cpu_backtrace.cold lib/nmi_backtrace.c:105
> nmi_trigger_cpumask_backtrace lib/nmi_backtrace.c:62
> rcu_dump_cpu_stacks kernel/rcu/tree_stall.h:335
> rcu_sched_clock_irq.cold kernel/rcu/tree.c:2590
> update_process_times kernel/time/timer.c:1953
> tick_sched_handle kernel/time/tick-sched.c:227
> tick_sched_timer kernel/time/tick-sched.c:1399
> __hrtimer_run_queues kernel/time/hrtimer.c:1652
> hrtimer_interrupt kernel/time/hrtimer.c:1717
> __sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1113
> asm_call_irq_on_stack arch/x86/entry/entry_64.S:808
> </IRQ>
>
> netlink_attachskb net/netlink/af_netlink.c:1234
> netlink_unicast net/netlink/af_netlink.c:1349
> kauditd_send_queue kernel/audit.c:776
> kauditd_thread kernel/audit.c:897
> kthread kernel/kthread.c:328
> ret_from_fork arch/x86/entry/entry_64.S:304
>
> Restore the original behavior of the check which commit in Fixes
> accidentally missed when restructuring the code.
>
> Found by Linux Verification Center (linuxtesting.org).
>
> Fixes: ae8f160e7eb2 ("netlink: Fix wraparounds of sk->sk_rmem_alloc.")
> Cc: stable@...r.kernel.org
> Signed-off-by: Fedor Pchelkin <pchelkin@...ras.ru>
> ---
>
> Similar rmem and sk->sk_rcvbuf comparing pattern in
> netlink_broadcast_deliver() accepts these values being equal, while
> the netlink_dump() case does not - but it goes all the way down to
> f9c2288837ba ("netlink: implement memory mapped recvmsg()") and looks
> like an irrelevant issue without any real consequences. Though might be
> cleaned up if needed.
This should be fixed in a separate patch. It's weird that one path
accepts a value == SO_RCVBUF but the other path doesn't.
Reviewed-by: Kuniyuki Iwashima <kuniyu@...gle.com>
Thanks!
>
> Updated sk->sk_rmem_alloc vs sk->sk_rcvbuf checks throughout the kernel
> diverse in treating the corner case of them being equal.
>
> net/netlink/af_netlink.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 6332a0e06596..0fc3f045fb65 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -1218,7 +1218,7 @@ int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
> nlk = nlk_sk(sk);
> rmem = atomic_add_return(skb->truesize, &sk->sk_rmem_alloc);
>
> - if ((rmem == skb->truesize || rmem < READ_ONCE(sk->sk_rcvbuf)) &&
> + if ((rmem == skb->truesize || rmem <= READ_ONCE(sk->sk_rcvbuf)) &&
> !test_bit(NETLINK_S_CONGESTED, &nlk->state)) {
> netlink_skb_set_owner_r(skb, sk);
> return 0;
> --
> 2.50.1
>
Powered by blists - more mailing lists