[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZnPrkcWkAxQrODju@calendula>
Date: Thu, 20 Jun 2024 10:42:57 +0200
From: Pablo Neira Ayuso <pablo@...filter.org>
To: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
Cc: kadlec@...filter.org, nnamrec@...il.com,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
Linux regressions mailing list <regressions@...ts.linux.dev>,
netdev@...r.kernel.org
Subject: Re: 6.10/bisected/regression - commits 4e7aaa6b82d6 cause appearing
WARNING at net/netfilter/ipset/ip_set_core.c:1200 suspicious
rcu_dereference_protected() usage
Hi,
A fix has been included in this batch:
https://lore.kernel.org/netfilter-devel/20240619170537.2846-1-pablo@netfilter.org/T/#m39095851d6fc9d1a04d9de66f5580f61c9593389
On Thu, Jun 20, 2024 at 11:55:05AM +0500, Mikhail Gavrilov wrote:
> Hi,
> between 2ef5971ff345 and rc4 I spotted a new regression.
> It is expressed in the appearance warning with stacktrace after one
> minute after boot.
>
> =============================
> WARNING: suspicious RCU usage
> 6.10.0-0.rc4.20240618git14d7c92f8df9.40.fc41.x86_64+debug #1 Tainted:
> G W L ------- ---
> -----------------------------
> net/netfilter/ipset/ip_set_core.c:1200 suspicious
> rcu_dereference_protected() usage!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 2, debug_locks = 1
> 3 locks held by kworker/u128:1/264:
> #0: ffff88810813c958 ((wq_completion)netns){+.+.}-{0:0}, at:
> process_one_work+0xeab/0x1460
> #1: ffffc90001477da0 (net_cleanup_work){+.+.}-{0:0}, at:
> process_one_work+0x82b/0x1460
> #2: ffffffff97d9ae98 (pernet_ops_rwsem){++++}-{3:3}, at:
> cleanup_net+0xb9/0xa90
>
> stack backtrace:
> CPU: 30 PID: 264 Comm: kworker/u128:1 Tainted: G W L
> ------- --- 6.10.0-0.rc4.20240618git14d7c92f8df9.40.fc41.x86_64+debug
> #1
> Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING
> WIFI, BIOS 2611 04/07/2024
> Workqueue: netns cleanup_net
> Call Trace:
> <TASK>
> dump_stack_lvl+0x84/0xd0
> lockdep_rcu_suspicious.cold+0xa1/0x134
> _destroy_all_sets+0x1c7/0x560 [ip_set]
> ip_set_net_exit+0x20/0x50 [ip_set]
> ops_exit_list+0x99/0x170
> cleanup_net+0x4d9/0xa90
> ? __pfx_cleanup_net+0x10/0x10
> process_one_work+0x8a4/0x1460
> ? worker_thread+0xe3/0x1010
> ? __pfx_process_one_work+0x10/0x10
> ? assign_work+0x16c/0x240
> worker_thread+0x5e6/0x1010
> ? __kthread_parkme+0xb1/0x1d0
> ? __pfx_worker_thread+0x10/0x10
> ? __pfx_worker_thread+0x10/0x10
> kthread+0x2d2/0x3a0
> ? _raw_spin_unlock_irq+0x28/0x60
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x31/0x70
> ? __pfx_kthread+0x10/0x10
> ret_from_fork_asm+0x1a/0x30
> </TASK>
>
> =============================
> WARNING: suspicious RCU usage
> 6.10.0-0.rc4.20240618git14d7c92f8df9.40.fc41.x86_64+debug #1 Tainted:
> G W L ------- ---
> -----------------------------
> net/netfilter/ipset/ip_set_core.c:1211 suspicious
> rcu_dereference_protected() usage!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 2, debug_locks = 1
> 3 locks held by kworker/u128:1/264:
> #0: ffff88810813c958 ((wq_completion)netns){+.+.}-{0:0}, at:
> process_one_work+0xeab/0x1460
> #1: ffffc90001477da0 (net_cleanup_work){+.+.}-{0:0}, at:
> process_one_work+0x82b/0x1460
> #2: ffffffff97d9ae98 (pernet_ops_rwsem){++++}-{3:3}, at:
> cleanup_net+0xb9/0xa90
>
> stack backtrace:
> CPU: 30 PID: 264 Comm: kworker/u128:1 Tainted: G W L
> ------- --- 6.10.0-0.rc4.20240618git14d7c92f8df9.40.fc41.x86_64+debug
> #1
> Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING
> WIFI, BIOS 2611 04/07/2024
> Workqueue: netns cleanup_net
> Call Trace:
> <TASK>
> dump_stack_lvl+0x84/0xd0
> lockdep_rcu_suspicious.cold+0xa1/0x134
> _destroy_all_sets+0x3a8/0x560 [ip_set]
> ip_set_net_exit+0x20/0x50 [ip_set]
> ops_exit_list+0x99/0x170
> cleanup_net+0x4d9/0xa90
> ? __pfx_cleanup_net+0x10/0x10
> process_one_work+0x8a4/0x1460
> ? worker_thread+0xe3/0x1010
> ? __pfx_process_one_work+0x10/0x10
> ? assign_work+0x16c/0x240
> worker_thread+0x5e6/0x1010
> ? __kthread_parkme+0xb1/0x1d0
> ? __pfx_worker_thread+0x10/0x10
> ? __pfx_worker_thread+0x10/0x10
> kthread+0x2d2/0x3a0
> ? _raw_spin_unlock_irq+0x28/0x60
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x31/0x70
> ? __pfx_kthread+0x10/0x10
> ret_from_fork_asm+0x1a/0x30
> </TASK>
> workqueue: gc_worker [nf_conntrack] hogged CPU for >10000us 7 times,
> consider switching to WQ_UNBOUND
> workqueue: gc_worker [nf_conntrack] hogged CPU for >10000us 11 times,
> consider switching to WQ_UNBOUND
>
> Bisect blame this commit:
> commit 4e7aaa6b82d63e8ddcbfb56b4fd3d014ca586f10
> Author: Jozsef Kadlecsik <kadlec@...filter.org>
> Date: Tue Jun 4 15:58:03 2024 +0200
>
> netfilter: ipset: Fix race between namespace cleanup and gc in the
> list:set type
>
> Lion Ackermann reported that there is a race condition between
> namespace cleanup
> in ipset and the garbage collection of the list:set type. The namespace
> cleanup can destroy the list:set type of sets while the gc of the
> set type is
> waiting to run in rcu cleanup. The latter uses data from the
> destroyed set which
> thus leads use after free. The patch contains the following parts:
>
> - When destroying all sets, first remove the garbage collectors, then wait
> if needed and then destroy the sets.
> - Fix the badly ordered "wait then remove gc" for the destroy a single set
> case.
> - Fix the missing rcu locking in the list:set type in the userspace test
> case.
> - Use proper RCU list handlings in the list:set type.
>
> The patch depends on c1193d9bbbd3 (netfilter: ipset: Add list
> flush to cancel_gc).
>
> Fixes: 97f7cf1cd80e (netfilter: ipset: fix performance regression
> in swap operation)
> Reported-by: Lion Ackermann <nnamrec@...il.com>
> Tested-by: Lion Ackermann <nnamrec@...il.com>
> Signed-off-by: Jozsef Kadlecsik <kadlec@...filter.org>
> Signed-off-by: Pablo Neira Ayuso <pablo@...filter.org>
>
> net/netfilter/ipset/ip_set_core.c | 81
> ++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------
> net/netfilter/ipset/ip_set_list_set.c | 30 ++++++++++++++----------------
> 2 files changed, 60 insertions(+), 51 deletions(-)
>
> And I can confirm after reverting 4e7aaa6b82d6 the issue is gone.
>
> I also attach the build config and full kernel log.
>
> My hardware specs: https://linux-hardware.org/?probe=80512f0c04
>
> Jozsef can you look into this please?
>
> --
> Best Regards,
> Mike Gavrilov.
Powered by blists - more mailing lists