linux-kernel - Re: [PATCH v4 09/10] workqueue: Implement system-wide nr_active enforcement for unbound workqueues

Message-ID: <20240131041205.GA3517117@dev-arch.thelio-3990X>
Date: Tue, 30 Jan 2024 21:12:05 -0700
From: Nathan Chancellor <nathan@...nel.org>
To: Tejun Heo <tj@...nel.org>
Cc: Marek Szyprowski <m.szyprowski@...sung.com>,
	Lai Jiangshan <jiangshanlai@...il.com>,
	linux-kernel@...r.kernel.org, Naohiro.Aota@....com,
	kernel-team@...a.com
Subject: Re: [PATCH v4 09/10] workqueue: Implement system-wide nr_active
 enforcement for unbound workqueues

Hi Tejun,

On Tue, Jan 30, 2024 at 06:02:52PM -1000, Tejun Heo wrote:
> Hello,
> 
> Thanks for the report. Can you please test whether the following patch fixes
> the problem?

I just tested this change on top of 5797b1c18919 but it does not appear
to resolve the issue for any of the three configurations that I tested.

Cheers,
Nathan

> ----- 8< -----
> From: Tejun Heo <tj@...nel.org>
> Subject: workqueue: Fix crash due to premature NUMA topology access on some archs
> 
> System workqueues are allocated early during boot from
> workqueue_init_early(). While allocating unbound workqueues,
> wq_update_node_max_active() is invoked from apply_workqueue_attrs() and
> accesses NUMA topology information - cpumask_of_node() and cpu_to_node().
> 
> At this point, topology information is not initialized yet and on arm and
> some other archs, it leads to an oops like the following:
> 
>   Unable to handle kernel paging request at virtual address ffff0002100296e0
>   Mem abort info:
>      ESR = 0x0000000096000005
>      EC = 0x25: DABT (current EL), IL = 32 bits
>      SET = 0, FnV = 0
>      EA = 0, S1PTW = 0
>      FSC = 0x05: level 1 translation fault
>   Data abort info:
>      ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
>      CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>      GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>   swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000000255a000
>   [ffff0002100296e0] pgd=18000001ffff7003, p4d=18000001ffff7003, pud=0000000000000000
>   Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
>   Modules linked in:
>   CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc2-next-20240130+ #14392
>   Hardware name: Hardkernel ODROID-M1 (DT)
>   pstate: 600000c9 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>   pc : wq_update_node_max_active+0x50/0x1fc
>   lr : wq_update_node_max_active+0x1f0/0x1fc
>   ...
>   Call trace:
>     wq_update_node_max_active+0x50/0x1fc
>     apply_wqattrs_commit+0xf0/0x114
>     apply_workqueue_attrs_locked+0x58/0xa0
>     alloc_workqueue+0x5ac/0x774
>     workqueue_init_early+0x460/0x540
>     start_kernel+0x258/0x684
>     __primary_switched+0xb8/0xc0
>   Code: 9100a273 35000d01 53067f00 d0016dc1 (f8607a60)
>   ---[ end trace 0000000000000000 ]---
>   Kernel panic - not syncing: Attempted to kill the idle task!
>   ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
> 
> Fix it by initializing wq->node_nr_active[].max to WQ_DFL_MIN_ACTIVE on
> allocation and making wq_update_node_max_active() a no-op until
> workqueue_init_topology(). Note that workqueue_init_topology() invokes
> wq_update_node_max_active() on all unbound workqueues, so the end result is
> still the same.
> 
> Signed-off-by: Tejun Heo <tj@...nel.org>
> Reported-by: Marek Szyprowski <m.szyprowski@...sung.com>
> Reported-by: Nathan Chancellor <nathan@...nel.org>
> Link: http://lkml.kernel.org/r/91eacde0-df99-4d5c-a980-91046f66e612@samsung.com
> Fixes: 5797b1c18919 ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues")
> ---
>  kernel/workqueue.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 9221a4c57ae1..a65081ec6780 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -386,6 +386,8 @@ static const char *wq_affn_names[WQ_AFFN_NR_TYPES] = {
>  	[WQ_AFFN_SYSTEM]		= "system",
>  };
>  
> +static bool wq_topo_initialized = false;
> +
>  /*
>   * Per-cpu work items which run for longer than the following threshold are
>   * automatically considered CPU intensive and excluded from concurrency
> @@ -1510,6 +1512,9 @@ static void wq_update_node_max_active(struct workqueue_struct *wq, int off_cpu)
>  
>  	lockdep_assert_held(&wq->mutex);
>  
> +	if (!wq_topo_initialized)
> +		return;
> +
>  	if (!cpumask_test_cpu(off_cpu, effective))
>  		off_cpu = -1;
>  
> @@ -4356,6 +4361,7 @@ static void free_node_nr_active(struct wq_node_nr_active **nna_ar)
>  
>  static void init_node_nr_active(struct wq_node_nr_active *nna)
>  {
> +	nna->max = WQ_DFL_MIN_ACTIVE;
>  	atomic_set(&nna->nr, 0);
>  	raw_spin_lock_init(&nna->lock);
>  	INIT_LIST_HEAD(&nna->pending_pwqs);
> @@ -7400,6 +7406,8 @@ void __init workqueue_init_topology(void)
>  	init_pod_type(&wq_pod_types[WQ_AFFN_CACHE], cpus_share_cache);
>  	init_pod_type(&wq_pod_types[WQ_AFFN_NUMA], cpus_share_numa);
>  
> +	wq_topo_initialized = true;
> +
>  	mutex_lock(&wq_pool_mutex);
>  
>  	/*