linux-kernel - Re: [PATCH 6/7] workqueue: Implement system-wide max_active enforcement for unbound workqueues

Message-ID: <ZYyt821TugsgVx76@mtj.duckdns.org>
Date: Thu, 28 Dec 2023 08:06:27 +0900
From: Tejun Heo <tj@...nel.org>
To: Lai Jiangshan <jiangshanlai@...il.com>
Cc: linux-kernel@...r.kernel.org, Naohiro.Aota@....com,
	Lai Jiangshan <jiangshan.ljs@...group.com>,
	Dennis Dalessandro <dennis.dalessandro@...nelisnetworks.com>
Subject: Re: [PATCH 6/7] workqueue: Implement system-wide max_active
 enforcement for unbound workqueues

Hello, Lai.

On Wed, Dec 27, 2023 at 10:51:42PM +0800, Lai Jiangshan wrote:
>  static int pwq_calculate_max_active(struct pool_workqueue *pwq)
>  {
> +	int pwq_nr_online_cpus;
> +	int max_active;
> +
>  	/*
>  	 * During [un]freezing, the caller is responsible for ensuring
>  	 * that pwq_adjust_max_active() is called at least once after
> @@ -4152,7 +4158,18 @@ static int pwq_calculate_max_active(struct pool_workqueue *pwq)
>  	if ((pwq->wq->flags & WQ_FREEZABLE) && workqueue_freezing)
>  		return 0;
>  
> -	return pwq->wq->saved_max_active;
> +	if (!(pwq->wq->flags & WQ_UNBOUND))
> +		return pwq->wq->saved_max_active;
> +
> +	pwq_nr_online_cpus = cpumask_weight_and(pwq->pool->attrs->__pod_cpumask, cpu_online_mask);
> +	max_active = DIV_ROUND_UP(pwq->wq->saved_max_active * pwq_nr_online_cpus, num_online_cpus());

So, the problem with this approach is that we can end up segmenting
max_active into too many pieces that are each too small. Imagine a system
with an AMD EPYC 9754 - 256 threads spread across 16 L3 caches. Let's say
there's a workqueue used for IO (e.g. encryption) with the default CACHE
affinity_scope and max_active of 2 * nr_cpus, which isn't uncommon for this
type of workqueue.

The above code would limit each L3 domain to 32 concurrent work items. Let's
say a thread which is pinned to a CPU is issuing a lot of concurrent writes
with the expectation of being able to saturate all the CPUs. It won't be
able to even get close. The expected behavior is saturating all 256 CPUs on
the system. The resulting behavior would be saturating an eighth of them.
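
To make the arithmetic concrete, here is a rough userspace sketch of the
unbound branch of the quoted hunk. It is not kernel code: pod_max_active()
and its parameter names are made up for illustration, and DIV_ROUND_UP() is
redefined locally with the same rounding as the kernel macro.

```c
#include <assert.h>

/* Same rounding behavior as the kernel's DIV_ROUND_UP() macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical stand-in for the unbound branch of the quoted
 * pwq_calculate_max_active(): scale the workqueue-wide limit by the
 * pod's share of the online CPUs.
 */
int pod_max_active(int saved_max_active, int pod_online_cpus,
		   int nr_online_cpus)
{
	assert(nr_online_cpus > 0);
	return DIV_ROUND_UP(saved_max_active * pod_online_cpus,
			    nr_online_cpus);
}
```

With the numbers above, pod_max_active(2 * 256, 16, 256) gives 32: each of
the 16 L3 pods has 16 online CPUs and gets 1/16 of the 512 total, so a
single pinned issuer can keep at most 32 of the 256 CPUs busy.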

The crux of the problem is that the desired worker pool domain and
max_active enforcement domain don't match. We want to be fine grained with
the former but pretty close to the whole system for the latter.

Thanks.

-- 
tejun