Message-ID: <ZYyt821TugsgVx76@mtj.duckdns.org>
Date: Thu, 28 Dec 2023 08:06:27 +0900
From: Tejun Heo <tj@...nel.org>
To: Lai Jiangshan <jiangshanlai@...il.com>
Cc: linux-kernel@...r.kernel.org, Naohiro.Aota@....com, Lai Jiangshan <jiangshan.ljs@...group.com>, Dennis Dalessandro <dennis.dalessandro@...nelisnetworks.com>
Subject: Re: [PATCH 6/7] workqueue: Implement system-wide max_active enforcement for unbound workqueues

Hello, Lai.

On Wed, Dec 27, 2023 at 10:51:42PM +0800, Lai Jiangshan wrote:
>  static int pwq_calculate_max_active(struct pool_workqueue *pwq)
>  {
> +	int pwq_nr_online_cpus;
> +	int max_active;
> +
>  	/*
>  	 * During [un]freezing, the caller is responsible for ensuring
>  	 * that pwq_adjust_max_active() is called at least once after
> @@ -4152,7 +4158,18 @@ static int pwq_calculate_max_active(struct pool_workqueue *pwq)
>  	if ((pwq->wq->flags & WQ_FREEZABLE) && workqueue_freezing)
>  		return 0;
>
> -	return pwq->wq->saved_max_active;
> +	if (!(pwq->wq->flags & WQ_UNBOUND))
> +		return pwq->wq->saved_max_active;
> +
> +	pwq_nr_online_cpus = cpumask_weight_and(pwq->pool->attrs->__pod_cpumask, cpu_online_mask);
> +	max_active = DIV_ROUND_UP(pwq->wq->saved_max_active * pwq_nr_online_cpus, num_online_cpus());

So, the problem with this approach is that we can end up segmenting max_active into too many pieces that are each too small. Imagine a system with an AMD EPYC 9754: 256 threads spread across 16 L3 caches. Let's say there's a workqueue used for IO (e.g. encryption) with the default CACHE affinity_scope and a max_active of 2 * nr_cpus, which isn't uncommon for this type of workqueue. The above code would limit each L3 domain, which has 16 CPUs, to 32 concurrent work items (512 * 16 / 256).

Let's say a thread that is pinned to a CPU is issuing a lot of concurrent writes with the expectation of being able to saturate all the CPUs. It won't even get close, because all of its work items are queued to the pwq of its own L3 domain. The expected behavior is saturating all 256 CPUs on the system. The resulting behavior would be saturating an eighth of them.
The crux of the problem is that the desired worker pool domain and the max_active enforcement domain don't match. We want the former to be fine-grained but the latter to be pretty close to the whole system.

Thanks.

--
tejun