linux-kernel - Re: [PATCH 09/24] workqueue: Make unbound workqueues to use per-cpu pool

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <3b8222aa-ac58-6a69-ac36-dfdc66f4d66e@cornelisnetworks.com>
Date:   Mon, 22 May 2023 08:27:52 -0400
From:   Dennis Dalessandro <dennis.dalessandro@...nelisnetworks.com>
To:     Leon Romanovsky <leon@...nel.org>, Tejun Heo <tj@...nel.org>
Cc:     jiangshanlai@...il.com, torvalds@...ux-foundation.org,
        peterz@...radead.org, linux-kernel@...r.kernel.org,
        kernel-team@...a.com, joshdon@...gle.com, brho@...gle.com,
        briannorris@...omium.org, nhuck@...gle.com, agk@...hat.com,
        snitzer@...nel.org, void@...ifault.com,
        Jason Gunthorpe <jgg@...pe.ca>,
        Karsten Graul <kgraul@...ux.ibm.com>,
        Wenjia Zhang <wenjia@...ux.ibm.com>,
        Jan Karcher <jaka@...ux.ibm.com>
Subject: Re: [PATCH 09/24] workqueue: Make unbound workqueues to use per-cpu
 pool_workqueues

On 5/22/23 2:41 AM, Leon Romanovsky wrote:
> On Thu, May 18, 2023 at 02:16:54PM -1000, Tejun Heo wrote:
>> A pwq (pool_workqueue) represents an association between a workqueue and a
>> worker_pool. When a work item is queued, the workqueue selects the pwq to
>> use, which in turn determines the pool, and queues the work item to the pool
>> through the pwq. pwq is also what implements the maximum concurrency limit -
>> @max_active.
>>
>> As a per-cpu workqueue should be assocaited with a different worker_pool on
>> each CPU, it always had per-cpu pwq's that are accessed through wq->cpu_pwq.
>> However, unbound workqueues were sharing a pwq within each NUMA node by
>> default. The sharing has several downsides:
>>
>> * Because @max_active is per-pwq, the meaning of @max_active changes
>>   depending on the machine configuration and whether workqueue NUMA locality
>>   support is enabled.
>>
>> * Makes per-cpu and unbound code deviate.
>>
>> * Gets in the way of making workqueue CPU locality awareness more flexible.
>>
>> This patch makes unbound workqueues use per-cpu pwq's the same way per-cpu
>> workqueues do by making the following changes:
>>
>> * wq->numa_pwq_tbl[] is removed and unbound workqueues now use wq->cpu_pwq
>>   just like per-cpu workqueues. wq->cpu_pwq is now RCU protected for unbound
>>   workqueues.
>>
>> * numa_pwq_tbl_install() is renamed to install_unbound_pwq() and installs
>>   the specified pwq to the target CPU's wq->cpu_pwq.
>>
>> * apply_wqattrs_prepare() now always allocates a separate pwq for each CPU
>>   unless the workqueue is ordered. If ordered, all CPUs use wq->dfl_pwq.
>>   This makes the return value of wq_calc_node_cpumask() unnecessary. It now
>>   returns void.
>>
>> * @max_active now means the same thing for both per-cpu and unbound
>>   workqueues. WQ_UNBOUND_MAX_ACTIVE now equals WQ_MAX_ACTIVE and
>>   documentation is updated accordingly. WQ_UNBOUND_MAX_ACTIVE is no longer
>>   used in workqueue implementation and will be removed later.
>>
>> * All unbound pwq operations which used to be per-numa-node are now per-cpu.
>>
>> For most unbound workqueue users, this shouldn't cause noticeable changes.
>> Work item issue and completion will be a small bit faster, flush_workqueue()
>> would become a bit more expensive, and the total concurrency limit would
>> likely become higher. All @max_active==1 use cases are currently being
>> audited for conversion into alloc_ordered_workqueue() and they shouldn't be
>> affected once the audit and conversion is complete.
>>
>> One area where the behavior change may be more noticeable is
>> workqueue_congested() as the reported congestion state is now per CPU
>> instead of NUMA node. There are only two users of this interface -
>> drivers/infiniband/hw/hfi1 and net/smc. Maintainers of both subsystems are
>> cc'd. Inputs on the behavior change would be very much appreciated.
> 
> At least for hfi1, it seems like your changes won't cause to any
> differences as NUMA node is expected to be connected to closest CPU
> anyway in setups relevant to hfi1.
> 
> Dennis, am I right?
> 
> Thanks

I can see there being an impact as to when things are considered congested since
it's now CPU based vs NUMA. However, this seems like it's a good thing for hfi1.
The purpose of the code in hfi1 is to decide if QP processing should yield the
CPU and allow other QPs to make progress.

Acked-by: Dennis Dalessandro <dennis.dalessandro@...nelisnetworks.com>