Message-ID: <ec3c6315-cbe2-44bd-a84f-f8f140c1d390@intel.com>
Date: Mon, 17 Nov 2025 12:47:24 +0200
From: Adrian Hunter <adrian.hunter@...el.com>
To: Ulf Hansson <ulf.hansson@...aro.org>
CC: Marco Crivellari <marco.crivellari@...e.com>,
<linux-kernel@...r.kernel.org>, <linux-mmc@...r.kernel.org>, Tejun Heo
<tj@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>, Frederic Weisbecker
<frederic@...nel.org>, Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Michal Hocko <mhocko@...e.com>
Subject: Re: [PATCH] mmc: core: add WQ_PERCPU to alloc_workqueue users
On 12/11/2025 13:45, Ulf Hansson wrote:
> On Wed, 12 Nov 2025 at 07:49, Adrian Hunter <adrian.hunter@...el.com> wrote:
>>
>> On 11/11/2025 19:12, Ulf Hansson wrote:
>>> + Adrian
>>>
>>> On Fri, 7 Nov 2025 at 15:17, Marco Crivellari <marco.crivellari@...e.com> wrote:
>>>>
>>>> Currently, if a user enqueues a work item using schedule_delayed_work(), the
>>>> wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
>>>> WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
>>>> schedule_work(), which uses system_wq, and queue_work(), which again makes
>>>> use of WORK_CPU_UNBOUND.
>>>> This lack of consistency cannot be addressed without refactoring the API.
>>>>
>>>> alloc_workqueue() treats all queues as per-CPU by default, while unbound
>>>> workqueues must opt-in via WQ_UNBOUND.
>>>>
>>>> This default is suboptimal: most workloads benefit from unbound queues,
>>>> allowing the scheduler to place worker threads where they’re needed and
>>>> reducing noise when CPUs are isolated.
>>>>
>>>> This continues the effort to refactor workqueue APIs, which began with
>>>> the introduction of new workqueues and a new alloc_workqueue flag in:
>>>>
>>>> commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
>>>> commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
>>>>
>>>> This change adds a new WQ_PERCPU flag to explicitly request
>>>> alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
>>>>
>>>> With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
>>>> any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
>>>> must now use WQ_PERCPU.
>>>>
>>>> Once migration is complete, WQ_UNBOUND can be removed and unbound will
>>>> become the implicit default.
>>>>
>>>> Suggested-by: Tejun Heo <tj@...nel.org>
>>>> Signed-off-by: Marco Crivellari <marco.crivellari@...e.com>
>>>> ---
>>>> drivers/mmc/core/block.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
>>>> index c0ffe0817fd4..6a651ddccf28 100644
>>>> --- a/drivers/mmc/core/block.c
>>>> +++ b/drivers/mmc/core/block.c
>>>> @@ -3275,7 +3275,8 @@ static int mmc_blk_probe(struct mmc_card *card)
>>>> mmc_fixup_device(card, mmc_blk_fixups);
>>>>
>>>> card->complete_wq = alloc_workqueue("mmc_complete",
>>>> - WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
>>>> + WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_PERCPU,
>>>> + 0);
>>>
>>> I guess we prefer to keep the existing behaviour to avoid breaking
>>> anything, before continuing with the refactoring. Although I think it
>>> should be fine to use WQ_UNBOUND here.
>>>
>>> Looping in Adrian to get his opinion around this.
>>
>> Typically the work is being queued from the CPU that received the
>> interrupt. I'd assume running the work on that CPU, as we do now,
>> has some merit.
>>
>
> Thanks, I get your point!
>
> So, to me it seems like if we want to explore other options, it would
> require us to do more analysis to avoid introducing performance
> regressions.
>
> BTW, do we know how other block device drivers are dealing with this?
AFAIK, they call blk_mq_complete_request() from the interrupt handler.
mmc_block does that in the case of CQE or HSQ.