Message-ID: <6f0e6f03-4b9d-db84-6465-e810ea849731@amd.com>
Date: Wed, 24 Aug 2022 16:36:08 -0400
From: Felix Kuehling <felix.kuehling@....com>
To: Hillf Danton <hdanton@...a.com>
Cc: Tejun Heo <tj@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
Dave Airlie <airlied@...il.com>
Subject: Re: Selecting CPUs for queuing work on
On 2022-08-24 07:33, Hillf Danton wrote:
> On Fri, 12 Aug 2022 16:54:04 -0400 Felix Kuehling wrote:
>> On 2022-08-12 16:30, Tejun Heo wrote:
>>> On Fri, Aug 12, 2022 at 04:26:47PM -0400, Felix Kuehling wrote:
>>>> Hi workqueue maintainers,
>>>>
>>>> In the KFD (amdgpu) driver we found a need to schedule bottom-half interrupt
>>>> handlers on CPU cores different from the one where the top-half interrupt
>>>> handler runs, to avoid the interrupt handler stalling the bottom half in
>>>> extreme scenarios. See my latest patch, which tries to use a different
>>>> hyperthread on the same CPU core, or falls back to a different core in the
>>>> same NUMA node if that fails:
>>>> https://lore.kernel.org/all/20220811190433.1213179-1-Felix.Kuehling@amd.com/
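
The heuristic in that patch boils down to roughly the following sketch
(illustrative only, not the patch itself; the function and variable
names are made up):

/*
 * Prefer an online sibling hyperthread of the CPU running the top
 * half, fall back to another online CPU on the same NUMA node, else
 * let the workqueue decide. A real implementation would iterate
 * rather than give up on the first offline candidate.
 */
#include <linux/cpumask.h>
#include <linux/smp.h>
#include <linux/topology.h>
#include <linux/workqueue.h>

static void queue_bh_work(struct workqueue_struct *wq,
                          struct work_struct *work)
{
        unsigned int cpu = raw_smp_processor_id(); /* top-half CPU */
        unsigned int new_cpu;

        /* Another hyperthread on the same physical core, if any. */
        new_cpu = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
        if (new_cpu < nr_cpu_ids && cpu_online(new_cpu)) {
                queue_work_on(new_cpu, wq, work);
                return;
        }

        /* Otherwise a different CPU on the same NUMA node. */
        new_cpu = cpumask_any_but(cpumask_of_node(cpu_to_node(cpu)), cpu);
        if (new_cpu < nr_cpu_ids && cpu_online(new_cpu)) {
                queue_work_on(new_cpu, wq, work);
                return;
        }

        /* Last resort: default placement. */
        queue_work(wq, work);
}
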
>>>>
>>>> Dave pointed out that the driver may not be the best place to implement such
>>>> logic and suggested that we should have an abstraction, maybe in the
>>>> workqueue code. Do you feel this is something that could or should be
>>>> provided by the core workqueue code? Or maybe some other place?
>>> I'm not necessarily against it. I guess it can be a flag on an unbound wq.
>>> Do the interrupts move across different CPUs tho? ie. why does this need to
>>> be a dynamic decision?
>> In principle, I think IRQ routing to CPUs can change dynamically with
>> irqbalance.
>>
>> If this were a flag, would there be a way to ensure that all work
>> queued to the same workqueue from the same CPU, or maybe all work
>> associated with a work_struct, always goes to the same CPU? One of the
>> reasons for my
>> latest patch was to get more predictable scheduling of the work to cores
>> that are specifically reserved for interrupt handling by the system
>> admin. This minimizes CPU scheduling noise that can compound to cause
>> real performance issues in large scale distributed applications.
>>
>> What we need is kind of the opposite of WQ_UNBOUND. As I understand it,
>> WQ_UNBOUND can schedule anywhere to maximize concurrency. What we need
>> is to schedule to very specific, predictable CPUs. We only have one work
>> item per GPU that processes all the interrupts in order, so we don't
>> need the concurrency of WQ_UNBOUND.
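
For reference, per-work-item CPU selection already exists in the form
of queue_work_on(), and on a bound (per-CPU) workqueue the work then
runs on exactly the given CPU. A minimal sketch, with made-up names:

/*
 * Run the per-GPU bottom half on an explicitly chosen CPU.
 * queue_work_on() on a bound (no WQ_UNBOUND) workqueue executes the
 * work on exactly that CPU as long as it stays online. "ih_wq" and
 * "ih_cpu" are illustrative names, not from the actual driver.
 */
#include <linux/errno.h>
#include <linux/workqueue.h>

static struct workqueue_struct *ih_wq;

static int ih_init(void)
{
        /* Bound workqueue, so the CPU given at queueing time sticks. */
        ih_wq = alloc_workqueue("ih_bottom_half", 0, 0);
        return ih_wq ? 0 : -ENOMEM;
}

static void ih_schedule(struct work_struct *work, int ih_cpu)
{
        /*
         * A single work_struct is not re-entrant, so one item per GPU
         * already gives in-order processing without WQ_UNBOUND's
         * extra concurrency.
         */
        queue_work_on(ih_cpu, ih_wq, work);
}
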
> Given that IRQs are dynamically routed to CPUs, are there any test
> results showing that an unbound WQ is a bad option?
If we're using an unbound WQ, we'd need some control over which CPUs
execute the bottom half. The customer wants to minimize noise, so they
want all the interrupt processing on dedicated CPU cores that are not
used for application threads. I read a little more about interrupt
scheduling and saw that there is a CPU mask for housekeeping tasks. I
haven't found where that is configured yet, but maybe an unbound WQ
restricted to the housekeeping_cpumask would do the trick.
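
Assuming that mask is the right one (it appears to be derived from the
isolcpus=/nohz_full= boot parameters), here is a minimal sketch of what
I have in mind; the queue name and init function are made up:

/*
 * Sketch, assuming the housekeeping mask is what confines unbound
 * workers: on recent kernels wq_unbound_cpumask is initialized from
 * housekeeping_cpumask(HK_TYPE_WQ) (older kernels spell it
 * HK_FLAG_WQ), so a plain WQ_UNBOUND queue should already avoid
 * isolated CPUs. WQ_SYSFS additionally exposes
 * /sys/devices/virtual/workqueue/<name>/cpumask so an admin can
 * retune the placement at runtime.
 */
#include <linux/errno.h>
#include <linux/printk.h>
#include <linux/sched/isolation.h>
#include <linux/workqueue.h>

static struct workqueue_struct *ih_unbound_wq;

static int ih_unbound_init(void)
{
        const struct cpumask *hk = housekeeping_cpumask(HK_TYPE_WQ);

        pr_info("housekeeping CPUs for workqueues: %*pbl\n",
                cpumask_pr_args(hk));

        /* max_active = 1 keeps the per-GPU processing in order. */
        ih_unbound_wq = alloc_workqueue("ih_unbound",
                                        WQ_UNBOUND | WQ_SYSFS, 1);
        return ih_unbound_wq ? 0 : -ENOMEM;
}
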
The problem is that it's very hard to get test results. It takes very
large application runs to see the impact of scheduling bottom halves on
different cores, and the customer would have to reboot their cluster
with 1000s of nodes to test a kernel change. For now they may have found
another cause for the noise, and addressing that may be good enough. If
our current solution for scheduling the bottom half turns out to be good
enough, they will have even less interest in investigating this further.

I should know in a week or two whether I'll pursue this further or drop
it.
Regards,
Felix