Message-ID: <loeliplxuvek4nh4plt4hup3ibqorpiv4eljiiwltgmyqa4nki@xpzymugslcvf>
Date: Thu, 9 Oct 2025 18:05:15 +0200
From: Thierry Reding <thierry.reding@...il.com>
To: Marc Zyngier <maz@...nel.org>
Cc: Thomas Gleixner <tglx@...utronix.de>, linux-tegra@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: IRQ thread timeouts and affinity
On Thu, Oct 09, 2025 at 03:30:56PM +0100, Marc Zyngier wrote:
> Hi Thierry,
>
> On Thu, 09 Oct 2025 12:38:55 +0100,
> Thierry Reding <thierry.reding@...il.com> wrote:
> >
> > Which brings me to the actual question: what is the right way to solve
> > this? I had, maybe naively, assumed that the default CPU affinity, which
> > includes all available CPUs, would be sufficient to have interrupts
> > balanced across all of those CPUs, but that doesn't appear to be the
> > case. At least not with the GIC (v3) driver which selects one CPU (CPU 0
> > in this particular case) from the affinity mask to set the "effective
> > affinity", which then dictates where IRQs are handled and where the
> > corresponding IRQ thread function is run.
>
> There's a (GIC-specific) answer to that, and that's the "1 of N"
> distribution model. The problem is that it is a massive headache (it
> completely breaks with per-CPU context).
Heh, that started out as a very promising first paragraph but turned
ugly very quickly... =)
> We could try and hack this in somehow, but defining a reasonable API
> is complicated. The set of CPUs receiving 1:N interrupts is a *global*
> set, which means you cannot have one interrupt targeting CPUs 0-1, and
> another targeting CPUs 2-3. You can only have a single set for all 1:N
> interrupts. How would you define such a set in a platform agnostic
> manner so that a random driver could use this? I definitely don't want
> to have a GIC-specific API.
I see. I've been thinking that maybe the only way to solve this is with
some sort of policy. A very simple policy might be: keep CPU 0 as the
"default" interrupt target (much like it is now), because like you said
there might be built-in assumptions that break when the interrupt is
handled elsewhere. But then let individual drivers opt into the 1:N set,
which would perhaps span all available CPUs except the first. From an
API PoV this would just be a flag that's passed to request_irq() (or one
of its derivatives).
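To make the opt-in concrete, something like the following sketch is what
I have in mind (IRQF_ONE_OF_N is entirely made up, as is the flag value;
nothing like it exists in mainline today):

```c
/*
 * Hypothetical sketch only -- IRQF_ONE_OF_N does not exist.
 * A driver whose handler can tolerate running on any CPU would
 * opt into the global 1:N set at request time:
 */
#define IRQF_ONE_OF_N	0x00100000	/* made-up flag value */

static int qspi_request_irq(struct platform_device *pdev, int irq,
			    irq_handler_t handler, irq_handler_t thread_fn)
{
	/* CPU 0 stays the default target; this IRQ joins the 1:N set */
	return devm_request_threaded_irq(&pdev->dev, irq, handler,
					 thread_fn, IRQF_ONE_OF_N,
					 dev_name(&pdev->dev), pdev);
}
```

Drivers that don't set the flag would keep the current single-CPU
effective affinity, so nothing changes behind their backs.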
> Overall, there is quite a lot of work to be done in this space: the
> machine I'm typing this from doesn't have affinity control *at
> all*. Any interrupt can target any CPU,
Well, that actually sounds pretty nice for the use-case that we have...
> and if Linux doesn't expect
> that, tough.
... but yeah, it may also break things.
> Don't even think of managed interrupts on that sort of
> systems...
I've seen some of the hardware drivers on the Grace devices distribute
interrupts across multiple CPUs, but they do so via managed interrupts
and multiple queues. I was trying to work out whether that could be used
for cases like QSPI as well. It's similar to just using a fixed CPU
affinity, so it's hardly a great solution. I also didn't see anything
outside of networking and PCI using this (there's one exception in SATA),
so I don't know whether it's something that just isn't a good idea
outside of multi-queue devices or whether simply nobody has considered
it.
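For reference, the pattern I mean is roughly this, using the real
pci_alloc_irq_vectors_affinity() API (the "foo" names and the single
pre-vector are illustrative, not taken from any particular driver):

```c
/*
 * Rough sketch of the managed-interrupt pattern used by multi-queue
 * network/NVMe-style drivers; "foo" names are illustrative.
 */
static int foo_setup_queue_irqs(struct pci_dev *pdev, int nr_queues)
{
	struct irq_affinity affd = {
		.pre_vectors = 1,	/* e.g. one non-managed admin vector */
	};
	int nvec;

	nvec = pci_alloc_irq_vectors_affinity(pdev, 2, nr_queues + 1,
					      PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
					      &affd);
	if (nvec < 0)
		return nvec;

	/* vectors 1..nvec-1 get kernel-managed, spread-out affinity */
	return nvec;
}
```

The kernel spreads the managed vectors across CPUs at allocation time,
which is exactly why it fits multi-queue devices so naturally and a
single-queue QSPI controller so awkwardly.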
irqbalance sounds like it would work to avoid the worst, and it has
built-in support for excluding certain CPUs from the balancing set. At
the same time this seems like something that the kernel would be much
better equipped to handle than a userspace daemon. Has anyone ever
attempted to implement an irqbalance equivalent within the kernel?
I should probably go look at how this works on x86 or PowerPC systems. I
keep thinking that this cannot be a new problem, so other solutions must
already exist.
Thierry