[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <wosxizbb5z3goikqglsdbrgmshith62upwnavnbqeq5dndfau3@bna46rg3w2ak>
Date: Tue, 14 Oct 2025 13:08:22 +0200
From: Thierry Reding <thierry.reding@...il.com>
To: Marc Zyngier <maz@...nel.org>
Cc: Thomas Gleixner <tglx@...utronix.de>, linux-tegra@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: IRQ thread timeouts and affinity
On Tue, Oct 14, 2025 at 12:50:18PM +0200, Thierry Reding wrote:
> On Sat, Oct 11, 2025 at 11:00:11AM +0100, Marc Zyngier wrote:
> > On Fri, 10 Oct 2025 16:03:01 +0100,
> > Thierry Reding <thierry.reding@...il.com> wrote:
> > >
> > > On Fri, Oct 10, 2025 at 03:18:13PM +0100, Marc Zyngier wrote:
> > > >
> > > > CPU hotplug is the main area of concern, and I'm pretty sure it breaks
> > > > this distribution mechanism (or the other way around). Another thing
> > > > is that if firmware isn't aware that 1:N interrupts can (or should)
> > > > wake-up a CPU from sleep, bad things will happen. Given that nobody
> > > > uses 1:N, you can bet that any bit of privileged SW (TF-A,
> > > > hypervisors) is likely to be buggy (I've already spotted bugs in KVM
> > > > around this).
> > >
> > > Okay, I can find out if CPU hotplug is a common use-case on these
> > > devices, or if we can run some tests with that.
> >
> > It's not so much whether CPU hotplug is of any use to your particular
> > box, but whether this has any detrimental impact on *any* machine
> > doing CPU hotplug.
> >
> > To be clear, this stuff doesn't go in if something breaks, no matter
> > how small.
>
> Of course. I do want to find a way to move forward with this, so I'm
> trying to find ways to check what impact this would have in conjunction
> with CPU hotplug.
>
> I've done some minimal testing on a Tegra264 device where we have less
> CPUs. With your patch applied, I see that most interrupts are nicely
> distributed across CPUs. I'm going to use the serial interrupt as an
> example since it reliably triggers when I test on a system. Here's an
> extract after boot:
>
> # cat /proc/interrupts
> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
> 25: 42 44 41 29 37 36 39 36 GICv3 547 Level c4e0000.serial
>
> I then took CPU 1 offline:
>
> # echo 0 > /sys/devices/system/cpu/cpu1/online
>
> After that it looks like the GIC automatically reverts to using the
> first CPU, since after a little while:
>
> # cat /proc/interrupts
> CPU0 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
> 25: 186 66 52 64 58 67 62 GICv3 547 Level c4e0000.serial
>
> The interrupt count for CPUs 2-7 no longer increments after taking CPU 1
> offline. Interestingly, bringing CPU 1 back online doesn't have an
> impact, so it doesn't go back to enabling 1:N mode.
Looks like that is because gic_set_affinity() gets called with the new
CPU mask when the CPU goes offline, but it's *not* called when the CPU
comes back online.
Thierry
Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)
Powered by blists - more mailing lists