Message-ID: <j7ikmaazu6hjzsagqqk4o4nnxl5wupsmpcaruoyytsn2ogolyx@mtmhqrkm4gbv>
Date: Thu, 9 Oct 2025 13:38:55 +0200
From: Thierry Reding <thierry.reding@...il.com>
To: Thomas Gleixner <tglx@...utronix.de>, Marc Zyngier <maz@...nel.org>
Cc: linux-tegra@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: IRQ thread timeouts and affinity
Hi Thomas, Marc, all,
Apologies up front for the length of this. There are a lot of details
that I want to share in order to, hopefully, make this as clear as
possible.
We've been running into an issue on some systems (NVIDIA Grace chips)
where either during boot or at runtime, CPU 0 can be under very high
load and cause some IRQ thread functions to be delayed to a point where
we encounter the timeout in the work submission parts of the driver.
Specifically this happens for the Tegra QSPI controller driver found
in drivers/spi/spi-tegra210-quad.c. This driver uses an IRQ thread to
wait for and process "transfer ready" interrupts (which need to run
DMA transfers or copy from the hardware FIFOs using PIO to get the
SPI transfer data). Under heavy load, we've seen the IRQ thread run
with up to multiple seconds of delay.
One solution that we've tried is to move parts of the IRQ handler into
the hard IRQ portion, and we observed that the hard interrupt is always
serviced within the expected period of time. However, the IRQ thread
still runs very late in those cases.
To mitigate this, we're currently trying to gracefully recover on time-
out by checking the hardware state and processing as if no timeout
happened. This needs special care because eventually the IRQ thread will
run and try to process a SPI transfer that's already been processed. It
also isn't optimal because of, well, the timeout.
These devices have a *lot* of CPUs, and usually only CPU 0 tends to be
clogged (during boot); fio-based stress tests at runtime can also
trigger this case if they happen to run on CPU 0.
One workaround that has proven to work is to change the affinity of the
QSPI interrupt to whatever the current CPU is at probe time. That only
works as long as that CPU doesn't happen to be CPU 0, obviously. It
also doesn't work if we end up stress-testing the selected CPU at
runtime, so it's ultimately just a way of reducing the likelihood of
hitting the problem, not of avoiding it entirely.
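For reference, the probe-time workaround looks roughly like this (a
minimal sketch; tqspi_pin_irq_to_probe_cpu is a made-up name, though
irq_set_affinity(), cpumask_of() and get_cpu()/put_cpu() are the real
kernel APIs):

```c
#include <linux/interrupt.h>
#include <linux/cpumask.h>
#include <linux/smp.h>

/*
 * Hypothetical sketch of the probe-time workaround: steer the QSPI
 * interrupt to whatever CPU probe happens to run on, in the hope
 * that it isn't CPU 0.
 */
static void tqspi_pin_irq_to_probe_cpu(int irq)
{
	unsigned int cpu = get_cpu();	/* current CPU, preemption disabled */

	/* Best effort only: if probe runs on CPU 0, we gain nothing. */
	if (irq_set_affinity(irq, cpumask_of(cpu)))
		pr_warn("qspi: failed to move IRQ %d to CPU %u\n", irq, cpu);

	put_cpu();
}
```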
Which brings me to the actual question: what is the right way to solve
this? I had, maybe naively, assumed that the default CPU affinity, which
includes all available CPUs, would be sufficient to have interrupts
balanced across all of those CPUs, but that doesn't appear to be the
case. At least not with the GIC (v3) driver which selects one CPU (CPU 0
in this particular case) from the affinity mask to set the "effective
affinity", which then dictates where IRQs are handled and where the
corresponding IRQ thread function is run.
One potential solution I see is to avoid threaded IRQs for this because
they will cause all of the interrupts to be processed on CPU 0 by
default. A viable alternative would be to use work queues, which, to my
understanding, can (will?) be scheduled more flexibly.
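The workqueue variant I have in mind would be something like the
following sketch (all tqspi_* names are hypothetical; the point is that
system_unbound_wq workers are not bound to the CPU that took the
interrupt, so the heavy processing is not pinned to CPU 0):

```c
#include <linux/interrupt.h>
#include <linux/workqueue.h>

struct tqspi_ctx {
	struct work_struct xfer_work;
	/* ... hardware state, FIFO/DMA bookkeeping ... */
};

static void tqspi_xfer_work(struct work_struct *work)
{
	struct tqspi_ctx *ctx = container_of(work, struct tqspi_ctx,
					     xfer_work);

	/* Run the DMA/PIO transfer processing the IRQ thread used to do. */
	(void)ctx;
}

static irqreturn_t tqspi_hardirq(int irq, void *data)
{
	struct tqspi_ctx *ctx = data;

	/* Ack/clear the "transfer ready" status in hardware here, then
	 * punt the bulk of the work to an unbound worker. */
	queue_work(system_unbound_wq, &ctx->xfer_work);

	return IRQ_HANDLED;
}
```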
Alternatively, would it be possible (and make sense) to make the IRQ
core code schedule threads across more CPUs? Is there a particular
reason that the IRQ thread runs on the same CPU that services the IRQ?
Maybe another way would be to "reserve" CPU 0 for this type of core OS
driver (the TPM is connected to this QSPI controller) and make sure
that CPU-intensive tasks do not run on that CPU?
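In principle that reservation could be expressed on the kernel command
line today (isolcpus is documented in kernel-parameters.txt; the CPU
list here is just illustrative):

```
# Isolate CPU 0 from the general scheduler so that only tasks
# explicitly affined to it (and IRQs targeting it) run there.
isolcpus=domain,0
```

Although that feels like a heavy hammer for what is really a per-device
problem.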
I know that things like irqbalance and taskset exist to solve some of
these problems, but they do not work when we hit these cases at boot
time.
Any other solutions that I haven't thought of?
Thanks,
Thierry