linux-kernel - Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87348g95yd.ffs@tglx>
Date: Sat, 20 Sep 2025 23:20:26 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Lukas Wunner <lukas@...ner.de>, Sebastian Andrzej Siewior
 <bigeasy@...utronix.de>, Crystal Wood <crwood@...hat.com>, Ingo Molnar
 <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Juri Lelli
 <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
 Clark Williams <clrkwllms@...nel.org>, Steven Rostedt
 <rostedt@...dmis.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>, Ben Segall
 <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, Valentin Schneider
 <vschneid@...hat.com>, linux-kernel@...r.kernel.org, Attila Fazekas
 <afazekas@...hat.com>, linux-pci@...r.kernel.org,
 linux-rt-devel@...ts.linux.dev, Bjorn Helgaas <helgaas@...nel.org>, Mahesh
 J Salgaonkar <mahesh@...ux.ibm.com>, Oliver OHalloran <oohall@...il.com>
Subject: Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ
 handler

On Mon, Sep 08 2025 at 18:08, Lukas Wunner wrote:
> Crystal reports the PCIe Advanced Error Reporting driver getting stuck
> in an infinite loop on PREEMPT_RT:  Both the primary IRQ handler

Can you please write out interrupt. This IRQ acronym is just not helpful
at all.

> aer_irq() as well as the secondary handler aer_isr() are forced into
> threads with identical priority.
>
> Crystal writes that on the ARM system in question, the primary IRQ
> handler has to clear an error in the Root Error Status register...
>
>    "before the next error happens, or else the hardware will set the
>     Multiple ERR_COR Received bit.  If that bit is set, then aer_isr()
>     can't rely on the Error Source Identification register, so it scans
>     through all devices looking for errors -- and for some reason, on
>     this system, accessing the AER registers (or any Config Space above
>     0x400, even though there are capabilities located there) generates
>     an Unsupported Request Error (but returns valid data).  Since this
>     happens more than once, without aer_irq() preempting, it causes
>     another multi error and we get stuck in a loop."
>
> The issue does not show on non-PREEMPT_RT since the primary IRQ handler
> runs in hardirq context and thus clears the Root Error Status register
> before waking the secondary IRQ handler's thread.

That sentence does not make sense at all because on RT this is not any
different. The primary handler clears the status register before waking
the secondary handler, no?

What's different on non-RT is that the primary handler can interrupt the
threaded secondary handler, which is required to make progress.

> Simulate the same behavior on PREEMPT_RT by assigning a lower default
> priority to forced secondary IRQ handlers.

I'm not really fond of this new minus one interface naming. It's a
horrible misnomer as it suggests that it sets the FIFO priority to -1,
which doesn't make any sense at all and is confusing at best.

Can you please pick a name, which makes it obvious what this is about
and as this is really a special case for forced secondary interrupt
handlers, it should just reflect that in the name.

I obviously understand that the proposed change squashs the whole class
of similar (not yet detected) issues, but that made me look at that
particular instance nevertheless.

All aer_irq() does is reading two PCI config words, writing one and then
sticking 64bytes into a KFIFO. All of that is hard interrupt safe. So
arguably this AER problem can be nicely solved by the below one-liner,
no?

And because that made me curious I looked at the first 40 instances of
interrupt requests which have a primary and a secondary handler.

   21 of them can simply set IRQF_NO_THREAD

    3 of them are just silly. I really could not resist to create the
      diff and append it below for entertainment.

Which means going through all the 231 instances is definitely a
worthwhile exercise.

The rest:

   1) is not immediately obvious as I did not follow (indirect)
      call chains to figure out what's really behind them

   2) sets IRQF_ONESHOT, which means the interrupt line is masked before
      the primary handler runs and stays masked until the secondary
      handler returns.

#1 needs eyeballs

#2 should actually not use the secondary thread mechanism at all when
   forced threaded. This really should do:

   hardirq()
        wake(primary_thread);

   primary_thread()
        if (primary_handler() == IRQ_WAKE_THREAD) {
              if (secondary_handler && ONESHOT)
                 secondary_handler();
              else
                 wake(secondary_thread);

   That spares a boat load of wakeups and scheduling on RT and reduces
   the actual problem space significantly. This might need a new
   IRQF_xxx flag to avoid overloading ONESHOT, but that's an
   implementation detail.

Hmm?

Thanks,

        tglx
---
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1671,7 +1671,7 @@ static int aer_probe(struct pcie_device
 	set_service_data(dev, rpc);
 
 	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
-					   IRQF_SHARED, "aerdrv", dev);
+					   IRQF_NO_THREAD | IRQF_SHARED, "aerdrv", dev);
 	if (status) {
 		pci_err(port, "request AER IRQ %d failed\n", dev->irq);
 		return status;

<ENTERTAINMENT>

--- a/drivers/bluetooth/btintel_pcie.c
+++ b/drivers/bluetooth/btintel_pcie.c
@@ -1482,11 +1482,6 @@ static void btintel_pcie_msix_rx_handle(
 	}
 }
 
-static irqreturn_t btintel_pcie_msix_isr(int irq, void *data)
-{
-	return IRQ_WAKE_THREAD;
-}
-
 static inline bool btintel_pcie_is_rxq_empty(struct btintel_pcie_data *data)
 {
 	return data->ia.cr_hia[BTINTEL_PCIE_RXQ_NUM] == data->ia.cr_tia[BTINTEL_PCIE_RXQ_NUM];
@@ -1585,13 +1580,9 @@ static int btintel_pcie_setup_irq(struct
 		msix_entry = &data->msix_entries[i];
 		msix_entry->vector = pci_irq_vector(data->pdev, i);
 
-		err = devm_request_threaded_irq(&data->pdev->dev,
-						msix_entry->vector,
-						btintel_pcie_msix_isr,
-						btintel_pcie_irq_msix_handler,
-						IRQF_SHARED,
-						KBUILD_MODNAME,
-						msix_entry);
+		err = devm_request_threaded_irq(&data->pdev->dev, msix_entry->vector,
+						NULL, btintel_pcie_irq_msix_handler,
+						IRQF_SHARED, KBUILD_MODNAME, msix_entry);
 		if (err) {
 			pci_free_irq_vectors(data->pdev);
 			data->alloc_vecs = 0;
--- a/drivers/bus/fsl-mc/dprc-driver.c
+++ b/drivers/bus/fsl-mc/dprc-driver.c
@@ -381,17 +381,6 @@ int dprc_scan_container(struct fsl_mc_de
 EXPORT_SYMBOL_GPL(dprc_scan_container);
 
 /**
- * dprc_irq0_handler - Regular ISR for DPRC interrupt 0
- *
- * @irq_num: IRQ number of the interrupt being handled
- * @arg: Pointer to device structure
- */
-static irqreturn_t dprc_irq0_handler(int irq_num, void *arg)
-{
-	return IRQ_WAKE_THREAD;
-}
-
-/**
  * dprc_irq0_handler_thread - Handler thread function for DPRC interrupt 0
  *
  * @irq_num: IRQ number of the interrupt being handled
@@ -525,10 +514,8 @@ static int register_dprc_irq_handler(str
 	 * NOTE: devm_request_threaded_irq() invokes the device-specific
 	 * function that programs the MSI physically in the device
 	 */
-	error = devm_request_threaded_irq(&mc_dev->dev,
-					  irq->virq,
-					  dprc_irq0_handler,
-					  dprc_irq0_handler_thread,
+	error = devm_request_threaded_irq(&mc_dev->dev, irq->virq,
+					  NULL, dprc_irq0_handler_thread,
 					  IRQF_NO_SUSPEND | IRQF_ONESHOT,
 					  dev_name(&mc_dev->dev),
 					  &mc_dev->dev);
--- a/drivers/gpu/drm/arm/malidp_hw.c
+++ b/drivers/gpu/drm/arm/malidp_hw.c
@@ -1356,11 +1356,6 @@ void malidp_se_irq_hw_init(struct malidp
 			     hwdev->hw->map.se_irq_map.irq_mask);
 }
 
-static irqreturn_t malidp_se_irq_thread_handler(int irq, void *arg)
-{
-	return IRQ_HANDLED;
-}
-
 int malidp_se_irq_init(struct drm_device *drm, int irq)
 {
 	struct malidp_drm *malidp = drm_to_malidp(drm);
@@ -1371,8 +1366,7 @@ int malidp_se_irq_init(struct drm_device
 	malidp_hw_disable_irq(hwdev, MALIDP_SE_BLOCK, 0xffffffff);
 	malidp_hw_clear_irq(hwdev, MALIDP_SE_BLOCK, 0xffffffff);
 
-	ret = devm_request_threaded_irq(drm->dev, irq, malidp_se_irq,
-					malidp_se_irq_thread_handler,
+	ret = devm_request_threaded_irq(drm->dev, irq, malidp_se_irq, NULL,
 					IRQF_SHARED, "malidp-se", drm);
 	if (ret < 0) {
 		DRM_ERROR("failed to install SE IRQ handler\n");