lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251220193120.3339162-1-lrizzo@google.com>
Date: Sat, 20 Dec 2025 19:31:19 +0000
From: Luigi Rizzo <lrizzo@...gle.com>
To: tglx@...utronix.de
Cc: bhelgaas@...gle.com, linux-kernel@...r.kernel.org, maz@...nel.org, 
	Luigi Rizzo <lrizzo@...gle.com>
Subject: Re: [patch 1/2] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT
 flag

There are platforms (including some ARM SoC) where the MSIx
writes are a performance killer, because they are exceedingly
serializing on the PCIe root port.

These platforms are the key motivation for Global Software
Interrupt Moderation (GSIM) which relies on actually masking
device interrupts so the MSIx writes are not generated.
https://lore.kernel.org/all/20251217112128.1401896-1-lrizzo@google.com/

Overriding mask/unmask with irq_chip_mask_parent() makes software
moderation ineffective. GSIM works great on ARM platforms before
this patch, but becomes ineffective afterwards, e.g. on linux 6.18.

The round trip through the PCI endpoint for mask_irq(), caused by the
readback to make sure the PCI write has been sent, is almost always
(or really always) unnecessary.  Masking is inherently racy; waiting
that the PCIe write has arrived at the device won't guarantee that an
interrupt has arrived in the meantime, so there is really no benefit
in the readback (which, for instance, can be conditionally removed with
code like the one below).

I measured the cost of pci_irq_mask_msix() and it goes from 1000-1500ns
with the readl(), down to 40-50ns without it.

Once we remove the costly readback, is there any remaining reason
to overwrite [un]mask_irq() with irq_chip_[un]mask_parent() ?
 
cheers
luigi

--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -17,6 +17,8 @@

 int pci_msi_enable = 1;

+DEFINE_PER_CPU(bool, pci_msix_fast_mask);
+
 /**
  * pci_msi_supported - check whether MSI may be enabled on a device
  * @dev: pointer to the pci_dev data structure of MSI device function
--- a/drivers/pci/msi/msi.h
+++ b/drivers/pci/msi/msi.h
@@ -40,10 +40,14 @@ static inline void pci_msix_write_vector_ctrl(struct msi_desc *desc, u32 ctrl)
                writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
 }

+DECLARE_PER_CPU(bool, pci_msix_fast_mask);
 static inline void pci_msix_mask(struct msi_desc *desc)
 {
        desc->pci.msix_ctrl |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
        pci_msix_write_vector_ctrl(desc, desc->pci.msix_ctrl);
+       /* There are only a few cases when we really need the read back. */
+       if (__this_cpu_read(pci_msix_fast_mask))
+               return;
        /* Flush write to device */
        readl(desc->pci.mask_base);
 }

--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -420,13 +420,20 @@ static inline void mask_ack_irq(struct irq_desc *desc)
        }
 }

+/* defined in driver/pci/msi/msi.c */
+DECLARE_PER_CPU(bool, pci_msix_fast_mask);
+
 void mask_irq(struct irq_desc *desc)
 {
        if (irqd_irq_masked(&desc->irq_data))
                return;

        if (desc->irq_data.chip->irq_mask) {
+                if (IS_ENABLED(CONFIG_PCI_MSI))
+                       __this_cpu_write(pci_msix_fast_mask, irqd_has_set(&desc->irq_data, IRQD_IRQ_MODERATED));
                desc->irq_data.chip->irq_mask(&desc->irq_data);
+               if (IS_ENABLED(CONFIG_PCI_MSI))
+                       __this_cpu_write(pci_msix_fast_mask, false);
                irq_state_set_masked(desc);
        }
 }



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ