[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <DEEGFVBPI57E.1QW7C1D4B2ER4@linux.ibm.com>
Date: Fri, 21 Nov 2025 15:49:48 +0100
From: "Tobias Schumacher" <ts@...ux.ibm.com>
To: "Gerd Bayer" <gbayer@...ux.ibm.com>,
"Tobias Schumacher"
<ts@...ux.ibm.com>,
"Heiko Carstens" <hca@...ux.ibm.com>,
"Vasily Gorbik"
<gor@...ux.ibm.com>,
"Alexander Gordeev" <agordeev@...ux.ibm.com>,
"Christian Borntraeger" <borntraeger@...ux.ibm.com>,
"Sven Schnelle"
<svens@...ux.ibm.com>,
"Niklas Schnelle" <schnelle@...ux.ibm.com>,
"Gerald
Schaefer" <gerald.schaefer@...ux.ibm.com>,
"Halil Pasic"
<pasic@...ux.ibm.com>,
"Matthew Rosato" <mjrosato@...ux.ibm.com>,
"Thomas
Gleixner" <tglx@...utronix.de>
Cc: <linux-kernel@...r.kernel.org>, <linux-s390@...r.kernel.org>,
"Farhan
Ali" <alifm@...ux.ibm.com>
Subject: Re: [PATCH v5 2/2] s390/pci: Migrate s390 IRQ logic to IRQ domain
API
On Fri Nov 21, 2025 at 2:27 PM CET, Gerd Bayer wrote:
> Hi Tobias,
>
> sorry for being late with my comments...
>
> On Fri, 2025-11-21 at 06:32 +0100, Tobias Schumacher wrote:
>> s390 is one of the last architectures using the legacy API for setup and
>> teardown of PCI MSI IRQs. Migrate the s390 IRQ allocation and teardown
>> to the MSI parent domain API. For details, see:
>>
>> https://lore.kernel.org/lkml/20221111120501.026511281@linutronix.de
>>
>> In detail, create an MSI parent domain for each PCI domain. When a PCI
>> device sets up MSI or MSI-X IRQs, the library creates a per-device IRQ
>> domain for this device, which is used by the device for allocating and
>> freeing IRQs.
>>
>> The per-device domain delegates this allocation and freeing to the
>> parent-domain. In the end, the corresponding callbacks of the parent
>> domain are responsible for allocating and freeing the IRQs.
>>
>> The allocation is split into two parts:
>> - zpci_msi_prepare() is called once for each device and allocates the
>> required resources. On s390, each PCI function has its own airq
>> vector and a summary bit, which must be configured once per function.
>> This is done in prepare().
>> - zpci_msi_alloc() can be called multiple times for allocating one or
>> more MSI/MSI-X IRQs. This creates a mapping between the virtual IRQ
>> number in the kernel and the hardware IRQ number.
>>
>> Freeing is split into two counterparts:
>> - zpci_msi_free() reverts the effects of zpci_msi_alloc() and
>> - zpci_msi_teardown() reverts the effects of zpci_msi_prepare(). This is
>> called once when all IRQs are freed before a device is removed.
>>
>> Since the parent domain in the end allocates the IRQs, the hwirq
>> encoding must be unambiguous for all IRQs of all devices. This is
>> achieved by
>>
>> encoding the hwirq using the PCI function id and the MSI
>> index.
>
> This is no longer true with the per-PCI-domain irq domains! But you
> encode the hwirq with the devfn within the PCI domain, instead.
Correct, will fix that.
>> Reviewed-by: Niklas Schnelle <schnelle@...ux.ibm.com>
>> Reviewed-by: Farhan Ali <alifm@...ux.ibm.com>
>> Signed-off-by: Tobias Schumacher <ts@...ux.ibm.com>
>> ---
>> arch/s390/Kconfig | 1 +
>> arch/s390/include/asm/pci.h | 4 +
>> arch/s390/pci/pci_bus.c | 21 ++-
>> arch/s390/pci/pci_irq.c | 333 +++++++++++++++++++++++++++-----------------
>> 4 files changed, 227 insertions(+), 132 deletions(-)
>>
>> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
>> index 778ce20d34046cad84dd4ef57cab5a662e5796d9..46ab67d607f0db7f5f108106172699a5eebfc8c8 100644
>> --- a/arch/s390/Kconfig
>> +++ b/arch/s390/Kconfig
>> @@ -251,6 +251,7 @@ config S390
>> select HOTPLUG_SMT
>> select IOMMU_HELPER if PCI
>> select IOMMU_SUPPORT if PCI
>> + select IRQ_MSI_LIB if PCI
>
> Nit: There's precedence for both versions (above and below!) but I
> personally would prefer to indent the pre-condition "if PCI" so it
> stands out. Maybe that's a generic clean-up task for this arch-specific
> file...
Since at least the 'if PCI' preconditions are indented in this file,
I'll also do it for this line.
>> @@ -189,7 +190,7 @@ static bool zpci_bus_is_multifunction_root(struct zpci_dev *zdev)
>> static int zpci_bus_create_pci_bus(struct zpci_bus *zbus, struct zpci_dev *fr, struct pci_ops *ops)
>> {
>> struct pci_bus *bus;
>> - int domain;
>> + int domain, rc;
>>
>> domain = zpci_alloc_domain((u16)fr->uid);
>> if (domain < 0)
>> @@ -199,19 +200,28 @@ static int zpci_bus_create_pci_bus(struct zpci_bus *zbus, struct zpci_dev *fr, s
>> zbus->multifunction = zpci_bus_is_multifunction_root(fr);
>> zbus->max_bus_speed = fr->max_bus_speed;
>>
>> + rc = zpci_create_parent_msi_domain(zbus);
>> + if (rc)
>> + goto out_free_domain;
>> +
>
> If you shortened this to use the call to
> zpci_create_parent_msi_domain() as predicate for "if" you could do
> without the additional rc.
Will do.
>> /*
>> * Note that the zbus->resources are taken over and zbus->resources
>> * is empty after a successful call
>> */
>> bus = pci_create_root_bus(NULL, ZPCI_BUS_NR, ops, zbus, &zbus->resources);
>> - if (!bus) {
>> - zpci_free_domain(zbus->domain_nr);
>> - return -EFAULT;
>> - }
>> + if (!bus)
>> + goto out_remove_msi_domain;
>
> Or do you want to set rc to -EFAULT here, and return the "original" rc
> in the error exits?
As Heiko mentioned, -EFAULT shouldn't be returned anyways I'd change it
to -ENOMEM for all error cases.
--- snip ---
>> diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
>> index e73be96ce5fe6473fc193d65b8f0ff635d6a98ba..b0be21ab56995e81f54339fc77167f5930182542 100644
>> --- a/arch/s390/pci/pci_irq.c
>> +++ b/arch/s390/pci/pci_irq.c
>> @@ -7,6 +7,7 @@
>> #include <linux/kernel_stat.h>
>> #include <linux/pci.h>
>> #include <linux/msi.h>
>> +#include <linux/irqchip/irq-msi-lib.h>
>> #include <linux/smp.h>
>>
>> #include <asm/isc.h>
>> @@ -110,43 +111,41 @@ static int zpci_set_irq(struct zpci_dev *zdev)
>> return rc;
>> }
>>
>> -/* Clear adapter interruptions */
>> -static int zpci_clear_irq(struct zpci_dev *zdev)
>
>
> Any specific reason, why you removed zpci_clear_irq() indirecting to
> airq vs. directed_irq - but kept zpci_set_irq()? Just saying this is
> imbalanced now.
I removed it because it was only required in zpci_msi_teardown(), which
already has the distinction between directed and floating IRQs. So
having a separate zpci_clear_irq() seemed redundant.
--- snip ---
>> +static inline u16 zpci_decode_hwirq_msi_index(irq_hw_number_t irq)
>> +{
>> + return irq & GENMASK_U16(15, 0);
>
>
> Please don't use GENMASK_U16. It is harder to read than any explicit
> constant like 0x00FF, especially since its parameters contradict the
> architecture's endianess.
> But then, is this called anywhere?
Right, it's not used anymore, I'll remove it completely.
--- snip ---
>> +static int zpci_msi_prepare(struct irq_domain *domain,
>> + struct device *dev, int nvec,
>> + msi_alloc_info_t *info)
>> +{
>> + struct zpci_dev *zdev = to_zpci_dev(dev);
>> + struct pci_dev *pdev = to_pci_dev(dev);
>> + unsigned long bit;
>> + int msi_vecs, rc;
>>
>> msi_vecs = min_t(unsigned int, nvec, zdev->max_msi);
>> - if (msi_vecs < nvec) {
>> - pr_info("%s requested %d irqs, allocate system limit of %d",
>> + if (msi_vecs < nvec)
>> + pr_info("%s requested %d IRQs, allocate system limit of %d\n",
>> pci_name(pdev), nvec, zdev->max_msi);
>> - }
>>
>> rc = __alloc_airq(zdev, msi_vecs, &bit);
>> - if (rc < 0)
>> + if (rc) {
>> + pr_err("Allocating adapter IRQs for %s failed\n", pci_name(pdev));
>> return rc;
>> + }
>>
>> - /*
>> - * Request MSI interrupts:
>> - * When using MSI, nvec_used interrupt sources and their irq
>> - * descriptors are controlled through one msi descriptor.
>> - * Thus the outer loop over msi descriptors shall run only once,
>> - * while two inner loops iterate over the interrupt vectors.
>> - * When using MSI-X, each interrupt vector/irq descriptor
>> - * is bound to exactly one msi descriptor (nvec_used is one).
>> - * So the inner loops are executed once, while the outer iterates
>> - * over the MSI-X descriptors.
>> - */
>> - hwirq = bit;
>> - msi_for_each_desc(msi, &pdev->dev, MSI_DESC_NOTASSOCIATED) {
>> - if (hwirq - bit >= msi_vecs)
>> - break;
>> - irqs_per_msi = min_t(unsigned int, msi_vecs, msi->nvec_used);
>> - irq = __irq_alloc_descs(-1, 0, irqs_per_msi, 0, THIS_MODULE,
>> - (irq_delivery == DIRECTED) ?
>> - msi->affinity : NULL);
>> - if (irq < 0)
>> - return -ENOMEM;
>> -
>> - for (i = 0; i < irqs_per_msi; i++) {
>> - rc = irq_set_msi_desc_off(irq, i, msi);
>> - if (rc)
>> - return rc;
>> - irq_set_chip_and_handler(irq + i, &zpci_irq_chip,
>> - handle_percpu_irq);
>> + zdev->msi_first_bit = bit;
>> + zdev->msi_nr_irqs = msi_vecs;
>> + rc = zpci_set_irq(zdev);
>> + if (rc) {
>> + pr_err("Registering adapter IRQs for %s failed\n",
>> + pci_name(pdev));
>> + if (irq_delivery == DIRECTED) {
>> + airq_iv_free(zpci_ibv[0], zdev->msi_first_bit, msi_vecs);
>> + } else {
>> + zpci_clear_airq(zdev);
>> + airq_iv_release(zdev->aibv);
>> + zdev->aibv = NULL;
>> + airq_iv_free_bit(zpci_sbv, zdev->aisb);
>> + zdev->aisb = -1UL;
>
> These two failure clean-ups look a lot like
> zpci_msi_teardown_directed/_floating() below. Could these be called
> instead of duplicating the code?
Yes they are similar, only that zpci_msi_teardown_directed/_floating()
also call zpci_clear_directed_irq()/zpci_clear_airq(), respectively.
Considering your other comment above, it might be cleaner to add
zpci_clear_irq() again, call this directly from zpci_msi_teardown() and
then use zpci_msi_teardown_directed/_floating() here as suggested.
Thanks,
Tobias
Powered by blists - more mailing lists