lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <DEEGFVBPI57E.1QW7C1D4B2ER4@linux.ibm.com>
Date: Fri, 21 Nov 2025 15:49:48 +0100
From: "Tobias Schumacher" <ts@...ux.ibm.com>
To: "Gerd Bayer" <gbayer@...ux.ibm.com>,
        "Tobias Schumacher"
 <ts@...ux.ibm.com>,
        "Heiko Carstens" <hca@...ux.ibm.com>,
        "Vasily Gorbik"
 <gor@...ux.ibm.com>,
        "Alexander Gordeev" <agordeev@...ux.ibm.com>,
        "Christian Borntraeger" <borntraeger@...ux.ibm.com>,
        "Sven Schnelle"
 <svens@...ux.ibm.com>,
        "Niklas Schnelle" <schnelle@...ux.ibm.com>,
        "Gerald
 Schaefer" <gerald.schaefer@...ux.ibm.com>,
        "Halil Pasic"
 <pasic@...ux.ibm.com>,
        "Matthew Rosato" <mjrosato@...ux.ibm.com>,
        "Thomas
 Gleixner" <tglx@...utronix.de>
Cc: <linux-kernel@...r.kernel.org>, <linux-s390@...r.kernel.org>,
        "Farhan
 Ali" <alifm@...ux.ibm.com>
Subject: Re: [PATCH v5 2/2] s390/pci: Migrate s390 IRQ logic to IRQ domain
 API

On Fri Nov 21, 2025 at 2:27 PM CET, Gerd Bayer wrote:
> Hi Tobias,
>
> sorry for being late with my comments...
>
> On Fri, 2025-11-21 at 06:32 +0100, Tobias Schumacher wrote:
>> s390 is one of the last architectures using the legacy API for setup and
>> teardown of PCI MSI IRQs. Migrate the s390 IRQ allocation and teardown
>> to the MSI parent domain API. For details, see:
>> 
>> https://lore.kernel.org/lkml/20221111120501.026511281@linutronix.de
>> 
>> In detail, create an MSI parent domain for each PCI domain. When a PCI
>> device sets up MSI or MSI-X IRQs, the library creates a per-device IRQ
>> domain for this device, which is used by the device for allocating and
>> freeing IRQs.
>> 
>> The per-device domain delegates this allocation and freeing to the
>> parent-domain. In the end, the corresponding callbacks of the parent
>> domain are responsible for allocating and freeing the IRQs.
>> 
>> The allocation is split into two parts:
>> - zpci_msi_prepare() is called once for each device and allocates the
>>   required resources. On s390, each PCI function has its own airq
>>   vector and a summary bit, which must be configured once per function.
>>   This is done in prepare().
>> - zpci_msi_alloc() can be called multiple times for allocating one or
>>   more MSI/MSI-X IRQs. This creates a mapping between the virtual IRQ
>>   number in the kernel and the hardware IRQ number.
>> 
>> Freeing is split into two counterparts:
>> - zpci_msi_free() reverts the effects of zpci_msi_alloc() and
>> - zpci_msi_teardown() reverts the effects of zpci_msi_prepare(). This is
>>   called once when all IRQs are freed before a device is removed.
>> 
>> Since the parent domain in the end allocates the IRQs, the hwirq
>> encoding must be unambiguous for all IRQs of all devices. This is
>> achieved by 
>> 
>> encoding the hwirq using the PCI function id and the MSI
>> index.
>
> This is no longer true with the per-PCI-domain irq domains! But you
> encode the hwirq with the devfn within the PCI domain, instead.

Correct, will fix that.

>> Reviewed-by: Niklas Schnelle <schnelle@...ux.ibm.com>
>> Reviewed-by: Farhan Ali <alifm@...ux.ibm.com>
>> Signed-off-by: Tobias Schumacher <ts@...ux.ibm.com>
>> ---
>>  arch/s390/Kconfig           |   1 +
>>  arch/s390/include/asm/pci.h |   4 +
>>  arch/s390/pci/pci_bus.c     |  21 ++-
>>  arch/s390/pci/pci_irq.c     | 333 +++++++++++++++++++++++++++-----------------
>>  4 files changed, 227 insertions(+), 132 deletions(-)
>> 
>> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
>> index 778ce20d34046cad84dd4ef57cab5a662e5796d9..46ab67d607f0db7f5f108106172699a5eebfc8c8 100644
>> --- a/arch/s390/Kconfig
>> +++ b/arch/s390/Kconfig
>> @@ -251,6 +251,7 @@ config S390
>>  	select HOTPLUG_SMT
>>  	select IOMMU_HELPER		if PCI
>>  	select IOMMU_SUPPORT		if PCI
>> +	select IRQ_MSI_LIB if PCI
>
> Nit: There's precedence for both versions (above and below!) but I
> personally would prefer to indent the pre-condition "if PCI" so it
> stands out. Maybe that's a generic clean-up task for this arch-specific
> file...

Since at least the 'if PCI' preconditions are indented in this file,
I'll also do it for this line. 

>> @@ -189,7 +190,7 @@ static bool zpci_bus_is_multifunction_root(struct zpci_dev *zdev)
>>  static int zpci_bus_create_pci_bus(struct zpci_bus *zbus, struct zpci_dev *fr, struct pci_ops *ops)
>>  {
>>  	struct pci_bus *bus;
>> -	int domain;
>> +	int domain, rc;
>>  
>>  	domain = zpci_alloc_domain((u16)fr->uid);
>>  	if (domain < 0)
>> @@ -199,19 +200,28 @@ static int zpci_bus_create_pci_bus(struct zpci_bus *zbus, struct zpci_dev *fr, s
>>  	zbus->multifunction = zpci_bus_is_multifunction_root(fr);
>>  	zbus->max_bus_speed = fr->max_bus_speed;
>>  
>> +	rc = zpci_create_parent_msi_domain(zbus);
>> +	if (rc)
>> +		goto out_free_domain;
>> +
>
> If you shortened this to use the call to
> zpci_create_parent_msi_domain() as predicate for "if" you could do
> without the additional rc.

Will do.

>>  	/*
>>  	 * Note that the zbus->resources are taken over and zbus->resources
>>  	 * is empty after a successful call
>>  	 */
>>  	bus = pci_create_root_bus(NULL, ZPCI_BUS_NR, ops, zbus, &zbus->resources);
>> -	if (!bus) {
>> -		zpci_free_domain(zbus->domain_nr);
>> -		return -EFAULT;
>> -	}
>> +	if (!bus)
>> +		goto out_remove_msi_domain;
>
> Or do you want to set rc to -EFAULT here, and return the "original" rc
> in the error exits?

As Heiko mentioned, -EFAULT shouldn't be returned anyways I'd change it
to -ENOMEM for all error cases.

--- snip ---

>> diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
>> index e73be96ce5fe6473fc193d65b8f0ff635d6a98ba..b0be21ab56995e81f54339fc77167f5930182542 100644
>> --- a/arch/s390/pci/pci_irq.c
>> +++ b/arch/s390/pci/pci_irq.c
>> @@ -7,6 +7,7 @@
>>  #include <linux/kernel_stat.h>
>>  #include <linux/pci.h>
>>  #include <linux/msi.h>
>> +#include <linux/irqchip/irq-msi-lib.h>
>>  #include <linux/smp.h>
>>  
>>  #include <asm/isc.h>
>> @@ -110,43 +111,41 @@ static int zpci_set_irq(struct zpci_dev *zdev)
>>  	return rc;
>>  }
>>  
>> -/* Clear adapter interruptions */
>> -static int zpci_clear_irq(struct zpci_dev *zdev)
>
>
> Any specific reason, why you removed zpci_clear_irq() indirecting to
> airq vs. directed_irq - but kept zpci_set_irq()? Just saying this is
> imbalanced now.

I removed it because it was only required in zpci_msi_teardown(), which
already has the distinction between directed and floating IRQs. So
having a separate zpci_clear_irq() seemed redundant. 

--- snip ---

>> +static inline u16 zpci_decode_hwirq_msi_index(irq_hw_number_t irq)
>> +{
>> +	return irq & GENMASK_U16(15, 0);
>
>
> Please don't use GENMASK_U16. It is harder to read than any explicit
> constant like 0x00FF, especially since its parameters contradict the
> architecture's endianess.
> But then, is this called anywhere?

Right, it's not used anymore, I'll remove it completely.

--- snip ---

>> +static int zpci_msi_prepare(struct irq_domain *domain,
>> +			    struct device *dev, int nvec,
>> +			    msi_alloc_info_t *info)
>> +{
>> +	struct zpci_dev *zdev = to_zpci_dev(dev);
>> +	struct pci_dev *pdev = to_pci_dev(dev);
>> +	unsigned long bit;
>> +	int msi_vecs, rc;
>>  
>>  	msi_vecs = min_t(unsigned int, nvec, zdev->max_msi);
>> -	if (msi_vecs < nvec) {
>> -		pr_info("%s requested %d irqs, allocate system limit of %d",
>> +	if (msi_vecs < nvec)
>> +		pr_info("%s requested %d IRQs, allocate system limit of %d\n",
>>  			pci_name(pdev), nvec, zdev->max_msi);
>> -	}
>>  
>>  	rc = __alloc_airq(zdev, msi_vecs, &bit);
>> -	if (rc < 0)
>> +	if (rc) {
>> +		pr_err("Allocating adapter IRQs for %s failed\n", pci_name(pdev));
>>  		return rc;
>> +	}
>>  
>> -	/*
>> -	 * Request MSI interrupts:
>> -	 * When using MSI, nvec_used interrupt sources and their irq
>> -	 * descriptors are controlled through one msi descriptor.
>> -	 * Thus the outer loop over msi descriptors shall run only once,
>> -	 * while two inner loops iterate over the interrupt vectors.
>> -	 * When using MSI-X, each interrupt vector/irq descriptor
>> -	 * is bound to exactly one msi descriptor (nvec_used is one).
>> -	 * So the inner loops are executed once, while the outer iterates
>> -	 * over the MSI-X descriptors.
>> -	 */
>> -	hwirq = bit;
>> -	msi_for_each_desc(msi, &pdev->dev, MSI_DESC_NOTASSOCIATED) {
>> -		if (hwirq - bit >= msi_vecs)
>> -			break;
>> -		irqs_per_msi = min_t(unsigned int, msi_vecs, msi->nvec_used);
>> -		irq = __irq_alloc_descs(-1, 0, irqs_per_msi, 0, THIS_MODULE,
>> -					(irq_delivery == DIRECTED) ?
>> -					msi->affinity : NULL);
>> -		if (irq < 0)
>> -			return -ENOMEM;
>> -
>> -		for (i = 0; i < irqs_per_msi; i++) {
>> -			rc = irq_set_msi_desc_off(irq, i, msi);
>> -			if (rc)
>> -				return rc;
>> -			irq_set_chip_and_handler(irq + i, &zpci_irq_chip,
>> -						 handle_percpu_irq);
>> +	zdev->msi_first_bit = bit;
>> +	zdev->msi_nr_irqs = msi_vecs;
>> +	rc = zpci_set_irq(zdev);
>> +	if (rc) {
>> +		pr_err("Registering adapter IRQs for %s failed\n",
>> +		       pci_name(pdev));
>> +		if (irq_delivery == DIRECTED) {
>> +			airq_iv_free(zpci_ibv[0], zdev->msi_first_bit, msi_vecs);
>> +		} else {
>> +			zpci_clear_airq(zdev);
>> +			airq_iv_release(zdev->aibv);
>> +			zdev->aibv = NULL;
>> +			airq_iv_free_bit(zpci_sbv, zdev->aisb);
>> +			zdev->aisb = -1UL;
>
> These two failure clean-ups look a lot like
> zpci_msi_teardown_directed/_floating() below. Could these be called
> instead of duplicating the code?

Yes they are similar, only that zpci_msi_teardown_directed/_floating()
also call zpci_clear_directed_irq()/zpci_clear_airq(), respectively.
Considering your other comment above, it might be cleaner to add
zpci_clear_irq() again, call this directly from zpci_msi_teardown() and
then use zpci_msi_teardown_directed/_floating() here as suggested.

Thanks,
Tobias

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ