lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87a6yh2nln.wl-maz@kernel.org>
Date:   Wed, 26 Aug 2020 20:50:28 +0100
From:   Marc Zyngier <maz@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
        Joerg Roedel <joro@...tes.org>,
        iommu@...ts.linux-foundation.org, linux-hyperv@...r.kernel.org,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Jon Derrick <jonathan.derrick@...el.com>,
        Lu Baolu <baolu.lu@...ux.intel.com>,
        Wei Liu <wei.liu@...nel.org>,
        "K. Y. Srinivasan" <kys@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        Steve Wahl <steve.wahl@....com>,
        Dimitri Sivanich <sivanich@....com>,
        Russ Anderson <rja@....com>, linux-pci@...r.kernel.org,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        xen-devel@...ts.xenproject.org, Juergen Gross <jgross@...e.com>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Stefano Stabellini <sstabellini@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Megha Dey <megha.dey@...el.com>,
        Jason Gunthorpe <jgg@...lanox.com>,
        Dave Jiang <dave.jiang@...el.com>,
        Alex Williamson <alex.williamson@...hat.com>,
        Jacob Pan <jacob.jun.pan@...el.com>,
        Baolu Lu <baolu.lu@...el.com>,
        Kevin Tian <kevin.tian@...el.com>,
        Dan Williams <dan.j.williams@...el.com>
Subject: Re: [patch V2 04/46] genirq/chip: Use the first chip in irq_chip_compose_msi_msg()

On Wed, 26 Aug 2020 12:16:32 +0100,
Thomas Gleixner <tglx@...utronix.de> wrote:
> 
> The documentation of irq_chip_compose_msi_msg() claims that with
> hierarchical irq domains the first chip in the hierarchy which has an
> irq_compose_msi_msg() callback is chosen. But the code just keeps
> iterating after it finds a chip with a compose callback.
> 
> The x86 HPET MSI implementation relies on that behaviour, but that does not
> make it more correct.
> 
> The message should always be composed at the domain which manages the
> underlying resource (e.g. APIC or remap table) because that domain knows
> about the required layout of the message.
> 
> On X86 the following hierarchies exist:
> 
> 1)   vector -------- PCI/MSI
> 2)   vector -- IR -- PCI/MSI
> 
> The vector domain has a different message format than the IR (remapping)
> domain. So obviously the PCI/MSI domain can't compose the message without
> having knowledge about the parent domain, which is exactly the opposite of
> what hierarchical domains want to achieve.
> 
> X86 actually has two different PCI/MSI chips where #1 has a compose
> callback and #2 does not. #2 delegates the composition to the remap domain
> where it belongs, but #1 does it at the PCI/MSI level.
> 
> For the upcoming device MSI support it's necessary to change this and just
> let the first domain which can compose the message take care of it. That
> way the top level chip does not have to worry about it and the device MSI
> code does not need special knowledge about topologies. It just sets the
> compose callback to NULL and lets the hierarchy pick the first chip which
> has one.
> 
> Due to that the attempt to move the compose callback from the direct
> delivery PCI/MSI domain to the vector domain made the system fail to boot
> with interrupt remapping enabled because in the remapping case
> irq_chip_compose_msi_msg() keeps iterating and choses the compose callback
> of the vector domain which obviously creates the wrong format for the remap
> table.
> 
> Break out of the loop when the first irq chip with a compose callback is
> found and fixup the HPET code temporarily. That workaround will be removed
> once the direct delivery compose callback is moved to the place where it
> belongs in the vector domain.
> 
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> ---
> V2: New patch. Note, that this might break other stuff which relies on the
>     current behaviour, but the hierarchy composition of DT based chips is
>     really hard to follow.

Grepping around, I don't think there is any occurrence of two irqchips
providing irq_compose_msi() that can share a hierarchy on any real
system, so we should be fine. Famous last words.

> ---
>  arch/x86/kernel/apic/msi.c |    7 +++++--
>  kernel/irq/chip.c          |   12 +++++++++---
>  2 files changed, 14 insertions(+), 5 deletions(-)
> 
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -479,10 +479,13 @@ struct irq_domain *hpet_create_irq_domai
>  	info.type = X86_IRQ_ALLOC_TYPE_HPET;
>  	info.hpet_id = hpet_id;
>  	parent = irq_remapping_get_ir_irq_domain(&info);
> -	if (parent == NULL)
> +	if (parent == NULL) {
>  		parent = x86_vector_domain;
> -	else
> +	} else {
>  		hpet_msi_controller.name = "IR-HPET-MSI";
> +		/* Temporary fix: Will go away */
> +		hpet_msi_controller.irq_compose_msi_msg = NULL;
> +	}
>  
>  	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
>  					      hpet_id);
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -1544,10 +1544,16 @@ int irq_chip_compose_msi_msg(struct irq_
>  	struct irq_data *pos = NULL;
>  
>  #ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
> -	for (; data; data = data->parent_data)
> -#endif
> -		if (data->chip && data->chip->irq_compose_msi_msg)
> +	for (; data; data = data->parent_data) {
> +		if (data->chip && data->chip->irq_compose_msi_msg) {
>  			pos = data;
> +			break;
> +		}
> +	}
> +#else
> +	if (data->chip && data->chip->irq_compose_msi_msg)
> +		pos = data;
> +#endif
>  	if (!pos)
>  		return -ENOSYS;
>  
> 
> 

Is it just me, or is this last change more complex than it ought to be?

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 857f5f4c8098..25e18b73699c 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1544,7 +1544,7 @@ int irq_chip_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	struct irq_data *pos = NULL;
 
 #ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
-	for (; data; data = data->parent_data)
+	for (; data && !pos; data = data->parent_data)
 #endif
 		if (data->chip && data->chip->irq_compose_msi_msg)
 			pos = data;

Though the for loop in a #ifdef in admittedly an acquired taste...

Reviewed-by: Marc Zyngier <maz@...nel.org>

	M.

-- 
Without deviation from the norm, progress is not possible.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ