Message-ID: <50482eb9-389c-0114-ba21-988f1fce493c@kaod.org>
Date: Tue, 16 Nov 2021 11:27:09 +0100
From: Cédric Le Goater <clg@...d.org>
To: Marc Zyngier <maz@...nel.org>
CC: <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Michael Ellerman <mpe@...erman.id.au>,
PowerPC <linuxppc-dev@...ts.ozlabs.org>,
Greg Kurz <groug@...d.org>
Subject: Re: [PATCH 16/39] irqdomain: Make normal and nomap irqdomains
exclusive
Hello Marc,
>> This patch is breaking the POWER9/POWER10 XIVE driver (these are not
>> old PPC systems :) on machines sharing the same LSI HW IRQ. For instance,
>> a linux KVM guest with a virtio-rng and a virtio-balloon device. In that
>> case, Linux creates two distinct IRQ mappings which can lead to some
>> unexpected behavior.
>
> Either the irq domain translates, or it doesn't. If the driver creates
> a nomap domain, and yet expects some sort of translation to happen,
> then the driver is fundamentally broken. And even without that: how do
> you end-up with a single HW interrupt having two mappings?
>
>> A fix to go forward would be to change the XIVE IRQ domain to use a
>> 'Tree' domain for reverse mapping and not the 'No Map' domain mapping.
>> I will keep you updated for XIVE.
>
> I bet there is a bit more to it. From what you are saying above,
> something rather ungodly is happening in the XIVE code.
It's making progress.
This change in irq_find_mapping() is what 'breaks' XIVE:
+	if (irq_domain_is_nomap(domain)) {
+		if (hwirq < domain->revmap_size) {
+			data = irq_domain_get_irq_data(domain, hwirq);
+			if (data && data->hwirq == hwirq)
+				return hwirq;
+		}
+
+		return 0;
With the introduction of IRQ_DOMAIN_FLAG_NO_MAP, the revmap_tree lookup
is skipped for nomap domains, so the previously mapped IRQ is not found.
XIVE was relying on a side effect of irq_domain_set_mapping() that no
longer holds.
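To illustrate the difference between the two lookups, here is a minimal
userspace sketch in C. It is a model only, not kernel code: the model_*
names, the fixed-size array and the linear scan standing in for the radix
tree are all assumptions made for this example.

```c
#include <assert.h>

#define MODEL_NR_VIRQS 64	/* hypothetical size of the model's Linux IRQ space */

/* Hypothetical stand-in for struct irq_data: one slot per Linux IRQ number. */
struct model_irq_data {
	int mapped;
	unsigned long hwirq;
};

/* Hypothetical stand-in for struct irq_domain. */
struct model_domain {
	unsigned long revmap_size;	/* upper bound for nomap lookups */
	struct model_irq_data irqs[MODEL_NR_VIRQS];
};

/* Record that Linux IRQ 'virq' is backed by hardware IRQ 'hwirq'. */
static void model_set_mapping(struct model_domain *d, unsigned int virq,
			      unsigned long hwirq)
{
	assert(virq > 0 && virq < MODEL_NR_VIRQS);
	d->irqs[virq].mapped = 1;
	d->irqs[virq].hwirq = hwirq;
}

/*
 * Nomap-style lookup, modelled on the quoted hunk: it only succeeds
 * when the Linux IRQ number is equal to the hardware IRQ number.
 */
static unsigned int model_find_nomap(const struct model_domain *d,
				     unsigned long hwirq)
{
	if (hwirq < d->revmap_size && hwirq < MODEL_NR_VIRQS) {
		const struct model_irq_data *data = &d->irqs[hwirq];

		if (data->mapped && data->hwirq == hwirq)
			return (unsigned int)hwirq;
	}

	return 0;
}

/*
 * Reverse-map lookup: a linear scan standing in for the radix tree.
 * It finds the mapping whatever Linux IRQ number was allocated.
 */
static unsigned int model_find_revmap(const struct model_domain *d,
				      unsigned long hwirq)
{
	for (unsigned int virq = 1; virq < MODEL_NR_VIRQS; virq++)
		if (d->irqs[virq].mapped && d->irqs[virq].hwirq == hwirq)
			return virq;

	return 0;
}
```

In this model, if hardware IRQ 5 is mapped at Linux IRQ 17,
model_find_nomap(&d, 5) returns 0 while model_find_revmap(&d, 5) returns
17. The nomap lookup only succeeds for a mapping where both numbers are
the same.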
I guess the easiest fix for 5.14 and 5.15 (in which MSI domains were
introduced) is to change the XIVE IRQ domain to a tree domain. Since the
HW can handle 1M interrupts, this looks like a better choice for the driver.
Thanks,
C.