Message-ID: <0525a4bcf17a355cd141632d4f3714be@kernel.org>
Date: Tue, 24 Nov 2020 16:51:59 +0000
From: Marc Zyngier <maz@...nel.org>
To: John Garry <john.garry@...wei.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, gregkh@...uxfoundation.org,
rafael@...nel.org, martin.petersen@...cle.com, jejb@...ux.ibm.com,
linuxarm@...wei.com, linux-scsi@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
On 2020-11-23 15:45, John Garry wrote:
Hi John,
>>> But it looks like there is more to it than that, which I'm worried is
>>> far from trivial. For example, just calling irq_dispose_mapping()
>>> for removal and then platform_get_irq()->acpi_get_irq() a second time
>>> fails, as it looks like more tidy-up is needed for removal...
>>
>> Most probably. I could imagine things failing if there is any trace
>> of an existing translation in the ITS or in the platform-MSI layer,
>> for example, or if the interrupt is still active...
>
> So this looks to be a problem I have. So if I hack the code to skip
> the check in acpi_get_irq() for the irq already being init'ed, I run
> into a use-after-free in the gic-v3-its driver. I may be skipping
> something with this hack, but I'll ask anyway.
>
> So initially in the msi_prepare method we set up the its dev - this is
> from the mbigen probe. Then when all the irqs are unmapped later for
> end device driver removal, we release this its device in
> its_irq_domain_free(). But I don't see anything to set it up again. Is
> it improper to have released the its device in this scenario?
> Commenting out the release makes things "good" again.

Huh, that's ugly. The issue is that the device that deals with the
interrupts isn't the device that the ITS knows about (there isn't a
1:1 mapping between mbigen and the endpoint).

The mbigen is responsible for the creation of the corresponding
irqdomain and, crucially, for the "prepare" phase, which results
in storing the its_dev pointer in info->scratchpad[0].

As we free all the interrupts associated with the endpoint, we
free the its_dev (nothing else needs it at this point). On the
next allocation, we reuse the damn its_dev pointer, and we're SOL.
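
A minimal user-space model of the lifecycle described above might help make it concrete. All struct and function names here are hypothetical stand-ins, not the gic-v3-its driver's code. To keep the model free of undefined behaviour, "freeing" the device is modelled as clearing a live flag on static storage instead of calling kfree(), and the allocation path reports the stale pointer rather than dereferencing freed memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct its_device: only what the model needs. */
struct model_its_dev {
	bool live;         /* cleared when the model "frees" the device */
	int nr_lpis_used;  /* LPIs currently allocated for the endpoint */
};

/* Hypothetical stand-in for msi_alloc_info_t; scratchpad0 plays info->scratchpad[0]. */
struct model_alloc_info {
	struct model_its_dev *scratchpad0;
};

/* Static storage, so "freeing" never releases memory and the model has no UB. */
static struct model_its_dev model_dev;

/* "prepare" phase, done once when the mbigen is probed. */
static void model_msi_prepare(struct model_alloc_info *info)
{
	model_dev.live = true;
	model_dev.nr_lpis_used = 0;
	info->scratchpad0 = &model_dev;
}

/*
 * Allocation blindly reuses the pointer cached by prepare.
 * Returns false where the real driver would touch freed memory.
 */
static bool model_alloc(struct model_alloc_info *info, int n)
{
	struct model_its_dev *dev = info->scratchpad0;

	if (!dev->live)
		return false;
	dev->nr_lpis_used += n;
	return true;
}

/* Freeing the endpoint's last LPI tears the device down. */
static void model_free(struct model_alloc_info *info, int n)
{
	struct model_its_dev *dev = info->scratchpad0;

	assert(dev->live && dev->nr_lpis_used >= n);
	dev->nr_lpis_used -= n;
	if (dev->nr_lpis_used == 0)
		dev->live = false;	/* models freeing its_dev */
}
```

Running prepare once, allocating and freeing all of an endpoint's LPIs, and then allocating again (as a re-probe of the endpoint would) hits the stale pointer, because nothing re-runs the prepare phase.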

This is wrong, because we haven't removed the mbigen, only the
device *connected* to the mbigen. And since the mbigen can be shared
across endpoints, we can't reliably tear it down at all. Boo.

The only thing to do is to convey that by marking the its_dev as
shared so that it isn't deleted when no LPIs are being used. After
all, it isn't like the mbigen is going anywhere.
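
The "shared" idea can be sketched as a device record carrying a flag that is set once at creation time and checked on the free path. This assumes a hypothetical `shared` field and hypothetical helper names; how such a flag would actually be passed down through msi_alloc_info_t is exactly the open question, and the sketch does not answer it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device record with a "shared" flag, set at creation time. */
struct shared_its_dev {
	bool shared;       /* true when the device fronts a bridge such as the mbigen */
	int nr_lpis_used;
};

static struct shared_its_dev *dev_create(bool shared)
{
	struct shared_its_dev *dev = calloc(1, sizeof(*dev));

	if (dev)
		dev->shared = shared;
	return dev;
}

static void dev_alloc_lpis(struct shared_its_dev *dev, int n)
{
	dev->nr_lpis_used += n;
}

/*
 * Release n LPIs. A non-shared device is freed once it has no LPIs left
 * (and the caller's pointer is cleared); a shared one is kept for the
 * next endpoint. Returns true if the device was freed.
 */
static bool dev_free_lpis(struct shared_its_dev **devp, int n)
{
	struct shared_its_dev *dev = *devp;

	assert(dev->nr_lpis_used >= n);
	dev->nr_lpis_used -= n;
	if (dev->nr_lpis_used == 0 && !dev->shared) {
		free(dev);
		*devp = NULL;
		return true;
	}
	return false;
}
```

With the flag set, dropping the endpoint's last LPI leaves the device in place, so a later allocation through the same cached pointer is safe; a non-shared device still goes away as before.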

It is just that passing that information down isn't a simple affair,
as msi_alloc_info_t isn't a generic type... Let me have a think.

M.
--
Jazz is not dead. It just smells funny...