Message-ID: <3b7742c4-bbae-4a78-a5a6-30df936a17d4@arm.com>
Date: Wed, 13 Dec 2023 11:54:05 +0000
From: Robin Murphy <robin.murphy@....com>
To: Lukas Wunner <lukas@...ner.de>,
Ethan Zhao <haifeng.zhao@...ux.intel.com>
Cc: bhelgaas@...gle.com, baolu.lu@...ux.intel.com, dwmw2@...radead.org,
will@...nel.org, linux-pci@...r.kernel.org, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org, Haorong Ye <yehaorong@...edance.com>
Subject: Re: [PATCH 2/2] iommu/vt-d: don's issue devTLB flush request when
device is disconnected

On 13/12/2023 10:44 am, Lukas Wunner wrote:
> On Tue, Dec 12, 2023 at 10:46:37PM -0500, Ethan Zhao wrote:
>> For endpoint devices connected to the system via hotplug-capable ports,
>> users could request a warm reset of the device by flapping the device's link
>> through setting the slot's link control register,
>
> Well, users could just *unplug* the device, right? Why is it relevant
> that they could fiddle with registers in config space?
>
>
>> as pciehp_ist() DLLSC
>> interrupt sequence response, pciehp will unload the device driver and
>> then power it off. This causes an IOMMU devTLB flush request to be sent to
>> the device, followed by a long wait for completion/timeout in interrupt context.
>
> A completion timeout should be on the order of usecs or msecs, why does it
> cause a hard lockup? The dmesg excerpt you've provided shows a 12 *second*
> delay between hot removal and watchdog reaction.

The PCIe spec only requires an endpoint to respond to an ATS invalidate
within a rather hilarious 90 seconds, so it's primarily a question of
how patient the root complex and bridges in between are prepared to be.

>> Fix it by checking the device's error_state in
>> devtlb_invalidation_with_pasid() to avoid sending a meaningless devTLB flush
>> request to a link-down device that is set to pci_channel_io_perm_failure and
>> then powered off in
>
> This doesn't seem to be a proper fix. It will work most of the time
> but not always. A user might bring down the slot via sysfs, then yank
> the card from the slot just when the iommu flush occurs such that the
> pci_dev_is_disconnected(pdev) check returns false but the card is
> physically gone immediately afterwards. In other words, you've shrunk
> the time window during which the issue may occur, but haven't eliminated
> it completely.

Yeah, I think we have a subtle but fundamental issue here in that the
iommu_release_device() callback is hooked to BUS_NOTIFY_REMOVED_DEVICE,
so in general we probably shouldn't be assuming it's safe to do anything
with the device itself *after* it's already been removed from its bus -
this step is primarily about cleaning up any of the IOMMU's own state
relating to the given device.

I think if we want to ensure ATCs are invalidated on hot-unplug we need
an additional pre-removal notifier to take care of that, and that step
would then want to distinguish between an orderly removal where cleaning
up is somewhat meaningful, and a surprise removal where it definitely isn't.
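
To make the two-step split above concrete, here is a rough user-space
model of it. This is not kernel code: every name in it (model_dev,
removal_kind, model_pre_remove, model_release) is hypothetical and made up
for illustration, and it does not address the residual race where a card
is yanked right after the link check. It only shows which step is allowed
to talk to the device and which one is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none are real kernel symbols. */
enum removal_kind { REMOVAL_ORDERLY, REMOVAL_SURPRISE };

struct model_dev {
	bool link_up;          /* can the device still answer requests? */
	bool atc_valid;        /* device-side translation cache holds entries */
	bool iommu_state;      /* IOMMU-side bookkeeping for this device */
	unsigned flushes_sent; /* invalidation requests issued to the device */
};

/*
 * Pre-removal step: the only place that is allowed to touch the device.
 * On surprise removal (or if the link is already down) nothing is sent,
 * since a flush request could only end in a timeout.
 */
static void model_pre_remove(struct model_dev *dev, enum removal_kind kind)
{
	assert(dev != NULL);

	if (kind == REMOVAL_SURPRISE || !dev->link_up)
		return;

	dev->flushes_sent++;
	dev->atc_valid = false;
}

/*
 * Post-removal step: tears down the IOMMU's own state only and never
 * sends anything to the (possibly absent) device.
 */
static void model_release(struct model_dev *dev)
{
	assert(dev != NULL);

	dev->iommu_state = false;
}
```

In this model an orderly removal issues exactly one flush before release,
while a surprise removal issues none and still ends with the IOMMU-side
state cleaned up.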

Thanks,
Robin.