linux-kernel - Re: [PATCH 2/2] iommu/vt-d: don's issue devTLB flush request when device is disconnected

Message-ID: <3b7742c4-bbae-4a78-a5a6-30df936a17d4@arm.com>
Date:   Wed, 13 Dec 2023 11:54:05 +0000
From:   Robin Murphy <robin.murphy@....com>
To:     Lukas Wunner <lukas@...ner.de>,
        Ethan Zhao <haifeng.zhao@...ux.intel.com>
Cc:     bhelgaas@...gle.com, baolu.lu@...ux.intel.com, dwmw2@...radead.org,
        will@...nel.org, linux-pci@...r.kernel.org, iommu@...ts.linux.dev,
        linux-kernel@...r.kernel.org, Haorong Ye <yehaorong@...edance.com>
Subject: Re: [PATCH 2/2] iommu/vt-d: don's issue devTLB flush request when
 device is disconnected

On 13/12/2023 10:44 am, Lukas Wunner wrote:
> On Tue, Dec 12, 2023 at 10:46:37PM -0500, Ethan Zhao wrote:
>> For endpoint devices connected to the system via hotplug-capable ports,
>> users could request a warm reset of the device by flapping the device's link
>> through the slot's link control register,
> 
> Well, users could just *unplug* the device, right?  Why is it relevant
> that they could fiddle with registers in config space?
> 
> 
>> as the pciehp_ist() response to the DLLSC
>> interrupt, pciehp will unload the device driver and
>> then power it off. This causes an IOMMU devTLB flush request for the device
>> to be sent, followed by a long wait for completion or timeout in interrupt
>> context.
> 
> A completion timeout should be on the order of usecs or msecs, why does it
> cause a hard lockup?  The dmesg excerpt you've provided shows a 12 *second*
> delay between hot removal and watchdog reaction.

The PCIe spec only requires an endpoint to respond to an ATS invalidate 
within a rather hilarious 90 seconds, so it's primarily a question of 
how patient the root complex and bridges in between are prepared to be.

>> Fix it by checking the device's error_state in
>> devtlb_invalidation_with_pasid() to avoid sending a meaningless devTLB flush
>> request to a link-down device that is set to pci_channel_io_perm_failure and
>> then powered off in
> 
> This doesn't seem to be a proper fix.  It will work most of the time
> but not always.  A user might bring down the slot via sysfs, then yank
> the card from the slot just when the iommu flush occurs such that the
> pci_dev_is_disconnected(pdev) check returns false but the card is
> physically gone immediately afterwards.  In other words, you've shrunk
> the time window during which the issue may occur, but haven't eliminated
> it completely.

Yeah, I think we have a subtle but fundamental issue here in that the 
iommu_release_device() callback is hooked to BUS_NOTIFY_REMOVED_DEVICE, 
so in general we probably shouldn't be assuming it's safe to do anything 
with the device itself *after* it's already been removed from its bus - 
this step is primarily about cleaning up any of the IOMMU's own state 
relating to the given device.

I think if we want to ensure ATCs are invalidated on hot-unplug we need 
an additional pre-removal notifier to take care of that, and that step 
would then want to distinguish between an orderly removal where cleaning 
up is somewhat meaningful, and a surprise removal where it definitely isn't.
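
As a rough illustration of the split described above, here is a minimal
userspace sketch in C. It is not kernel code and not a proposed patch:
struct model_dev, the MODEL_EVENT_* values, model_notify() and
model_remove() are all hypothetical names invented for this example.
MODEL_EVENT_REMOVED stands in for BUS_NOTIFY_REMOVED_DEVICE, and
MODEL_EVENT_PRE_REMOVAL stands in for the additional pre-removal hook,
whatever form that would actually take.

```c
#include <stdbool.h>

/* Hypothetical events; the names are local to this sketch. */
enum model_event {
    MODEL_EVENT_PRE_REMOVAL, /* assumed hook: device is still on its bus */
    MODEL_EVENT_REMOVED,     /* stand-in for BUS_NOTIFY_REMOVED_DEVICE */
};

/* Hypothetical model of a device sitting behind an IOMMU. */
struct model_dev {
    bool disconnected;    /* true for a surprise removal */
    bool has_iommu_state; /* IOMMU-side bookkeeping still allocated */
    int atc_flushes;      /* device-side invalidations issued */
};

/* Model of the IOMMU layer reacting to a removal event. */
void model_notify(struct model_dev *dev, enum model_event ev)
{
    switch (ev) {
    case MODEL_EVENT_PRE_REMOVAL:
        /* Only talk to the device on an orderly removal. */
        if (!dev->disconnected)
            dev->atc_flushes++;
        break;
    case MODEL_EVENT_REMOVED:
        /* Device is off the bus: clean up IOMMU-side state only. */
        dev->has_iommu_state = false;
        break;
    }
}

/* Model of a removal: pre-removal hook first, then post-removal cleanup. */
void model_remove(struct model_dev *dev, bool surprise)
{
    dev->disconnected = surprise;
    model_notify(dev, MODEL_EVENT_PRE_REMOVAL);
    model_notify(dev, MODEL_EVENT_REMOVED);
}
```

In this model an orderly removal issues exactly one device-side flush
before cleanup, while a surprise removal issues none; both end with the
IOMMU-side state released. The sketch does nothing about the race Lukas
describes, where the device disappears just after the disconnected check.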

Thanks,
Robin.