Message-ID: <83039906-77f7-4318-94bf-4c98bb3f0e32@linux.intel.com>
Date: Wed, 26 Feb 2025 13:55:28 +0800
From: Ethan Zhao <haifeng.zhao@...ux.intel.com>
To: Baolu Lu <baolu.lu@...ux.intel.com>, Jason Gunthorpe <jgg@...pe.ca>
Cc: Yunhui Cui <cuiyunhui@...edance.com>, dwmw2@...radead.org,
joro@...tes.org, will@...nel.org, robin.murphy@....com,
iommu@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] iommu/vt-d: fix system hang on reboot -f
On 2025/2/26 13:18, Baolu Lu wrote:
> On 2/26/25 11:50, Ethan Zhao wrote:
>>>>>>
>>> If the scheduler doesn't run, how did we get from 4 -> 5?
>>>
>>> Maybe the issue is that the shutdown handler here is running at the
>>> wrong time, and it should not be running after the scheduler has been
>>> shut down.
>>>
>>> I don't think removing the lock is a great idea without more
>>> explanation.
>>
>> It does not seem simple to explain why there is no race window
>> between this iommu_shutdown() and the following dmar_global_lock
>> holders:
>>
>> 1. PCIe hotplug dmar_pci_bus_notifier()
>>
>> 2. mm_core_init detect_intel_iommu()
>>
>> 3. late_initcall dmar_free_unused_resources()
>>
>> 4. acpi attach dmar_device_hotplug()
>>
>> 5. pci_iommu_init intel_iommu_init() init_dmars()
>>
>> 6. rootfs_initcall ir_dev_scope_init()
>>
>> Although this is the last stage of reboot, how about we go back
>> to v1?
>>
>> Just replace down_write() with down_write_trylock().
>
> I don't think trylock is a reasonable solution. intel_iommu_shutdown()
> should not become a no-op simply because it cannot acquire a lock
> immediately.
No other CPU holds the lock after the other CPUs have been brought down by
the synchronous call to native_stop_other_cpus(1).
So in practice the trylock would not fail to acquire the lock; this is also why
we don't need to down_write() the dmar_global_lock.
>
> The lock here is to protect the drhd (representation of iommu hardware)
> list. It needs protection because this driver supports iommu hot-add and
> remove, which is triggered by an ACPI event for I/O board hotplug.
Yup, the lock is used to protect the global list dmar_drhd_units.
But at this point all IOAPICs/LAPICs have been brought down, so hotplug
interrupts cannot happen either (only legacy interrupts and NMIs are alive).
> Provided the system does not respond to those events when this function
> is called, it's fine to remove the lock.
I agree.
>
> Thanks,
> baolu
--
"firm, enduring, strong, and long-lived"