lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEEQ3w=1H7o7G56hYmwEGErfuFX-2nG=wesJYL7p0dDqLv_rpw@mail.gmail.com>
Date: Fri, 28 Feb 2025 10:18:03 +0800
From: yunhui cui <cuiyunhui@...edance.com>
To: Ethan Zhao <etzhao1900@...il.com>
Cc: Jason Gunthorpe <jgg@...pe.ca>, Ethan Zhao <haifeng.zhao@...ux.intel.com>, 
	Baolu Lu <baolu.lu@...ux.intel.com>, dwmw2@...radead.org, joro@...tes.org, 
	will@...nel.org, robin.murphy@....com, iommu@...ts.linux.dev, 
	linux-kernel@...r.kernel.org
Subject: Re: [External] Re: [PATCH v2] iommu/vt-d: fix system hang on reboot -f

Hi All,

On Fri, Feb 28, 2025 at 8:51 AM Ethan Zhao <etzhao1900@...il.com> wrote:
>
>
> On 2/28/2025 4:38 AM, Jason Gunthorpe wrote:
> > On Thu, Feb 27, 2025 at 08:40:31AM +0800, Ethan Zhao wrote:
> >> On 2/26/2025 9:04 PM, Jason Gunthorpe wrote:
> >>> On Wed, Feb 26, 2025 at 01:55:28PM +0800, Ethan Zhao wrote:
> >>>>> Provided the system does not respond to those events when this function
> >>>>> is called, it's fine to remove the lock.
> >>>> I agree.
> >>> I think it is running the destruction of the iommu far too late in the
> >>> process. IMHO it should be done after all the drivers have been
> >>> shutdown, before the CPUs go single threaded.
> >> Hmm... so far it is fine, the iommu_shutdown only has a little work to
> >> do, disable the translation, the PMR disabling is just backward compatible,
> >> was deprecated already. if we move it to one position where all CPUs are
> >> cycling, we don't know what kind of user-land tasks left there (i.e. reboot -f
> >> case), it would be hard to full-fill the requirement of Intel VT-d, no ongoing
> >> transaction there on hardware when issue the translation disabling command.
> > There is no guarentee device dma is halted anyhow at this point either.
>
> The -f option to reboot command, suggests data corruption possibility.although
>
> the IOMMU strives not to cross the transaction boundaries of its address
> translation.
>
> over all, we should make the reboot -f function works. not to hang
> there. meanwhile
>
> it doesn't break anything else so far.

patch v1:
if (!down_write_trylock(&dmar_global_lock))
return;

 patch v3:
/* All other CPUs were brought down, hotplug interrupts were disabled,
no lock and RCU checking needed anymore*/
list_for_each_entry(drhd, &dmar_drhd_units, list) {
iommu = drhd->iommu;
/* Disable PMRs explicitly here. */
iommu_disable_protect_mem_regions(iommu);
iommu_disable_translation(iommu);
}

Since we can remove down_write/up_write, it indicates that during the
IOMMU shutdown process, we are not particularly concerned about
whether others are accessing the critical section. Therefore, it can
be understood that Patch v1 can successfully acquire the lock and
proceed with subsequent operations. From this perspective, Patch v1 is
reasonable and can prevent situations where there is an actual owner
of dmar_global_lock, even though it does not perform a real IOMMU
shutdown.

However, if the IOMMU shutdown truly fails, IOMMU_WAIT_OP will trigger
a panic(). Removing down_write/up_write offers better maintainability
than Patch v1 (as we can retrieve the cause from the vmcore). But this
might not be significant, since the reboot could have been completed
successfully, and the occurrence of panic() might instead cause
confusion.

In summary, it is essential to complete the reboot action here rather
than causing the system to hang. So, which do you prefer, v1 or v3?


>
> Thanks,
>
> Ethan
>
> >
> > Jason

Thanks,
Yunhui

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ