Date:	Wed, 18 Sep 2013 06:29:12 -0500
From:	David Woodhouse <dwmw2@...radead.org>
To:	Takao Indoh <indou.takao@...fujitsu.com>
Cc:	linux-kernel@...r.kernel.org, iommu@...ts.linux-foundation.org,
	joro@...tes.org, kexec@...ts.infradead.org,
	alex.williamson@...hat.com
Subject: Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU

On Tue, 2013-09-10 at 14:43 +0900, Takao Indoh wrote:
> (2013/09/09 18:07), David Woodhouse wrote:
> > If the driver is so broken that it cannot get the device working again
> > after a fault, surely the driver needs to be fixed?
> 
> Yes, this problem may be solved by fixing the driver. Actually, the
> megaraid sas driver was recently fixed for this problem. (See commit 6431f5d7)
> 
> But I think root cause of this problem is initializing IOMMU while DMA
> is still working, and I want to solve the root cause rather than
> handling it in each driver, otherwise we have to fix driver each time we
> find this kind of problem.

But if the driver is broken and cannot actually recover from hardware
issues, the driver needs to be fixed *anyway*. We shouldn't be papering
over the problem.

> > For the IOMMU code to reset individual devices, just because they still
> > have an active DMA mapping even if they're not *doing* DMA, seems wrong.
>  
> Right, current code is resetting devices which *may* be doing DMA. The
> ideal way is finding devices which are actually doing DMA and reset only
> them but I don't know how we can do this, though I think current code
> is sufficient.

No, that's not the ideal way either. Their DMA will be blocked, and
they'll stop (or at least we'll stop getting an interrupt and reporting
their DMA faults, if the hardware *is* so broken that it keeps trying
over and over again). The new driver will come up and reset the device,
and all will be well.

Do not paper over driver bugs. You are just *encouraging* brokenness.

We need to fix the 'fault storm' issue, by setting the FPD bit in the
context-entry for offending devices when appropriate, and then clearing
it again when appropriate too. But for the IOMMU code to go out and
trigger a PCI reset of random devices and buses is ABSOLUTELY WRONG.

Do Not Do This.

-- 
dwmw2

