Message-ID: <1379503752.753.52.camel@shinybook.infradead.org>
Date: Wed, 18 Sep 2013 06:29:12 -0500
From: David Woodhouse <dwmw2@...radead.org>
To: Takao Indoh <indou.takao@...fujitsu.com>
Cc: linux-kernel@...r.kernel.org, iommu@...ts.linux-foundation.org,
joro@...tes.org, kexec@...ts.infradead.org,
alex.williamson@...hat.com
Subject: Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU

On Tue, 2013-09-10 at 14:43 +0900, Takao Indoh wrote:
> (2013/09/09 18:07), David Woodhouse wrote:
> > If the driver is so broken that it cannot get the device working again
> > after a fault, surely the driver needs to be fixed?
>
> Yes, this problem may be solved by fixing the driver. Actually, the
> megaraid sas driver was recently fixed for this problem. (See commit 6431f5d7)
>
> But I think the root cause of this problem is initializing the IOMMU
> while DMA is still working, and I want to solve the root cause rather
> than handling it in each driver; otherwise we have to fix a driver each
> time we find this kind of problem.

But if the driver is broken and cannot actually recover from hardware
issues, the driver needs to be fixed *anyway*. We shouldn't be papering
over the problem.

> > For the IOMMU code to reset individual devices, just because they still
> > have an active DMA mapping even if they're not *doing* DMA, seems wrong.
>
> Right, the current code is resetting devices which *may* be doing DMA.
> The ideal way is to find devices which are actually doing DMA and reset
> only them, but I don't know how we can do this, though I think the
> current code is sufficient.

No, that's not the ideal way either. Their DMA will be blocked, and
they'll stop (or at least we'll stop getting an interrupt and reporting
their DMA faults, if the hardware *is* so broken that it keeps trying
over and over again). The new driver will come up and reset the device,
and all will be well.

Do not paper over driver bugs. You are just *encouraging* brokenness.

We need to fix the 'fault storm' issue, by setting the FPD bit in the
context-entry for offending devices when appropriate, and then clearing
it again when appropriate too. But for the IOMMU code to go out and
trigger a PCI reset of random devices and buses is ABSOLUTELY WRONG.
Do Not Do This.
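
As a rough illustration of the FPD idea above (not code from this thread
or from the kernel), here is a minimal userspace sketch. It assumes the
VT-d context-entry layout in which bit 0 of the low 64-bit word is
Present and bit 1 is Fault Processing Disable. The struct and helper
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a 128-bit VT-d context entry. */
struct ctx_entry {
    uint64_t lo;
    uint64_t hi;
};

/* Assumed bit positions in the low word of the context entry. */
#define CTX_PRESENT (1ULL << 0)  /* P: entry is present */
#define CTX_FPD     (1ULL << 1)  /* FPD: Fault Processing Disable */

/* Stop fault recording/reporting for requests that hit this entry. */
static void ctx_set_fpd(struct ctx_entry *ce)
{
    assert(ce != NULL);
    ce->lo |= CTX_FPD;
}

/* Re-enable fault processing, e.g. once a new driver owns the device. */
static void ctx_clear_fpd(struct ctx_entry *ce)
{
    assert(ce != NULL);
    ce->lo &= ~CTX_FPD;
}

/* Report whether fault processing is currently disabled. */
static int ctx_fpd_is_set(const struct ctx_entry *ce)
{
    assert(ce != NULL);
    return (ce->lo & CTX_FPD) != 0;
}
```

The sketch only flips the bit in memory. On real hardware the change
would presumably also need the usual context-cache invalidation before
the IOMMU sees it, and deciding when it is "appropriate" to set and
clear the bit is the actual policy question, which this does not
address.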
--
dwmw2