Message-ID: <51F5B545.5050300@jp.fujitsu.com>
Date: Mon, 29 Jul 2013 09:20:21 +0900
From: Takao Indoh <indou.takao@...fujitsu.com>
To: vgoyal@...hat.com
CC: bhelgaas@...gle.com, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org, iommu@...ts.linux-foundation.org,
kexec@...ts.infradead.org, ishii.hironobu@...fujitsu.com,
ddutile@...hat.com, bill.sumner@...com, alex.williamson@...hat.com,
hbabu@...ibm.com
Subject: Re: [PATCH v2] PCI: Reset PCIe devices to stop ongoing DMA
(2013/07/25 23:24), Vivek Goyal wrote:
> On Wed, Jul 24, 2013 at 03:29:58PM +0900, Takao Indoh wrote:
>> Sorry for letting this discussion slide; I was busy with other work :-(
>> Anyway, the summary of previous discussion is:
>> - My patch adds a new initcall (fs_initcall) to reset all PCIe endpoints
>> on boot. This assumes PCI enumeration is done before IOMMU
>> initialization, as follows:
>> (1) PCI enumeration
>> (2) fs_initcall ---> device reset
>> (3) IOMMU initialization
>> - This works on x86, but not on other architectures, because the
>> IOMMU is initialized before PCI enumeration on some of them. So the
>> device reset should be done where the IOMMU is initialized instead of
>> in an initcall.
>> - Or, as another idea, we can reset devices in the first kernel (panic kernel).
>>
>> Resetting devices in the panic kernel is against kdump policy and does
>> not seem to be a good idea. So I think adding the reset code to IOMMU
>> initialization is better. I'll post patches for that.
>
> I don't understand all the details, but I agree that the idea of trying
> to reset the IOMMU in the crashed kernel might not fly.
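The ordering constraint in the summary can be modelled in a few lines of user-space C. This is a minimal sketch, not kernel code: the level list is modelled on the initcall levels in include/linux/init.h, the x86 placement (PCI enumeration at subsys level, IOMMU setup at rootfs level) is an assumption about the layout at the time, and the "other arch" layout with the IOMMU at arch level is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Initcall levels in the order they run (modelled on include/linux/init.h;
 * "rootfs" sits between fs and device). */
enum initcall_level {
    LEVEL_PURE, LEVEL_CORE, LEVEL_POSTCORE, LEVEL_ARCH,
    LEVEL_SUBSYS, LEVEL_FS, LEVEL_ROOTFS, LEVEL_DEVICE, LEVEL_LATE
};

/* Where each boot step is hooked on a given (possibly hypothetical) arch. */
struct boot_layout {
    enum initcall_level pci_enum;   /* PCI bus enumeration */
    enum initcall_level iommu_init; /* IOMMU initialization */
};

/* The proposed reset runs at fs_initcall level. It can only stop DMA in
 * time if devices are already enumerated (strictly earlier level) and the
 * IOMMU is not yet set up (strictly later level). Equal levels count as
 * unsafe because ordering within one level depends on link order. */
static bool fs_initcall_reset_in_time(const struct boot_layout *l)
{
    assert(l != NULL);
    return l->pci_enum < LEVEL_FS && LEVEL_FS < l->iommu_init;
}
```

With the assumed x86 layout the check passes; with the IOMMU hooked at arch level it fails, which is why the reset would have to move into the IOMMU initialization path rather than stay in an initcall.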
>
>>
>> Another discussion point is how to handle buggy devices. Resetting buggy
>> devices can make the system more unstable. One idea is a boot parameter
>> so that the user can choose whether or not to reset devices.
>
> So who would decide which device is buggy and should not be reset? Give
> some details here.
I found a case where kdump does not work after resetting devices but
works when the reset patch is removed. The cause of the problem is a bug
in a PCIe switch chip. If there were a boot parameter to disable the
device reset, the user could use it as a workaround.
I think in this case we should add a PCI quirk to avoid this buggy
hardware, but we need to wait for errata from the vendor, and that
usually takes a long time.
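The two mechanisms above (a user-level opt-out and a per-device quirk) could fit together roughly as in this minimal user-space sketch. It assumes a hypothetical boot option `pcie_reset_devices=off` and placeholder vendor/device IDs; neither the option name nor the IDs come from the patch or from any real errata, and the function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical boot option letting the user opt out of all resets. */
#define NO_RESET_OPTION "pcie_reset_devices=off"

struct pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* Hypothetical quirk list of chips that misbehave after reset.
 * The IDs are placeholders, not real errata entries. */
static const struct pci_id reset_quirks[] = {
    { 0x1234, 0x5678 },
};

/* Return true if opt appears in cmdline as a whole space-separated token. */
static bool cmdline_has(const char *cmdline, const char *opt)
{
    size_t len = strlen(opt);
    const char *p = cmdline;

    while ((p = strstr(p, opt)) != NULL) {
        bool starts = (p == cmdline) || p[-1] == ' ';
        bool ends = p[len] == '\0' || p[len] == ' ';
        if (starts && ends)
            return true;
        p++; /* keep scanning after a partial (non-token) match */
    }
    return false;
}

/* The boot option disables every reset (user workaround); a quirk entry
 * skips only the matching device (vendor fix once errata are known). */
static bool should_reset_device(const char *cmdline, struct pci_id id)
{
    size_t i;

    assert(cmdline != NULL);
    if (cmdline_has(cmdline, NO_RESET_OPTION))
        return false;
    for (i = 0; i < sizeof(reset_quirks) / sizeof(reset_quirks[0]); i++) {
        if (reset_quirks[i].vendor == id.vendor &&
            reset_quirks[i].device == id.device)
            return false;
    }
    return true;
}
```

The boot option is the blunt, immediately available workaround; the quirk table is the targeted fix that needs the vendor's errata first.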
>
> Can't we simply blacklist the associated module, so that it never loads
> and then it never tries to reset the devices?
>
So you mean that the device reset should be done when its driver is loaded?
Thanks,
Takao Indoh