Message-ID: <16bf9d14bc5f4a90b2b88dd2eb165186@ausx13mps321.AMER.DELL.COM>
Date: Thu, 8 Nov 2018 23:06:47 +0000
From: <Alex_Gagniuc@...lteam.com>
To: <gregkh@...uxfoundation.org>
Cc: <keith.busch@...el.com>, <helgaas@...nel.org>,
<mr.nuke.me@...il.com>, <linux-pci@...r.kernel.org>,
<Austin.Bolen@...l.com>, <Shyam.Iyer@...l.com>,
<linux-kernel@...r.kernel.org>, <jonathan.derrick@...el.com>,
<lukas@...ner.de>, <ruscur@...sell.cc>, <sbobroff@...ux.ibm.com>,
<oohall@...il.com>, <linuxppc-dev@...ts.ozlabs.org>
Subject: Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is
disconnected
On 11/08/2018 04:51 PM, Greg KH wrote:
> On Thu, Nov 08, 2018 at 10:49:08PM +0000, Alex_Gagniuc@...lteam.com wrote:
>> In the case that we're trying to fix, this code executing is a result of
>> the device being gone, so we can guarantee race-free operation. I agree
>> that there is a race, in the general case. As far as checking the result
>> for all F's, that's not an option when firmware crashes the system as a
>> result of the mmio read/write. It's never pretty when firmware gets
>> involved.
>
> If you have firmware that crashes the system when you try to read from a
> PCI device that was hot-removed, that is broken firmware and needs to be
> fixed. The kernel can not work around that as again, you will never win
> that race.
But it's not the firmware that crashes. It's Linux, as a result of a
fatal error message from the firmware. And we can't fix that, because FFS
handling requires that the system reboot [1].
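
As a rough illustration of why checking for all F's after the fact does
not help here, below is a minimal user-space sketch. It is not kernel
code: struct fake_dev, fake_mmio_read() and both helpers are made-up names
for this example, and the "bus access" is only a counter standing in for
the read that would trigger the fatal error.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a device; not a kernel structure. */
struct fake_dev {
	bool disconnected;         /* set once the device is known to be gone */
	bool present;              /* whether the hardware still answers */
	uint32_t reg;              /* the single register modelled here */
	unsigned int bus_accesses; /* how many times the "bus" was touched */
};

/* Models an MMIO read: a missing device reads back as all ones. */
static uint32_t fake_mmio_read(struct fake_dev *dev)
{
	dev->bus_accesses++;
	return dev->present ? dev->reg : UINT32_MAX;
}

/* Check after the read: the access has already gone out on the bus. */
static bool read_then_check(struct fake_dev *dev, uint32_t *val)
{
	*val = fake_mmio_read(dev);
	return *val != UINT32_MAX;
}

/* Check before the read: no access at all if the device is known gone. */
static bool check_then_read(struct fake_dev *dev, uint32_t *val)
{
	if (dev->disconnected)
		return false;
	*val = fake_mmio_read(dev);
	return *val != UINT32_MAX;
}
```

With disconnected set, check_then_read() returns false without touching
the bus, while read_then_check() only learns the device is gone after the
access has been issued. The sketch does nothing about the general-case
race, where the device disappears between the flag check and the read.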
If we're going to say that we don't want to support FFS because it's a
separate code path and a different flow, that's fine. I myself am not a
fan of FFS. But if we're going to continue supporting it, I think we'll
have to keep resolving these sorts of unintended consequences.
Alex
[1] ACPI 6.2, 18.1 - Hardware Errors and Error Sources