linux-kernel - Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <5da8d8aa9f3818af649b1ac547bc4e6062626ddf.camel@gmail.com>
Date:   Mon, 12 Nov 2018 16:49:59 +1100
From:   Oliver O'Halloran <oohall@...il.com>
To:     Alex_Gagniuc@...lteam.com, gregkh@...uxfoundation.org
Cc:     keith.busch@...el.com, helgaas@...nel.org, mr.nuke.me@...il.com,
        linux-pci@...r.kernel.org, Austin.Bolen@...l.com,
        Shyam.Iyer@...l.com, linux-kernel@...r.kernel.org,
        jonathan.derrick@...el.com, lukas@...ner.de, ruscur@...sell.cc,
        sbobroff@...ux.ibm.com, linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is
 disconnected

On Thu, 2018-11-08 at 23:06 +0000, Alex_Gagniuc@...lteam.com wrote:
> On 11/08/2018 04:51 PM, Greg KH wrote:
> > On Thu, Nov 08, 2018 at 10:49:08PM +0000, Alex_Gagniuc@...lteam.com wrote:
> > > In the case that we're trying to fix, this code executes because the
> > > device is gone, so we can guarantee race-free operation. I agree that
> > > there is a race in the general case. As for checking the result for
> > > all F's, that's not an option when firmware crashes the system as a
> > > result of the MMIO read/write. It's never pretty when firmware gets
> > > involved.
> > 
> > If you have firmware that crashes the system when you try to read from a
> > PCI device that was hot-removed, that is broken firmware and needs to be
> > fixed.  The kernel cannot work around that as, again, you will never win
> > that race.
> 
> But it's not the firmware that crashes. It's Linux, as a result of a
> fatal error message from the firmware. And we can't fix that because
> firmware-first (FFS) handling requires that the system reboots [1].

Do we know the exact circumstances that result in firmware requesting a
reboot? If it happens on any PCIe error, I don't see what we can do to
prevent that beyond masking uncorrectable errors (UEs) entirely (are we
even allowed to do that on FFS systems?).

> If we're going to say that we don't want to support FFS because it's a
> separate code path and a different flow, that's fine. I am not a fan of
> FFS myself. But if we're going to continue supporting it, I think we'll
> have to keep resolving these sorts of unintended consequences.
> 
> Alex
> 
> [1] ACPI 6.2, 18.1 - Hardware Errors and Error Sources