linux-kernel - Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86


Message-ID: <5d4b3a716f85017c17c52a85915fba9e19509e81.camel@kernel.crashing.org>
Date:   Thu, 16 Jul 2020 08:49:21 +1000
From:   Benjamin Herrenschmidt <benh@...nel.crashing.org>
To:     Bjorn Helgaas <helgaas@...nel.org>,
        David Laight <David.Laight@...LAB.COM>
Cc:     "'Oliver O'Halloran'" <oohall@...il.com>,
        Arnd Bergmann <arnd@...db.de>, Keith Busch <kbusch@...nel.org>,
        Paul Mackerras <paulus@...ba.org>,
        sparclinux <sparclinux@...r.kernel.org>,
        Toan Le <toan@...amperecomputing.com>,
        Greg Ungerer <gerg@...ux-m68k.org>,
        Marek Vasut <marek.vasut+renesas@...il.com>,
        Rob Herring <robh@...nel.org>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Sagi Grimberg <sagi@...mberg.me>,
        Russell King <linux@...linux.org.uk>,
        Ley Foon Tan <ley.foon.tan@...el.com>,
        Christoph Hellwig <hch@....de>,
        Geert Uytterhoeven <geert@...ux-m68k.org>,
        Kevin Hilman <khilman@...libre.com>,
        linux-pci <linux-pci@...r.kernel.org>,
        Jakub Kicinski <kuba@...nel.org>,
        Matt Turner <mattst88@...il.com>,
        "linux-kernel-mentees@...ts.linuxfoundation.org" 
        <linux-kernel-mentees@...ts.linuxfoundation.org>,
        Guenter Roeck <linux@...ck-us.net>,
        Ray Jui <rjui@...adcom.com>, Jens Axboe <axboe@...com>,
        Ivan Kokshaysky <ink@...assic.park.msu.ru>,
        Shuah Khan <skhan@...uxfoundation.org>,
        "bjorn@...gaas.com" <bjorn@...gaas.com>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Richard Henderson <rth@...ddle.net>,
        Juergen Gross <jgross@...e.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
        Scott Branden <sbranden@...adcom.com>,
        Jingoo Han <jingoohan1@...il.com>,
        "Saheed O. Bolarinwa" <refactormyself@...il.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Philipp Zabel <p.zabel@...gutronix.de>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Gustavo Pimentel <gustavo.pimentel@...opsys.com>,
        linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
        "David S. Miller" <davem@...emloft.net>,
        Heiner Kallweit <hkallweit1@...il.com>
Subject: Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

On Wed, 2020-07-15 at 17:12 -0500, Bjorn Helgaas wrote:
> > I've 'played' with PCIe error handling - without much success.
> > What might be useful is for a driver that has just read ~0u to
> > be able to ask 'has there been an error signalled for this device?'.
> 
> In many cases a driver will know that ~0 is not a valid value for the
> register it's reading.  But if ~0 *could* be valid, an interface like
> you suggest could be useful.  I don't think we have anything like that
> today, but maybe we could.  It would certainly be nice if the PCI core
> noticed, logged, and cleared errors.  We have some of that for AER,
> but that's an optional feature, and support for the error bits in the
> garden-variety PCI_STATUS register is pretty haphazard.  As you note
> below, this sort of SERR/PERR reporting is frequently hard-wired in
> ways that take it out of our purview.

We do have pci_channel_state (checked via pci_channel_offline()), which
covers the cases where the underlying error handling (such as EEH or
unplug) results in the device being offlined. This tends to be
asynchronous, though, so it might take a few ~0's before you see it.

It's typically used to break potentially infinite loops in some
drivers.
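
To illustrate the loop-breaking use, here is a minimal userspace sketch,
not code from any real driver. The enum values and the body of
pci_channel_offline() mirror include/linux/pci.h as far as I recall;
everything else (the cut-down struct pci_dev, mock_readl(), BUSY_BIT,
MAX_POLLS, wait_not_busy()) is hypothetical and exists only so the
example compiles outside the kernel.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace stand-ins modelled on include/linux/pci.h (not the real headers). */
enum pci_channel_state {
	pci_channel_io_normal = 1,
	pci_channel_io_frozen = 2,
	pci_channel_io_perm_failure = 3,
};

struct pci_dev {
	enum pci_channel_state error_state;
	uint32_t fake_reg;        /* hypothetical: value the mock read returns */
	int reads_until_offline;  /* hypothetical: models the asynchronous delay */
};

static int pci_channel_offline(struct pci_dev *pdev)
{
	return pdev->error_state != pci_channel_io_normal;
}

/* Hypothetical MMIO read: the device is marked frozen only after a few reads. */
static uint32_t mock_readl(struct pci_dev *pdev)
{
	if (pdev->reads_until_offline > 0 && --pdev->reads_until_offline == 0)
		pdev->error_state = pci_channel_io_frozen;
	return pdev->fake_reg;
}

#define BUSY_BIT  0x1u   /* hypothetical register layout */
#define MAX_POLLS 1000   /* hypothetical bound so the sketch always terminates */

/*
 * Poll until BUSY clears. A dead device reads back ~0, so BUSY looks set
 * forever; checking pci_channel_offline() on ~0 lets the loop bail out.
 */
static int wait_not_busy(struct pci_dev *pdev)
{
	for (int i = 0; i < MAX_POLLS; i++) {
		uint32_t v = mock_readl(pdev);

		if (!(v & BUSY_BIT))
			return 0;
		if (v == ~0u && pci_channel_offline(pdev))
			return -EIO;
	}
	return -ETIMEDOUT;
}
```

Note that the first couple of ~0 reads are not yet reported as offline,
which is the asynchronous behaviour described above.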

There is no interface to check whether *an* error happened, though in
most cases it will be captured in the status register, which is
harvested (and cleared?) by some EDAC drivers, iirc...
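
For reference, a rough sketch of what "harvest and clear" on the status
register could look like. The PCI_STATUS_* bit values are the ones from
include/uapi/linux/pci_regs.h; the error bits are write-1-to-clear. The
mock config space, STATUS_ERROR_MASK and harvest_status_errors() are
made up for illustration, and I am not claiming this is what any EDAC
driver actually does.

```c
#include <stdint.h>

/* Error bits of the PCI_STATUS register (values as in pci_regs.h). */
#define PCI_STATUS_PARITY            0x0100 /* Detected parity error (master) */
#define PCI_STATUS_SIG_TARGET_ABORT  0x0800 /* Signalled target abort */
#define PCI_STATUS_REC_TARGET_ABORT  0x1000 /* Received target abort */
#define PCI_STATUS_REC_MASTER_ABORT  0x2000 /* Received master abort */
#define PCI_STATUS_SIG_SYSTEM_ERROR  0x4000 /* Signalled system error (SERR) */
#define PCI_STATUS_DETECTED_PARITY   0x8000 /* Detected parity error (PERR) */

/* Hypothetical name for the union of the bits above. */
#define STATUS_ERROR_MASK (PCI_STATUS_PARITY | PCI_STATUS_SIG_TARGET_ABORT | \
			   PCI_STATUS_REC_TARGET_ABORT | PCI_STATUS_REC_MASTER_ABORT | \
			   PCI_STATUS_SIG_SYSTEM_ERROR | PCI_STATUS_DETECTED_PARITY)

/* Hypothetical stand-in for a device's config space status register. */
struct mock_cfg {
	uint16_t status;
};

static uint16_t mock_read_status(const struct mock_cfg *c)
{
	return c->status;
}

/* Write-1-to-clear: writing a 1 to an error bit clears it, 0 leaves it alone. */
static void mock_write_status(struct mock_cfg *c, uint16_t val)
{
	c->status &= (uint16_t)~(val & STATUS_ERROR_MASK);
}

/* Return whichever error bits were latched, clearing them on the way out. */
static uint16_t harvest_status_errors(struct mock_cfg *c)
{
	uint16_t err = mock_read_status(c) & STATUS_ERROR_MASK;

	if (err)
		mock_write_status(c, err);
	return err;
}
```

The catch is that whoever harvests first wins: once the bits are
cleared, a driver that later reads ~0 has nothing left to consult.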

All this lacks coordination, I agree.

Cheers,
Ben.