Message-ID: <5a7574c0efc1475a89f84c6393e598d6@AcuMS.aculab.com>
Date:   Thu, 16 Jul 2020 08:07:37 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Benjamin Herrenschmidt' <benh@...nel.crashing.org>,
        Bjorn Helgaas <helgaas@...nel.org>
CC:     'Oliver O'Halloran' <oohall@...il.com>,
        Arnd Bergmann <arnd@...db.de>, Keith Busch <kbusch@...nel.org>,
        Paul Mackerras <paulus@...ba.org>,
        sparclinux <sparclinux@...r.kernel.org>,
        Toan Le <toan@...amperecomputing.com>,
        Greg Ungerer <gerg@...ux-m68k.org>,
        "Marek Vasut" <marek.vasut+renesas@...il.com>,
        Rob Herring <robh@...nel.org>,
        "Lorenzo Pieralisi" <lorenzo.pieralisi@....com>,
        Sagi Grimberg <sagi@...mberg.me>,
        Russell King <linux@...linux.org.uk>,
        Ley Foon Tan <ley.foon.tan@...el.com>,
        Christoph Hellwig <hch@....de>,
        Geert Uytterhoeven <geert@...ux-m68k.org>,
        Kevin Hilman <khilman@...libre.com>,
        linux-pci <linux-pci@...r.kernel.org>,
        Jakub Kicinski <kuba@...nel.org>,
        Matt Turner <mattst88@...il.com>,
        "linux-kernel-mentees@...ts.linuxfoundation.org" 
        <linux-kernel-mentees@...ts.linuxfoundation.org>,
        Guenter Roeck <linux@...ck-us.net>,
        Ray Jui <rjui@...adcom.com>, Jens Axboe <axboe@...com>,
        Ivan Kokshaysky <ink@...assic.park.msu.ru>,
        Shuah Khan <skhan@...uxfoundation.org>,
        "bjorn@...gaas.com" <bjorn@...gaas.com>,
        "Boris Ostrovsky" <boris.ostrovsky@...cle.com>,
        Richard Henderson <rth@...ddle.net>,
        Juergen Gross <jgross@...e.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        "Thomas Bogendoerfer" <tsbogend@...ha.franken.de>,
        Scott Branden <sbranden@...adcom.com>,
        Jingoo Han <jingoohan1@...il.com>,
        "Saheed O. Bolarinwa" <refactormyself@...il.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Philipp Zabel <p.zabel@...gutronix.de>,
        "Greg Kroah-Hartman" <gregkh@...uxfoundation.org>,
        Gustavo Pimentel <gustavo.pimentel@...opsys.com>,
        linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
        "David S. Miller" <davem@...emloft.net>,
        Heiner Kallweit <hkallweit1@...il.com>
Subject: RE: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

From: Benjamin Herrenschmidt
> Sent: 15 July 2020 23:49
> On Wed, 2020-07-15 at 17:12 -0500, Bjorn Helgaas wrote:
> > > I've 'played' with PCIe error handling - without much success.
> > > What might be useful is for a driver that has just read ~0u to
> > > be able to ask 'has there been an error signalled for this device?'.
> >
> > In many cases a driver will know that ~0 is not a valid value for the
> > register it's reading.  But if ~0 *could* be valid, an interface like
> > you suggest could be useful.  I don't think we have anything like that
> > today, but maybe we could.  It would certainly be nice if the PCI core
> > noticed, logged, and cleared errors.  We have some of that for AER,
> > but that's an optional feature, and support for the error bits in the
> > garden-variety PCI_STATUS register is pretty haphazard.  As you note
> > below, this sort of SERR/PERR reporting is frequently hard-wired in
> > ways that takes it out of our purview.
> 
> We do have pci_channel_state (via pci_channel_offline()) which covers
> the cases where the underlying error handling (such as EEH or unplug)
> results in the device being offlined though this tend to be
> asynchronous so it might take a few ~0's before you get it.

On one of my systems I don't think the error TLP from the target
made its way past the first bridge - I could see the error in its
status registers.
But I couldn't find any of the AER status registers in the root bridge.
So I think you'd need a software poll of the bridge registers to
find out (and clear) the error.
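
Roughly what I mean by a software poll, as a user-space model rather
than kernel code. The register offsets and bit values are the standard
PCI_STATUS / secondary status ones; 'struct model_bridge' and the
model_*() accessors are made up for illustration and only stand in for
real config space accesses. The error bits are write-one-to-clear, so
writing back what was read clears exactly the bits that were seen:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI config space offsets and status bits. */
#define PCI_STATUS                   0x06
#define PCI_SEC_STATUS               0x1e  /* type 1 (bridge) header */
#define PCI_STATUS_PARITY            0x0100
#define PCI_STATUS_SIG_TARGET_ABORT  0x0800
#define PCI_STATUS_REC_TARGET_ABORT  0x1000
#define PCI_STATUS_REC_MASTER_ABORT  0x2000
#define PCI_STATUS_SIG_SYSTEM_ERROR  0x4000
#define PCI_STATUS_DETECTED_PARITY   0x8000

#define PCI_STATUS_ERROR_BITS \
	(PCI_STATUS_PARITY | PCI_STATUS_SIG_TARGET_ABORT | \
	 PCI_STATUS_REC_TARGET_ABORT | PCI_STATUS_REC_MASTER_ABORT | \
	 PCI_STATUS_SIG_SYSTEM_ERROR | PCI_STATUS_DETECTED_PARITY)

/* Hypothetical stand-in for one bridge's config space. */
struct model_bridge {
	uint16_t status;      /* primary side status */
	uint16_t sec_status;  /* secondary side status */
};

uint16_t model_read16(const struct model_bridge *b, int off)
{
	return off == PCI_SEC_STATUS ? b->sec_status : b->status;
}

/* Model the write-one-to-clear behaviour of the error bits. */
void model_write16(struct model_bridge *b, int off, uint16_t val)
{
	uint16_t clear = val & PCI_STATUS_ERROR_BITS;

	if (off == PCI_SEC_STATUS)
		b->sec_status &= (uint16_t)~clear;
	else
		b->status &= (uint16_t)~clear;
}

/* Read one status register, clear any error bits found, return them. */
uint16_t poll_and_clear(struct model_bridge *b, int off)
{
	uint16_t err = model_read16(b, off) & PCI_STATUS_ERROR_BITS;

	if (err)
		model_write16(b, off, err);
	return err;
}

/*
 * Check every bridge between the device and the root.
 * Returns true if any of them had an error bit latched.
 */
bool path_had_error(struct model_bridge *path, size_t n)
{
	bool seen = false;

	for (size_t i = 0; i < n; i++) {
		if (poll_and_clear(&path[i], PCI_STATUS))
			seen = true;
		if (poll_and_clear(&path[i], PCI_SEC_STATUS))
			seen = true;
	}
	return seen;
}
```

A driver that has just read ~0u would call something like
path_had_error() on its upstream bridges; a second call straight
afterwards returns false because the bits have been cleared.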

The NMI on the Dell system (which is supposed to meet some special
NEBS? server requirements) is just stupid.
It arrives too late to be synchronous and is impossible for the OS to handle.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
