lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220313195220.GA436941@bhelgaas>
Date:   Sun, 13 Mar 2022 14:52:20 -0500
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>
Cc:     Bjorn Helgaas <bhelgaas@...gle.com>,
        Russell Currey <ruscur@...sell.cc>,
        Oliver OHalloran <oohall@...il.com>,
        Kuppuswamy Sathyanarayanan <knsathya@...nel.org>,
        Ashok Raj <ashok.raj@...el.com>, linux-pci@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Eric Badger <ebadger@...estorage.com>,
        linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH v1] PCI/AER: Handle Multi UnCorrectable/Correctable
 errors properly

On Fri, Mar 11, 2022 at 02:58:07AM +0000, Kuppuswamy Sathyanarayanan wrote:
> Currently the aer_irq() handler returns IRQ_NONE for cases without bits
> PCI_ERR_ROOT_UNCOR_RCV or PCI_ERR_ROOT_COR_RCV are set. But this
> assumption is incorrect.
> 
> Consider a scenario where aer_irq() is triggered for a correctable
> error, and while we process the error and before we clear the error
> status in "Root Error Status" register, if the same kind of error
> is triggered again, since aer_irq() only clears events it saw, the
> multi-bit error is left in tact. This will cause the interrupt to fire
> again, resulting in entering aer_irq() with just the multi-bit error
> logged in the "Root Error Status" register.
> 
> Repeated AER recovery test has revealed this condition does happen
> and this prevents any new interrupt from being triggered. Allow to
> process interrupt even if only multi-correctable (BIT 1) or
> multi-uncorrectable bit (BIT 3) is set.
> 
> Reported-by: Eric Badger <ebadger@...estorage.com>

Is there a bug report with any concrete details (dmesg, lspci, etc)
that we can include here?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ