lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAAd53p75d=ibfFRCLmYOMvfrn7XbDajby1shKdWQWW=DOrX3uw@mail.gmail.com>
Date:   Fri, 23 Jul 2021 15:05:12 +0800
From:   Kai-Heng Feng <kai.heng.feng@...onical.com>
To:     Christoph Hellwig <hch@...radead.org>
Cc:     Bjorn Helgaas <helgaas@...nel.org>, Joerg Roedel <jroedel@...e.de>,
        "open list:PCI ENHANCED ERROR HANDLING (EEH) FOR POWERPC" 
        <linuxppc-dev@...ts.ozlabs.org>,
        "open list:PCI SUBSYSTEM" <linux-pci@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>,
        Lalithambika Krishnakumar <lalithambika.krishnakumar@...el.com>,
        Alex Williamson <alex.williamson@...hat.com>,
        "Oliver O'Halloran" <oohall@...il.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Mika Westerberg <mika.westerberg@...ux.intel.com>,
        Lu Baolu <baolu.lu@...ux.intel.com>
Subject: Re: [PATCH 1/2] PCI/AER: Disable AER interrupt during suspend

On Fri, Jul 23, 2021 at 1:24 PM Christoph Hellwig <hch@...radead.org> wrote:
>
> On Thu, Jul 22, 2021 at 05:23:51PM -0500, Bjorn Helgaas wrote:
> > Marking both of these as "not applicable" for now because I don't
> > think we really understand what's going on.
> >
> > Apparently a DMA occurs during suspend or resume and triggers an ACS
> > violation.  I don't think think such a DMA should occur in the first
> > place.
> >
> > Or maybe, since you say the problem happens right after ACS is enabled
> > during resume, we're doing the ACS enable incorrectly?  Although I
> > would think we should not be doing DMA at the same time we're enabling
> > ACS, either.
> >
> > If this really is a system firmware issue, both HP and Dell should
> > have the knowledge and equipment to figure out what's going on.
>
> DMA on resume sounds really odd.  OTOH the below mentioned case of
> a DMA during suspend seems very like in some setup.  NVMe has the
> concept of a host memory buffer (HMB) that allows the PCIe device
> to use arbitrary host memory for internal purposes.  Combine this
> with the "Storage D3" misfeature in modern x86 platforms that force
> a slot into d3cold without consulting the driver first and you'd see
> symptoms like this.  Another case would be the NVMe equivalent of the
> AER which could lead to a completion without host activity.

The issue can also be observed on non-HMB NVMe.

>
> We now have quirks in the ACPI layer and NVMe to fully shut down the
> NVMe controllers on these messed up systems with the "Storage D3"
> misfeature which should avoid such "spurious" DMAs at the cost of
> wearning out the device much faster.

Since the issue is on S3, I think the NVMe always fully shuts down.

Kai-Heng

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ