Message-ID: <20190510154904.GA31649@lst.de>
Date: Fri, 10 May 2019 17:49:04 +0200
From: "hch@....de" <hch@....de>
To: Kai Heng Feng <kai.heng.feng@...onical.com>
Cc: Keith Busch <kbusch@...nel.org>,
"Mario.Limonciello@...l.com" <Mario.Limonciello@...l.com>,
"hch@....de" <hch@....de>, "axboe@...com" <axboe@...com>,
"sagi@...mberg.me" <sagi@...mberg.me>,
"rafael@...nel.org" <rafael@...nel.org>,
"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
"Wysocki, Rafael J" <rafael.j.wysocki@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
"Busch, Keith" <keith.busch@...el.com>
Subject: Re: [PATCH] nvme-pci: Use non-operational power state instead of
D3 on Suspend-to-Idle
On Fri, May 10, 2019 at 11:18:52PM +0800, Kai Heng Feng wrote:
> > I'm afraid the requirement is still not clear to me. AFAIK, all our
> > barriers routines ensure data is visible either between CPUs, or between
> > CPU and devices. The CPU never accesses HMB memory, so there must be some
> > other reasoning if this barrier is a real requirement for this device.
>
> Sure, I’ll ask vendor what that MemRd is for.
I'd like to understand this bug, but this thread leaves me a little
confused. So we have an NVMe device whose driver has set up an HMB.
Something crashes - the kernel or the firmware?
When does it crash? suspend or resume?
That crash seems to be related to a PCIe TLP that reads
memory from the host, probably due to the HMB.
But a device with an HMB has been told that it can access that memory
at any time. So if in any given suspend state TLPs to access RAM are
not allowed, we'll have to tell the device to stop using the HMB.
So: what power states do not allow the device to DMA to / from host
memory? How do we find out we are about to enter those from the
pm methods? We'll then need to disable the HMB, which might suck
in terms of latency.