lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Tue, 3 Nov 2020 10:23:18 +0100
From:   Christoph Hellwig <hch@....de>
To:     Gloria Tsai <Gloria.Tsai@...tc.com>
Cc:     Christoph Hellwig <hch@....de>, Jongpil Jung <jongpuls@...il.com>,
        Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...com>,
        Sagi Grimberg <sagi@...mberg.me>,
        "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "jongpil19.jung@...sung.com" <jongpil19.jung@...sung.com>,
        "jongheony.kim@...sung.com" <jongheony.kim@...sung.com>,
        "dj54.sohn@...sung.com" <dj54.sohn@...sung.com>
Subject: Re: [PATCH V3 1/1] nvme: Add quirk for LiteON CL1 devices running
 FW 220TQ,22001

On Tue, Nov 03, 2020 at 02:21:16AM +0000, Gloria Tsai wrote:
> When host issue shutdown + D3hot in suspend, NVMe drive might have
> chance choosing wrong pointer which has already been used by GC then
> cause over program.
> Do GC before shutdown -> delete IO Q -> shutdown from host -> breakup GC -> D3hot -> enter PS4 -> have a chance swap block -> use wrong pointer on device SRAM -> over program

Aka there is data corruption?

> The issue only happens in simple suspend (shutdown+D3hot) with specific FW on Kahoku board.

Kahoku is a specific LiteOn controller?  Or it is the host system?

Maybe main issue with this patch is that it mixes up two axis:

 - use power states for suspend despite HMB on specific host systems
   identified by the DMI ids.  This kinda makes sense to me, as
   the power state based suspends has lots of advantages, so having
   a whitelist when to use it seem ok, despite the clutter that this
   causes.
 - then tie this to specific NVMe devices that don't work without this
   quirk, which leaves open the issue what we do when we encounter such
   a device in a different host system.  If shutdown + D3hot causes
   problems there is seems like for the case where the above quirk doesn't
   apply we should just skip the shutdown and let the D3hot do a surprise
   power removal?  That's mean recovery when coming back from the suspend,
   but would cause less corruption?

In other words I think this needs to be two patches:

 1) quirk based on the DMI table and allow power state based
    suspend on given host systems even when a HMB is enabled
 2) quirk based on the nvme device (and if possible use the PCI IDs)
    to disable shutdown before suspend (possibly with a warning printk
    when this happens)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ