linux-kernel - Re: Report long suspend times of NVMe devices (mostly firmware/device issues)


Message-ID: <20180122213024.GR12043@localhost.localdomain>
Date:   Mon, 22 Jan 2018 14:30:25 -0700
From:   Keith Busch <keith.busch@...el.com>
To:     Paul Menzel <pmenzel+linux-nvme@...gen.mpg.de>
Cc:     Jens Axboe <axboe@...com>, Christoph Hellwig <hch@....de>,
        Sagi Grimberg <sagi@...mberg.me>,
        linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: Report long suspend times of NVMe devices (mostly
 firmware/device issues)

On Mon, Jan 22, 2018 at 10:02:12PM +0100, Paul Menzel wrote:
> Dear Linux folks,
> 
> 
> Benchmarking the ACPI S3 suspend and resume times with `sleepgraph.py
> -config config/suspend-callgraph.cfg` [1], shows that the NVMe disk SAMSUNG
> MZVKW512HMJP-00000 in the TUXEDO Book BU1406 takes between 0.3 and 1.4
> seconds, holding up the suspend cycle.
> 
> The time is spent in `nvme_shutdown_ctrl()`.
> 
> ### Linux 4.14.1-041401-generic
> 
> > nvme @ 0000:04:00.0 {nvme} async_device (Total Suspend: 1439.299 ms Total Resume: 19.865 ms)
> 
> ### Linux 4.15-rc9
> 
> > nvme @ 0000:04:00.0 {nvme} async_device (Total Suspend: 362.239 ms Total Resume: 19.897 ms)
> 
> It’d be useful, if the Linux kernel logged such issues visibly to the user,
> so that the hardware manufacturer can be contacted to fix the device
> (probably the firmware).
> 
> In my opinion anything longer than 200 ms should be reported similar to [2],
> and maybe worded like below.
> 
> > NVMe took more than 200 ms to do suspend routine
> 
> What do you think?

The nvme spec guides toward longer times than that. I don't see the
point of warning users about things operating within spec.
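
For anyone who wants to compare their own measurements against the
proposed 200 ms figure without a kernel change, the check can be done in
userspace on the summary lines quoted above. The following is only a
minimal sketch: it assumes the lines have exactly the format shown in
this thread, the function name `slow_suspends` is made up for
illustration, and the 200 ms default is the threshold suggested in the
quoted mail, not a limit taken from the NVMe spec.

```python
import re

# Assumed line format, copied from the summary lines quoted in this thread:
#   <device> (Total Suspend: <ms> ms Total Resume: <ms> ms)
LINE_RE = re.compile(
    r"(?P<name>.+?) \(Total Suspend: (?P<suspend>[\d.]+) ms "
    r"Total Resume: (?P<resume>[\d.]+) ms\)"
)


def slow_suspends(lines, threshold_ms=200.0):
    """Return (device, suspend_ms) for every line above threshold_ms.

    Hypothetical helper; lines that do not match the assumed format
    are skipped silently.
    """
    slow = []
    for line in lines:
        # Drop mail quote markers ("> > ") and surrounding whitespace.
        m = LINE_RE.match(line.lstrip("> ").rstrip())
        if m is None:
            continue
        suspend_ms = float(m.group("suspend"))
        if suspend_ms > threshold_ms:
            slow.append((m.group("name"), suspend_ms))
    return slow


if __name__ == "__main__":
    report = [
        "> > nvme @ 0000:04:00.0 {nvme} async_device "
        "(Total Suspend: 1439.299 ms Total Resume: 19.865 ms)",
        "> > nvme @ 0000:04:00.0 {nvme} async_device "
        "(Total Suspend: 362.239 ms Total Resume: 19.897 ms)",
    ]
    for name, ms in slow_suspends(report):
        print(f"{name}: suspend took {ms} ms")
```

Both measurements from this thread exceed 200 ms, so both would be
listed. Whether that is worth reporting depends on what shutdown latency
the device itself advertises.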