Message-ID: <ccca97fb-0dab-0a8f-d2ac-247ba160288d@suse.de>
Date: Wed, 16 Feb 2022 12:32:37 +0100
From: Hannes Reinecke <hare@...e.de>
To: Markus Blöchl <Markus.Bloechl@...tronik.com>,
Christoph Hellwig <hch@....de>
Cc: Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>,
Sagi Grimberg <sagi@...mberg.me>,
linux-nvme@...ts.infradead.org, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, Stefan Roese <sr@...x.de>
Subject: Re: [RFC PATCH] nvme: prevent hang on surprise removal of NVMe disk
On 2/16/22 12:18, Markus Blöchl wrote:
> On Tue, Feb 15, 2022 at 08:17:31PM +0100, Christoph Hellwig wrote:
>> On Mon, Feb 14, 2022 at 10:51:07AM +0100, Markus Blöchl wrote:
>>> After the surprise removal of a mounted NVMe disk the pciehp task
>>> reliably hangs forever with a trace similar to this one:
>>
>> Do you have a specific reproducer? At least when doing a
>>
>> echo 1 > /sys/.../remove
>>
>> while running fsx on a file system, I can't actually reproduce it.
>
> We built our own enclosures with a custom connector to plug in the disks.
>
> So an external enclosure for thunderbolt is probably very similar.
> (or just ripping an unscrewed NVMe out of the M.2 ...)
>
> But as already suggested, qemu might also be very useful here, since it
> also lets us test multiple namespaces and multipath I/O, if you or
> someone else wants to check those too (hotplug with multipath I/O
> really scares me).
>
Nothing to be scared of.
I've tested this extensively in the run-up to commit 5396fdac56d8
("nvme: fix refcounting imbalance when all paths are down") which,
incidentally, is something you'll need if you want to test this.
Let me see if I can dig up the testbed.
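For reference, a rough qemu setup along the lines Markus suggests might
look like the sketch below; it is untested here, and the image names,
device ids, slot number, and serial are all placeholders:

```shell
# Untested sketch of a qemu-based surprise-removal reproducer.
# nvme-test.img, guest-rootfs.img, rp1, nvme0, and the serial are
# made-up placeholders, not values from this thread.
qemu-img create -f raw nvme-test.img 1G

qemu-system-x86_64 -machine q35 -m 2G -enable-kvm \
    -drive file=guest-rootfs.img,if=virtio \
    -device pcie-root-port,id=rp1,slot=1 \
    -drive file=nvme-test.img,if=none,id=nvm0 \
    -device nvme,id=nvme0,serial=testnvme,drive=nvm0,bus=rp1 \
    -monitor stdio

# In the guest: mkfs, mount the NVMe namespace, and start fsx on it.
# Then, from the qemu monitor prompt, pull the device out from under
# the running file system:
#
#   (qemu) device_del nvme0
#
# Note that device_del on a PCIe root port goes through the guest's
# pciehp hot-unplug path; approximating a true surprise removal may
# need additional steps on the guest side.
```

Multiple namespaces or a second controller on another root port could
be added the same way to cover the multipath case.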
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@...e.de +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer