Message-ID: <ccca97fb-0dab-0a8f-d2ac-247ba160288d@suse.de>
Date: Wed, 16 Feb 2022 12:32:37 +0100
From: Hannes Reinecke <hare@...e.de>
To: Markus Blöchl <Markus.Bloechl@...tronik.com>,
Christoph Hellwig <hch@....de>
Cc: Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>,
Sagi Grimberg <sagi@...mberg.me>,
linux-nvme@...ts.infradead.org, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, Stefan Roese <sr@...x.de>
Subject: Re: [RFC PATCH] nvme: prevent hang on surprise removal of NVMe disk
On 2/16/22 12:18, Markus Blöchl wrote:
> On Tue, Feb 15, 2022 at 08:17:31PM +0100, Christoph Hellwig wrote:
>> On Mon, Feb 14, 2022 at 10:51:07AM +0100, Markus Blöchl wrote:
>>> After the surprise removal of a mounted NVMe disk the pciehp task
>>> reliably hangs forever with a trace similar to this one:
>>
>> Do you have a specific reproducer? At least when doing a
>>
>> echo 1 > /sys/.../remove
>>
>> while running fsx on a file system, I can't actually reproduce it.
>
> We built our own enclosures with a custom connector to plug in the disks.
>
> So an external enclosure for thunderbolt is probably very similar.
> (or just ripping an unscrewed NVMe out of the M.2 ...)
>
> But as already suggested, qemu might also be very useful here, since it
> also lets us test multiple namespaces and multipath I/O, if you or
> someone else wants to check those too (hotplug with multipath I/O
> really scares me).
>
Nothing to be scared of.
I've tested this extensively in the run-up to commit 5396fdac56d8
("nvme: fix refcounting imbalance when all paths are down") which,
incidentally, is something you'll need if you want to test this.
Let me see if I can dig up the testbed.
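For reference, a rough qemu setup along the lines Markus suggests might
look like the sketch below; it is untested here, and the image names,
device ids, slot number, and serial are all placeholders:

```shell
# Untested sketch of a qemu-based surprise-removal reproducer.
# nvme-test.img, guest-rootfs.img, rp1, nvme0, and the serial are
# made-up placeholders, not values from this thread.
qemu-img create -f raw nvme-test.img 1G

qemu-system-x86_64 -machine q35 -m 2G -enable-kvm \
    -drive file=guest-rootfs.img,if=virtio \
    -device pcie-root-port,id=rp1,slot=1 \
    -drive file=nvme-test.img,if=none,id=nvm0 \
    -device nvme,id=nvme0,serial=testnvme,drive=nvm0,bus=rp1 \
    -monitor stdio

# In the guest: mkfs, mount the NVMe namespace, and start fsx on it.
# Then, from the qemu monitor prompt, pull the device out from under
# the running file system:
#
#   (qemu) device_del nvme0
#
# Note that device_del on a PCIe root port goes through the guest's
# pciehp hot-unplug path; approximating a true surprise removal may
# need additional steps on the guest side.
```

Multiple namespaces or a second controller on another root port could
be added the same way to cover the multipath case.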
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@...e.de +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer