Message-ID: <95cbd47d-46ed-850e-7d4f-851b35d03069@dustymabe.com>
Date: Mon, 12 Sep 2022 22:36:08 -0400
From: Dusty Mabe <dusty@...tymabe.com>
To: Ming Lei <ming.lei@...hat.com>, Christoph Hellwig <hch@....de>
Cc: Jens Axboe <axboe@...nel.dk>, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org
Subject: Re: regression caused by block: freeze the queue earlier in
del_gendisk
On 9/12/22 21:55, Ming Lei wrote:
> On Mon, Sep 12, 2022 at 09:16:18AM +0200, Christoph Hellwig wrote:
>> On Fri, Sep 09, 2022 at 04:24:40PM +0800, Ming Lei wrote:
>>> On Wed, Sep 07, 2022 at 09:33:24AM +0200, Christoph Hellwig wrote:
>>>> On Thu, Sep 01, 2022 at 03:06:08PM +0800, Ming Lei wrote:
>>>>> It is a bit hard to associate the above commit with reported issue.
>>>>
>>>> So the messages clearly are about something trying to open a device
>>>> that went away at the block layer, but somehow does not get removed
>>>> in time by udev (which seems to be a userspace bug in CoreOS). But
>>>> even with that we really should not hang.
>>>
>>> Xiao Ni provides one script[1] which can reproduce the issue more or less.
>>
>> I've run the reproducer 10000 times on current mainline, and while
>> it prints one of the autoloading messages per run, I've not actually
>> seen any kind of hang.
>
> I can't reproduce the hang either.
I can reliably reproduce the issue with the test in our Fedora CoreOS
test suite. It's part of a framework (i.e. it's not simply a script
you can run), but it is very reproducible, so one can add some
instrumentation to the kernel and feed it through a build/test cycle to
see different results or logs.

I'm willing to share this with other people (maybe via a screen share or
some written instructions) if anyone is interested.
>
> What I meant is that new raid disk can be added by mdadm after stopping
> the imsm container and raid disk with the autoloading messages printed,
> I understand this behavior isn't correct, but I am not familiar with
> raid enough.
>
> It might be related with the delay deleting gendisk from wq & md kobj
> release handler.
>
> During reboot, if mdadm does this stupid thing without stopping, the hang
> could be caused.
>
> I think the root cause is that why mdadm tries to open/add new raid bdev
> crazily during reboot.
>
Dusty