Message-ID: <b3c80659-8879-6e7c-e732-5fb690b7bc97@huaweicloud.com>
Date: Wed, 31 Jan 2024 09:35:22 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Yu Kuai <yukuai1@...weicloud.com>, Mikulas Patocka <mpatocka@...hat.com>
Cc: heinzm@...hat.com, xni@...hat.com, agk@...hat.com, snitzer@...nel.org,
dm-devel@...ts.linux.dev, song@...nel.org, jbrassow@....redhat.com,
neilb@...e.de, shli@...com, akpm@...l.org, linux-kernel@...r.kernel.org,
linux-raid@...r.kernel.org, yi.zhang@...wei.com, yangerkun@...wei.com,
"yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH RFC v4 13/14] dm: wait for IO completion before removing
dm device
Hi,
在 2024/01/30 21:05, Yu Kuai 写道:
> Hi,
>
> 在 2024/01/30 19:46, Mikulas Patocka 写道:
>>
>>
>> On Tue, 30 Jan 2024, Yu Kuai wrote:
>>
>>> From: Yu Kuai <yukuai3@...wei.com>
>>>
>>> __dm_destroy() guarantees that the number of device openers is zero,
>>> and only then calls 'presuspend' and 'postsuspend' for the targets.
>>> For request-based dm, 'md->holders' is grabbed for each rq and
>>> __dm_destroy() waits for 'md->holders' to drop to zero. However, for
>>> bio-based devices, __dm_destroy() doesn't wait for all bios to be done.
>>>
>>> Fix this problem by calling dm_wait_for_completion() to wait for all
>>> inflight IO to be done, as dm_suspend() does.
>>
>> If the number of openers is zero, it is guaranteed that there are no bios
>> in flight. Therefore, we don't have to wait for them.
>>
>> If there are bios in flight, it is a bug in the code that issues the
>> bios.
>> You can put WARN_ON(dm_in_flight_bios(md)) there.
>
> I added this patch because, while testing, there is a problem that is
> hard to reproduce, as I mentioned in the other thread. I'll add BUG_ON()
> and see if I can still reproduce the problem without triggering it.
>
> Thanks,
> Kuai
>
> [12504.959682] BUG bio-296 (Not tainted): Object already free
> [12504.960239]
> -----------------------------------------------------------------------------
>
> [12504.960239]
> [12504.961209] Allocated in mempool_alloc+0xe8/0x270 age=30 cpu=1 pid=203288
> [12504.961905] kmem_cache_alloc+0x36a/0x3b0
> [12504.962324] mempool_alloc+0xe8/0x270
> [12504.962712] bio_alloc_bioset+0x3b5/0x920
> [12504.963129] bio_alloc_clone+0x3e/0x160
> [12504.963533] alloc_io+0x3d/0x1f0
> [12504.963876] dm_submit_bio+0x12f/0xa30
> [12504.964267] __submit_bio+0x9c/0xe0
> [12504.964639] submit_bio_noacct_nocheck+0x25a/0x570
> [12504.965136] submit_bio_wait+0xc2/0x160
> [12504.965535] blkdev_issue_zeroout+0x19b/0x2e0
> [12504.965991] ext4_init_inode_table+0x246/0x560
> [12504.966462] ext4_lazyinit_thread+0x750/0xbe0
> [12504.966922] kthread+0x1b4/0x1f0
After adding the BUG_ON(), I can still reproduce this issue. It really
looks like a bug, and I don't think it is related to dm-raid. Perhaps
you guys can take a look?
Thanks,
Kuai
>>
>> Mikulas
>>
>>> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
>>> ---
>>> drivers/md/dm.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>>> index 8dcabf84d866..2c0eae67d0f1 100644
>>> --- a/drivers/md/dm.c
>>> +++ b/drivers/md/dm.c
>>> @@ -58,6 +58,7 @@ static DEFINE_IDR(_minor_idr);
>>> static DEFINE_SPINLOCK(_minor_lock);
>>> static void do_deferred_remove(struct work_struct *w);
>>> +static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_state);
>>> static DECLARE_WORK(deferred_remove_work, do_deferred_remove);
>>> @@ -2495,6 +2496,8 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
>>> if (!dm_suspended_md(md)) {
>>> dm_table_presuspend_targets(map);
>>> set_bit(DMF_SUSPENDED, &md->flags);
>>> + if (wait)
>>> + dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE);
>>> set_bit(DMF_POST_SUSPENDING, &md->flags);
>>> dm_table_postsuspend_targets(map);
>>> }
>>> --
>>> 2.39.2
>>>
>>
>> .
>>
>
> .
>