[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <40203778-f217-6789-9c83-ebed3720627b@huaweicloud.com>
Date: Fri, 21 Feb 2025 09:27:52 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Guillaume Morin <guillaume@...infr.org>, Yu Kuai <yukuai1@...weicloud.com>
Cc: linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
song@...nel.org, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [BUG] possible race between md_free_disk and md_notify_reboot
Hi,
在 2025/02/20 21:39, Guillaume Morin 写道:
> On 20 Feb 19:55, Yu Kuai wrote:
>>
>>> I just take a quick look, the problem looks obviously to me, see how
>>> md_seq_show() handle the iteration.
>>>
>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> index 465ca2af1e6e..7c7a58f618c1 100644
>>> --- a/drivers/md/md.c
>>> +++ b/drivers/md/md.c
>>> @@ -9911,8 +9911,11 @@ static int md_notify_reboot(struct notifier_block
>>> *this,
>>> mddev_unlock(mddev);
>>> }
>>> need_delay = 1;
>>> - mddev_put(mddev);
>>> - spin_lock(&all_mddevs_lock);
>>> +
>>> + spin_lock(&all_mddevs_lock)
>>> + if (atomic_dec_and_test(&mddev->active))
>>> + __mddev_put(mddev);
>>> +
>>> }
>>> spin_unlock(&all_mddevs_lock);
>>
>> While cooking the patch, this is not enough, list_for_each_entry_safe()
>> should be replaced with list_for_each_entry() as well.
>>
>> Will send the patch soon, with:
>>
>> Reported-by: Guillaume Morin <guillaume@...infr.org>
>
> Thank you! I just saw the patch and we are going to test it and let you
> know.
>
> The issue with the next pointer seems to be fixed with your change.
> Though I am still unclear how the 2nd potential issue I mentioned -
> where the current item would be freed concurrently by mddev_free() - is
> prevented. I am not finding anything in the code that seems to prevent a
> concurrent call to mddev_free() for the current item in the
> list_for_each_entry() loop (and therefore accessing mddev after the
> kfree()).
>
> I understand that we are getting a reference through the active atomic
> in mddev_get() under the lock in md_notify_reboot() but how is that
> preventing mddev_free() from freeing the mddev as soon as we release the
> all_mddevs_lock in the loop?
>
> I am not not familiar with this code so I am most likely missing
> osmething but if you had the time to explain, that would be very
> helpful.
I'm not quite sure what you're confused. mddev lifetime are both
protected by lock and reference.
In this case:
hold lock
get first mddev
release lock
// handle first mddev
hold lock
release mddev
get next mddev
release lock
-> mddev can be freed now
// handle the next mddev
...
Thanks,
Kuai
>
> TIA
>
> Guillaume.
>
Powered by blists - more mailing lists