lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Thu, 29 Feb 2024 10:14:14 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Xiao Ni <xni@...hat.com>, Yu Kuai <yukuai1@...weicloud.com>,
 mpatocka@...hat.com, heinzm@...hat.com, blazej.kucman@...ux.intel.com,
 agk@...hat.com, snitzer@...nel.org, dm-devel@...ts.linux.dev,
 song@...nel.org, jbrassow@....redhat.com, neilb@...e.de, shli@...com,
 akpm@...l.org
Cc: linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org,
 yi.zhang@...wei.com, yangerkun@...wei.com, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH v5 05/14] md: don't suspend the array for interrupted
 reshape

Hi,

在 2024/02/29 10:10, Xiao Ni 写道:
> 
> 在 2024/2/1 下午5:25, Yu Kuai 写道:
>> From: Yu Kuai <yukuai3@...wei.com>
>>
>> md_start_sync() will suspend the array if there are spares that can be
>> added or removed from conf, however, if reshape is still in progress,
> 
> 
> Hi Kuai
> 
> Why md_start_sync can run when reshape is still in progress? 
> md_check_recovery should return without queue sync_work, right?
> 
>> this won't happen at all or data will be corrupted(remove_and_add_spares
>> won't be called from md_choose_sync_action for reshape), hence there is
>> no need to suspend the array if reshape is not done yet.
>>
>> Meanwhile, there is a potential deadlock for raid456:
>>
>> 1) reshape is interrupted;
>>
>> 2) set one of the disk WantReplacement, and add a new disk to the array,
>>     however, recovery won't start until the reshape is finished;
>>
>> 3) then issue an IO across reshpae position, this IO will wait for
>>     reshape to make progress;
>>
>> 4) continue to reshape, then md_start_sync() found there is a spare disk
>>     that can be added to conf, mddev_suspend() is called;
> 
> 
> I c. The answer for my above question is reshape is interrupted and then 
> it continues the reshape, right?
> 

Yes, reshape is interrupted and sync_thread is unregister, then
sync_thread can be registered again to continue reshape.

Thanks,
Kuai

> 
> Best Regards
> 
> Xiao
> 
>>
>> Step 4 and step 3 is waiting for each other, deadlock triggered. Noted
>> this problem is found by code review, and it's not reporduced yet.
>>
>> Fix this porblem by don't suspend the array for interrupted reshape,
>> this is safe because conf won't be changed until reshape is done.
>>
>> Fixes: bc08041b32ab ("md: suspend array in md_start_sync() if array 
>> need reconfiguration")
>> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
>> ---
>>   drivers/md/md.c | 13 +++++++++----
>>   1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 6c5d0a372927..85fde05c37dd 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -9374,12 +9374,17 @@ static void md_start_sync(struct work_struct *ws)
>>       bool suspend = false;
>>       char *name;
>> -    if (md_spares_need_change(mddev))
>> +    /*
>> +     * If reshape is still in progress, spares won't be added or removed
>> +     * from conf until reshape is done.
>> +     */
>> +    if (mddev->reshape_position == MaxSector &&
>> +        md_spares_need_change(mddev)) {
>>           suspend = true;
>> +        mddev_suspend(mddev, false);
>> +    }
>> -    suspend ? mddev_suspend_and_lock_nointr(mddev) :
>> -          mddev_lock_nointr(mddev);
>> -
>> +    mddev_lock_nointr(mddev);
>>       if (!md_is_rdwr(mddev)) {
>>           /*
>>            * On a read-only array we can:
> 
> .
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ