[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <65ab0815-daf0-c614-7fa5-cab095d513b1@huaweicloud.com>
Date: Tue, 5 Aug 2025 09:15:43 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Xiao Ni <xni@...hat.com>, Yu Kuai <yukuai1@...weicloud.com>
Cc: hch@....de, corbet@....net, song@...nel.org, agk@...hat.com,
snitzer@...nel.org, mpatocka@...hat.com, linan122@...wei.com, hare@...e.de,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-raid@...r.kernel.org, dm-devel@...ts.linux.dev, yi.zhang@...wei.com,
yangerkun@...wei.com, johnny.chenyi@...wei.com,
"yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH v5 08/11] md/md-bitmap: add a new method blocks_synced()
in bitmap_operations
Hi,
在 2025/08/04 17:15, Xiao Ni 写道:
> Hi Kuai
>
> I found one thing. The interface ->blocks_synced doesn't work in the
> write io path. So there is a risk of data corruption. For example:
>
> mdadm -CR /dev/md0 -l5 -n5 /dev/loop[0-4] --bitmap=lockless (all bits
> are unwritten because lazy initial recovery)
> 1. D1 D2 D3 D4 P1, a small write hits D2. rmw is 2 (need to read old
> data of D2 and P1), rcw is 3 (need to read D1 D3 and D4).
> 2. D2 disk fails
> 3. read data from disk2. It needs to calculate the data from other
> disks. But the result is not the real data which was written to D2
>
> So ->blocks_synced needs to be checked in handle_stripe_dirtying.
This is correct, the write path is missing. Will fix it in the next
version.
Thanks,
Kuai
>
> Best Regards
> Xiao
>
> On Fri, Aug 1, 2025 at 3:11 PM Yu Kuai <yukuai1@...weicloud.com> wrote:
>>
>> From: Yu Kuai <yukuai3@...wei.com>
>>
>> Currently, raid456 must perform a whole array initial recovery to build
>> initail xor data, then IO to the array won't have to read all the blocks
>> in underlying disks.
>>
>> This behavior will affect IO performance a lot, and nowadays there are
>> huge disks and the initial recovery can take a long time. Hence llbitmap
>> will support lazy initial recovery in following patches. This method is
>> used to check if data blocks is synced or not, if not then IO will still
>> have to read all blocks for raid456.
>>
>> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
>> Reviewed-by: Christoph Hellwig <hch@....de>
>> Reviewed-by: Hannes Reinecke <hare@...e.de>
>> Reviewed-by: Xiao Ni <xni@...hat.com>
>> Reviewed-by: Li Nan <linan122@...wei.com>
>> ---
>> drivers/md/md-bitmap.h | 1 +
>> drivers/md/raid5.c | 6 ++++++
>> 2 files changed, 7 insertions(+)
>>
>> diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
>> index 95453696c68e..5f41724cbcd8 100644
>> --- a/drivers/md/md-bitmap.h
>> +++ b/drivers/md/md-bitmap.h
>> @@ -90,6 +90,7 @@ struct bitmap_operations {
>> md_bitmap_fn *end_discard;
>>
>> sector_t (*skip_sync_blocks)(struct mddev *mddev, sector_t offset);
>> + bool (*blocks_synced)(struct mddev *mddev, sector_t offset);
>> bool (*start_sync)(struct mddev *mddev, sector_t offset,
>> sector_t *blocks, bool degraded);
>> void (*end_sync)(struct mddev *mddev, sector_t offset, sector_t *blocks);
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index 5285e72341a2..2121f0ff5e30 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -3748,6 +3748,7 @@ static int want_replace(struct stripe_head *sh, int disk_idx)
>> static int need_this_block(struct stripe_head *sh, struct stripe_head_state *s,
>> int disk_idx, int disks)
>> {
>> + struct mddev *mddev = sh->raid_conf->mddev;
>> struct r5dev *dev = &sh->dev[disk_idx];
>> struct r5dev *fdev[2] = { &sh->dev[s->failed_num[0]],
>> &sh->dev[s->failed_num[1]] };
>> @@ -3762,6 +3763,11 @@ static int need_this_block(struct stripe_head *sh, struct stripe_head_state *s,
>> */
>> return 0;
>>
>> + /* The initial recover is not done, must read everything */
>> + if (mddev->bitmap_ops && mddev->bitmap_ops->blocks_synced &&
>> + !mddev->bitmap_ops->blocks_synced(mddev, sh->sector))
>> + return 1;
>> +
>> if (dev->toread ||
>> (dev->towrite && !test_bit(R5_OVERWRITE, &dev->flags)))
>> /* We need this block to directly satisfy a request */
>> --
>> 2.39.2
>>
>
>
> .
>
Powered by blists - more mailing lists