[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CALTww2_jcDJf-55AEvK2fzf2PLnnOfBw5dG4bQG65B9eFw8Xmg@mail.gmail.com>
Date: Mon, 4 Aug 2025 17:15:12 +0800
From: Xiao Ni <xni@...hat.com>
To: Yu Kuai <yukuai1@...weicloud.com>
Cc: hch@....de, corbet@....net, song@...nel.org, yukuai3@...wei.com,
agk@...hat.com, snitzer@...nel.org, mpatocka@...hat.com, linan122@...wei.com,
hare@...e.de, linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-raid@...r.kernel.org, dm-devel@...ts.linux.dev, yi.zhang@...wei.com,
yangerkun@...wei.com, johnny.chenyi@...wei.com
Subject: Re: [PATCH v5 08/11] md/md-bitmap: add a new method blocks_synced()
in bitmap_operations
Hi Kuai
I found one thing. The interface ->blocks_synced doesn't work in the
write io path. So there is a risk of data corruption. For example:
mdadm -CR /dev/md0 -l5 -n5 /dev/loop[0-4] --bitmap=lockless (all bits
are unwritten because lazy initial recovery)
1. D1 D2 D3 D4 P1, a small write hits D2. rmw is 2 (need to read old
data of D2 and P1), rcw is 3 (need to read D1 D3 and D4).
2. D2 disk fails
3. read data from disk2. It needs to calculate the data from other
disks. But the result is not the real data which was written to D2
So ->blocks_synced needs to be checked in handle_stripe_dirtying.
Best Regards
Xiao
On Fri, Aug 1, 2025 at 3:11 PM Yu Kuai <yukuai1@...weicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@...wei.com>
>
> Currently, raid456 must perform a whole array initial recovery to build
> initail xor data, then IO to the array won't have to read all the blocks
> in underlying disks.
>
> This behavior will affect IO performance a lot, and nowadays there are
> huge disks and the initial recovery can take a long time. Hence llbitmap
> will support lazy initial recovery in following patches. This method is
> used to check if data blocks is synced or not, if not then IO will still
> have to read all blocks for raid456.
>
> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
> Reviewed-by: Christoph Hellwig <hch@....de>
> Reviewed-by: Hannes Reinecke <hare@...e.de>
> Reviewed-by: Xiao Ni <xni@...hat.com>
> Reviewed-by: Li Nan <linan122@...wei.com>
> ---
> drivers/md/md-bitmap.h | 1 +
> drivers/md/raid5.c | 6 ++++++
> 2 files changed, 7 insertions(+)
>
> diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
> index 95453696c68e..5f41724cbcd8 100644
> --- a/drivers/md/md-bitmap.h
> +++ b/drivers/md/md-bitmap.h
> @@ -90,6 +90,7 @@ struct bitmap_operations {
> md_bitmap_fn *end_discard;
>
> sector_t (*skip_sync_blocks)(struct mddev *mddev, sector_t offset);
> + bool (*blocks_synced)(struct mddev *mddev, sector_t offset);
> bool (*start_sync)(struct mddev *mddev, sector_t offset,
> sector_t *blocks, bool degraded);
> void (*end_sync)(struct mddev *mddev, sector_t offset, sector_t *blocks);
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 5285e72341a2..2121f0ff5e30 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -3748,6 +3748,7 @@ static int want_replace(struct stripe_head *sh, int disk_idx)
> static int need_this_block(struct stripe_head *sh, struct stripe_head_state *s,
> int disk_idx, int disks)
> {
> + struct mddev *mddev = sh->raid_conf->mddev;
> struct r5dev *dev = &sh->dev[disk_idx];
> struct r5dev *fdev[2] = { &sh->dev[s->failed_num[0]],
> &sh->dev[s->failed_num[1]] };
> @@ -3762,6 +3763,11 @@ static int need_this_block(struct stripe_head *sh, struct stripe_head_state *s,
> */
> return 0;
>
> + /* The initial recover is not done, must read everything */
> + if (mddev->bitmap_ops && mddev->bitmap_ops->blocks_synced &&
> + !mddev->bitmap_ops->blocks_synced(mddev, sh->sector))
> + return 1;
> +
> if (dev->toread ||
> (dev->towrite && !test_bit(R5_OVERWRITE, &dev->flags)))
> /* We need this block to directly satisfy a request */
> --
> 2.39.2
>
Powered by blists - more mailing lists