Message-ID: <a658717c-8388-6e56-4d8d-096b0a1aefb9@molgen.mpg.de>
Date: Thu, 20 Jul 2023 08:31:30 +0200
From: Paul Menzel <pmenzel@...gen.mpg.de>
To: Jinyoung Choi <j-young.choi@...sung.com>
Cc: song@...nel.org, shli@...com, neilb@...e.com,
linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] md/bitmap: Fix bitmap page writing problem when using
block integrity
Dear Jinyoung,

Thank you very much for your patch. Below are some minor comments, which
you are free to ignore.

For the commit message summary/title, you could be more specific. Maybe:
> Avoid protection error writing bitmap page with block integrity
Am 20.07.23 um 08:12 schrieb Jinyoung CHOI:
> Be careful when changing the page to perform DMA.
> Changing the bitmap page is also possible on the page where the DMA is
> being performed or scheduled in the MD.
Please add a blank line between paragraphs or do not wrap a line just
because a sentence ends.
> When configuring raid1(mirror) with devices that support block integrity,
Maybe add a space before the opening parenthesis?
> the same bitmap page is sent to the device twice during the resync process,
> causing the following problems.
> (When requeue is executed, integrity is not updated)
>
> [Func 1] [Func 2]
>
> 1 A(page) + a(integrity)
> 2 (sq doorbell)
> 3 A(page) -> A-1(page)
> 4 A-1(page-updated) + a(integiry) A-1(page) + a-1(integrity)
integ*rit*y
> 5 (sq doorbell)
> 6 (DMA) (DMA)
>
> I/O Fail and retry N I/O Success
> To be Faulty Device
>
> The following is the log when a problem occurs. The problematic device
> is in the faulty device state.
>
> Log:
> [ 135.037253] md/raid1:md0: active with 2 out of 2 mirrors
> [ 135.038228] md0: detected capacity change from 0 to 7501212288
> [ 135.038270] md: resync of RAID array md0
> [ 151.252172] nvme2n1: I/O Cmd(0x1) @ LBA 16, 8 blocks, I/O Error (sct 0x2 / sc 0x82) MORE
> [ 151.252180] protection error, dev nvme2n1, sector 16 op 0x1:(WRITE) flags 0x10800 phys_seg 1 prio class 2
> [ 151.252185] md: super_written gets error=-84
> [ 151.252187] md/raid1:md0: Disk failure on nvme2n1, disabling device.
> md/raid1:md0: Operation continuing on 1 devices.
> [ 151.267450] nvme3n1: I/O Cmd(0x1) @ LBA 16, 8 blocks, I/O Error (sct 0x2 / sc 0x82) MORE
> [ 151.267457] protection error, dev nvme3n1, sector 16 op 0x1:(WRITE) flags 0x10800 phys_seg 1 prio class 2
> [ 151.267460] md: super_written gets error=-84
> [ 151.268458] md: md0: resync interrupted.
> [ 151.320765] md: resync of RAID array md0
> [ 151.321205] md: md0: resync done.
Although you explained the problem well, it’d be great nevertheless if
you could add the details of your system to the commit message.
> Fixes: 85c9ccd4f026 ("md/bitmap: Don't write bitmap while earlier writes might be in-flight")
> Signed-off-by: Jinyoung Choi <j-young.choi@...sung.com>
Your From line spells it CHOI. Maybe you can update your git
configuration to also use Choi?
> ---
> drivers/md/md-bitmap.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
> index 1ff712889a3b..dfb7418ba48a 100644
> --- a/drivers/md/md-bitmap.c
> +++ b/drivers/md/md-bitmap.c
> @@ -467,6 +467,13 @@ void md_bitmap_update_sb(struct bitmap *bitmap)
> return;
> if (!bitmap->storage.sb_page) /* no superblock */
> return;
> +
> + /*
> + * Before modifying the bitmap page and re-issue it, wait for
> + * the requests previously sent to the device to be completed.
> + */
> + md_bitmap_wait_writes(bitmap);
> +
> sb = kmap_atomic(bitmap->storage.sb_page);
> sb->events = cpu_to_le64(bitmap->mddev->events);
> if (bitmap->mddev->events < bitmap->events_cleared)
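
For readers following along, the race described in the commit message can
be sketched in user space. This is only a toy model under stated
assumptions, not kernel code: `toy_request`, `toy_submit`,
`toy_device_check` and `toy_resync` are hypothetical names, the guard is a
bitwise CRC16 in the style of the T10-DIF guard tag, and the two writers
from the table are collapsed into one sequential flow. The point it shows
is that the guard is computed at submit time while the device reads the
live page later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096

/* Hypothetical stand-in for an in-flight write: it references the live
 * page (no copy) and carries the guard computed when it was submitted. */
struct toy_request {
	const uint8_t *page;
	uint16_t guard;
};

/* Bitwise CRC16, polynomial 0x8BB7, initial value 0 (T10-DIF style). */
uint16_t toy_crc16(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)((uint16_t)buf[i] << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

/* "Submit": compute the integrity guard over the page as it is now. */
void toy_submit(struct toy_request *req, const uint8_t *page)
{
	assert(req && page);
	req->page = page;
	req->guard = toy_crc16(page, PAGE_BYTES);
}

/* "DMA": the device reads the page as it is at this moment and compares
 * it with the guard that was computed at submit time. */
bool toy_device_check(const struct toy_request *req)
{
	return toy_crc16(req->page, PAGE_BYTES) == req->guard;
}

/* Returns true only if every simulated write passes the device check. */
bool toy_resync(bool wait_before_update)
{
	static uint8_t page[PAGE_BYTES];
	struct toy_request first, second;
	bool first_ok, second_ok;

	memset(page, 0, sizeof(page));
	toy_submit(&first, page);	/* A(page) + a(integrity) */

	if (wait_before_update) {
		/* Earlier write completes before the page is touched. */
		first_ok = toy_device_check(&first);
		page[0]++;		/* A -> A-1 */
	} else {
		page[0]++;		/* A -> A-1 while the write is in flight */
		first_ok = toy_device_check(&first);	/* A-1 + a: mismatch */
	}

	toy_submit(&second, page);	/* A-1(page) + a-1(integrity) */
	second_ok = toy_device_check(&second);

	return first_ok && second_ok;
}
```

In this model `toy_resync(false)` fails the first write's check, which is
the analogue of the protection error in the log, while `toy_resync(true)`
passes both, which is the ordering the `md_bitmap_wait_writes()` call is
meant to enforce.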
Kind regards,
Paul