linux-kernel - Re: [PATCH] md/bitmap: Fix bitmap page writing problem when using block integrity

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <a658717c-8388-6e56-4d8d-096b0a1aefb9@molgen.mpg.de>
Date:   Thu, 20 Jul 2023 08:31:30 +0200
From:   Paul Menzel <pmenzel@...gen.mpg.de>
To:     Jinyoung Choi <j-young.choi@...sung.com>
Cc:     song@...nel.org, shli@...com, neilb@...e.com,
        linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] md/bitmap: Fix bitmap page writing problem when using
 block integrity

Dear Jinyoung,


Thank you very much for your patch. I have some minor comments, which 
you are free to ignore.

For the commit message summary/title, you might be more specific. Maybe:

> Avoid protection error writing bitmap page with block integrity

Am 20.07.23 um 08:12 schrieb Jinyoung CHOI:
> Be careful when changing the page to perform DMA.
> Changing the bitmap page is also possible on the page where the DMA is
> being performed or scheduled in the MD.

Please add a blank line between paragraphs or do not wrap a line just 
because a sentence ends.

> When configuring raid1(mirror) with devices that support block integrity,

Add a space before the opening parenthesis?

> the same bitmap page is sent to the device twice during the resync process,
> causing the following problems.
> (When requeue is executed, integrity is not updated)
> 
>               [Func 1]                         [Func 2]
> 
> 1     A(page) + a(integrity)
> 2        (sq doorbell)
> 3                                         A(page) -> A-1(page)
> 4  A-1(page-updated) + a(integiry)     A-1(page) + a-1(integrity)

Typo: integiry -> integ*rit*y

> 5      	                                    (sq doorbell)
> 6           (DMA)                               (DMA)
> 
> 	I/O Fail and retry N                 I/O Success
> 	To be Faulty Device
> 
> The following is the log when a problem occurs. The problematic device
> is in the faulty device state.
> 
> Log:
> [  135.037253] md/raid1:md0: active with 2 out of 2 mirrors
> [  135.038228] md0: detected capacity change from 0 to 7501212288
> [  135.038270] md: resync of RAID array md0
> [  151.252172] nvme2n1: I/O Cmd(0x1) @ LBA 16, 8 blocks, I/O Error (sct 0x2 / sc 0x82) MORE
> [  151.252180] protection error, dev nvme2n1, sector 16 op 0x1:(WRITE) flags 0x10800 phys_seg 1 prio class 2
> [  151.252185] md: super_written gets error=-84
> [  151.252187] md/raid1:md0: Disk failure on nvme2n1, disabling device.
>                 md/raid1:md0: Operation continuing on 1 devices.
> [  151.267450] nvme3n1: I/O Cmd(0x1) @ LBA 16, 8 blocks, I/O Error (sct 0x2 / sc 0x82) MORE
> [  151.267457] protection error, dev nvme3n1, sector 16 op 0x1:(WRITE) flags 0x10800 phys_seg 1 prio class 2
> [  151.267460] md: super_written gets error=-84
> [  151.268458] md: md0: resync interrupted.
> [  151.320765] md: resync of RAID array md0
> [  151.321205] md: md0: resync done.

You explained the problem well; nevertheless, it would be great if you 
could add the details of your system to the commit message.
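
The race in the diagram can be made concrete with a small user-space C model. This is a minimal sketch under simplifying assumptions and does not reproduce the NVMe or block-layer code path:

- One 512-byte sector stands in for the bitmap superblock page.
- The guard tag is assumed to be the 16-bit CRC-16/T10-DIF checksum.
- Reference and application tags are ignored.
- The names `queued_write`, `submit_write()`, `device_dma()` and `simulate()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/* CRC-16/T10-DIF: poly 0x8BB7, init 0, MSB first, no final XOR. */
static uint16_t crc16_t10dif(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

/* Hypothetical queued write: it points at the shared page (no copy is
 * taken) and carries the guard computed when it was submitted. */
struct queued_write {
	const uint8_t *page;
	uint16_t guard;
};

static struct queued_write submit_write(const uint8_t *page)
{
	struct queued_write w = { page, crc16_t10dif(page, SECTOR_SIZE) };

	return w;
}

/* Device side: the DMA reads the page as it is *now* and verifies it
 * against the guard from submission time. A requeue would resend the
 * same stale guard, so a retry fails the same way. */
static int device_dma(const struct queued_write *w)
{
	return crc16_t10dif(w->page, SECTOR_SIZE) == w->guard ? 0 : -1;
}

/* Returns the DMA result of Func 1's write (0 = ok, -1 = protection
 * error). With wait_first set, that write completes before Func 2
 * modifies the page. */
static int simulate(int wait_first)
{
	uint8_t page[SECTOR_SIZE];
	struct queued_write first, second;
	int first_result = 0;

	memset(page, 0, sizeof(page));
	page[0] = 1;                      /* A */
	first = submit_write(page);       /* Func 1: A + a, doorbell */
	if (wait_first)
		first_result = device_dma(&first);

	page[0] = 2;                      /* Func 2: A -> A-1 */
	second = submit_write(page);      /* Func 2: A-1 + a-1, doorbell */

	if (!wait_first)
		first_result = device_dma(&first);  /* A-1 + a: mismatch */
	assert(device_dma(&second) == 0);       /* Func 2's write is fine */
	return first_result;
}
```

`simulate(0)` follows the diagram: the page changes between submission and DMA, so it returns -1, the model's protection error. Because a requeue reuses the stale guard, retrying cannot help. `simulate(1)` lets Func 1's write finish before the page is touched and returns 0.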

> Fixes: 85c9ccd4f026 ("md/bitmap: Don't write bitmap while earlier writes might be in-flight")
> Signed-off-by: Jinyoung Choi <j-young.choi@...sung.com>

Your From line spells it CHOI. Maybe you can update your git 
configuration to also use Choi?

> ---
>   drivers/md/md-bitmap.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
> index 1ff712889a3b..dfb7418ba48a 100644
> --- a/drivers/md/md-bitmap.c
> +++ b/drivers/md/md-bitmap.c
> @@ -467,6 +467,13 @@ void md_bitmap_update_sb(struct bitmap *bitmap)
>   		return;
>   	if (!bitmap->storage.sb_page) /* no superblock */
>   		return;
> +
> +	/*
> +	 * Before modifying the bitmap page and re-issue it, wait for
> +	 * the requests previously sent to the device to be completed.
> +	 */
> +	md_bitmap_wait_writes(bitmap);
> +
>   	sb = kmap_atomic(bitmap->storage.sb_page);
>   	sb->events = cpu_to_le64(bitmap->mddev->events);
>   	if (bitmap->mddev->events < bitmap->events_cleared)
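
The ordering the patch introduces can be sketched in user space with a pending-write counter and a condition variable: first wait for in-flight writes, then modify the page. This is a hypothetical analogue, not the kernel's `md_bitmap_wait_writes()`. The names `write_tracker`, `tracker_*()`, `update_sb_events()` and `completer()` are made up for illustration, and pthreads stand in for the kernel's completion and wait-queue machinery.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tracker: counts writes that were submitted but have not
 * completed yet. */
struct write_tracker {
	pthread_mutex_t lock;
	pthread_cond_t all_done;
	int pending;
};

static void tracker_init(struct write_tracker *t)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->all_done, NULL);
	t->pending = 0;
}

static void tracker_submit(struct write_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	t->pending++;
	pthread_mutex_unlock(&t->lock);
}

/* Completion path: wake every waiter once the last write finishes. */
static void tracker_complete(struct write_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	assert(t->pending > 0);
	if (--t->pending == 0)
		pthread_cond_broadcast(&t->all_done);
	pthread_mutex_unlock(&t->lock);
}

/* Block until no submitted write is still in flight. The loop handles
 * spurious wakeups. */
static void tracker_wait(struct write_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	while (t->pending > 0)
		pthread_cond_wait(&t->all_done, &t->lock);
	pthread_mutex_unlock(&t->lock);
}

/* Same ordering as the patch: wait for earlier writes, then modify the
 * buffer they refer to. Assumes callers do not submit a new write of
 * this buffer between the wait and the update. */
static void update_sb_events(struct write_tracker *t, uint64_t *sb_events,
			     uint64_t events)
{
	tracker_wait(t);
	*sb_events = events;
}

/* Thread body standing in for a write completion callback. */
static void *completer(void *arg)
{
	tracker_complete(arg);
	return NULL;
}
```

A caller of `update_sb_events()` blocks until the last completion drops `pending` to zero. The buffer therefore never changes while a write checksummed against its old contents is still queued.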


Kind regards,

Paul