linux-kernel - Re: PROBLEM: Recent raid10 block discard patchset causes filesystem corruption on fstrim


Message-ID: <a85943ed-60d4-05ad-9f6d-d76324fa5538@redhat.com>
Date:   Thu, 24 Dec 2020 18:18:40 +0800
From:   Xiao Ni <xni@...hat.com>
To:     Song Liu <songliubraving@...com>,
        Matthew Ruffell <matthew.ruffell@...onical.com>
Cc:     linux-raid <linux-raid@...r.kernel.org>,
        Song Liu <song@...nel.org>,
        lkml <linux-kernel@...r.kernel.org>, Coly Li <colyli@...e.de>,
        Guoqing Jiang <guoqing.jiang@...ud.ionos.com>,
        "khalid.elmously@...onical.com" <khalid.elmously@...onical.com>,
        Jay Vosburgh <jay.vosburgh@...onical.com>
Subject: Re: PROBLEM: Recent raid10 block discard patchset causes filesystem
 corruption on fstrim

On 12/09/2020 12:17 PM, Song Liu wrote:
> Hi Matthew,
>
>> On Dec 8, 2020, at 7:46 PM, Matthew Ruffell <matthew.ruffell@...onical.com> wrote:
>>
>> Hello,
>>
>> I recently backported the following patches into the Ubuntu stable kernels:
>>
>> md: add md_submit_discard_bio() for submitting discard bio
>> md/raid10: extend r10bio devs to raid disks
>> md/raid10: pull codes that wait for blocked dev into one function
>> md/raid10: improve raid10 discard request
>> md/raid10: improve discard request for far layout
>> dm raid: fix discard limits for raid1 and raid10
>> dm raid: remove unnecessary discard limits for raid10
> Thanks for the report!
>
> Hi Xiao,
>
> Could you please take a look at this and let me know soon? We need to fix
> this before 5.10 official release.
>
> Thanks,
> Song
>
Hi all

The root cause is found. For raid10 we now handle discard requests in a
way similar to raid0: because the discard region is very big, we
calculate the start/end address on each disk and then submit a discard
request to each disk. But raid10 has copies. For the near layout, if the
discard request is not aligned to the chunk size, we calculate a
start_disk_offset. Currently we only use start_disk_offset for the first
disk, but it should be used for the disks holding the near copies too.

[  789.709501] discard bio start : 70968, size : 191176
[  789.709507] first stripe index 69, start disk index 0, start disk offset 70968
[  789.709509] last stripe index 256, end disk index 0, end disk offset 262144
[  789.709511] disk 0, dev start : 70968, dev end : 262144
[  789.709515] disk 1, dev start : 70656, dev end : 262144

For example, this test case has 2 near copies. The start_disk_offset for
the first disk is 70968. The second disk should use the same offset, but
instead it uses the start address of the chunk (70656, see the "disk 1"
line in the log above), so it discards more than it should. The patch in
the attachment fixes this problem. It splits off the region that is not
aligned to the chunk size.
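
The numbers in the log can be checked with a small model. This is only a
sketch of the arithmetic, not the kernel code: the helper names are
hypothetical, and the chunk size of 1024 is an assumption inferred from
the log (stripe index 69 and dev start 70656 = 69 * 1024).

```python
# Hypothetical model of the near-layout start offsets; not kernel code.
# Assumption: chunk size is 1024, inferred from 70656 == 69 * 1024.
CHUNK_SECTORS = 1024


def chunk_start(offset, chunk_sectors=CHUNK_SECTORS):
    """Round a device offset down to the start of its chunk."""
    return offset - offset % chunk_sectors


def near_copy_starts(start_disk_offset, near_copies, fixed):
    """Return the dev start used on each disk holding a near copy.

    fixed=False models the reported behaviour: only the first disk uses
    start_disk_offset, the other copies start at the chunk boundary.
    fixed=True models the intended behaviour: every copy uses it.
    """
    starts = []
    for copy in range(near_copies):
        if copy == 0 or fixed:
            starts.append(start_disk_offset)
        else:
            starts.append(chunk_start(start_disk_offset))
    return starts


buggy = near_copy_starts(70968, near_copies=2, fixed=False)
fixed = near_copy_starts(70968, near_copies=2, fixed=True)
extra = buggy[0] - buggy[1]  # amount wrongly discarded on the second disk

print("buggy dev starts:", buggy)
print("fixed dev starts:", fixed)
print("extra discarded on disk 1:", extra)
```

With the values from the log, the buggy case gives 70968 for disk 0 and
70656 for disk 1, so disk 1 loses 312 units (presumably sectors) of data
that the discard request never covered.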

There is another problem: the stripe size should be calculated
differently for the near layout and the far layout.

@Song, do you want me to use a separate patch for this fix, or fix this
in the original patch?

Merry Christmas
Xiao

View attachment "fix-raid10-discard-patch" of type "text/plain" (2665 bytes)