linux-kernel - Re: [PATCH] btrfs:remove redundant index_rbio_pages in raid56_rmw

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <8f1adbef-7146-c8f9-92a3-c18e10e9a66b@gmx.com>
Date:   Thu, 29 Sep 2022 15:00:03 +0800
From:   Qu Wenruo <quwenruo.btrfs@....com>
To:     hmsjwzb <hmsjwzb@...o.com>, Qu Wenruo <wqu@...e.com>
Cc:     Chris Mason <clm@...com>, Josef Bacik <josef@...icpanda.com>,
        David Sterba <dsterba@...e.com>, linux-btrfs@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] btrfs:remove redundant index_rbio_pages in
 raid56_rmw_stripe



On 2022/9/29 14:41, hmsjwzb wrote:
> Hi Qu,
>
> 	Thanks for your reply.
> 	Here is my plan about destructive RMW.

The overall idea is fine, but some details need to be addressed.

>
> 	1. Read all the stripes in through raid56_rmw_stripe.

Note that, even we reached raid56_rmw_stripe(), we may still be a full
stripe write (due to substripe writes merged into one full stripe).

In that case, we should not waste IO to read any stripe.

That's what your patch lacks.
(Also all the comments should be updated since you changed the behavior)

> 	2. Do xor operation in finish_rmw.

You may want to do the verification/recovery all before finish_rmw().

The function finish_rmw() is currently the last step of RMW.

> 	3. If the xor result matches, nothing happened.
> 	4. If the xor result mismatches, we can recover the data or trigger some user space progress to fix the data corruption.

For now just focus on the recover part, no need to bother user space.

>
> 	But here are some problems.
> 	1. If the stripe is new allocated, the check will fail.

For now we should focus on data recovery only (metadata will be much
more complex), and for data we will need to grab checksum for the full
stripe anyway.

So in newly allocated case, there will be no checksum, thus no need to
really do any recovery, we just trust all data which doesn't has no
checksum.

> 	2. Is it convient for kernel get checksum in finish_rmw and recover data?

If not convenient, you'll need to be creative to grab the checksum.

But from what I see, if you pre-fetch the checksum for the full stripe
at raid56_parity_write() timing, it should be fine.

Thanks,
Qu

>
> Thanks,
> Flint
>
> On 9/28/22 23:13, Qu Wenruo wrote:
>>
>>
>> On 2022/9/29 11:08, Qu Wenruo wrote:
>>>
>>>
>>> On 2022/9/29 09:44, Flint.Wang wrote:
>>>>     The index_rbio_pages in raid56_rmw_stripe is redundant.
>>>
>>> index_rbio_pages() is to populate the rbio->bio_sectors array.
>>>
>>> In raid56_rmw_stripe() we later calls sector_in_rbio(), which will check if a sector is belonging to bio_lists.
>>>
>>> If not called, all sector will be returned using the sectors in rbio->bio_sectors, not using the sectors in bio lists.
>>>
>>> Have you tried your patch with fstests runs?
>>
>> Well, for raid56_rmw_stripe() it's fine, as without the index_rbio_pages() call, we just read all the sectors from the disk.
>>
>> This would include the new pages from bio lists.
>>
>> It would only cause extra IO, but since they can all be merged into one 64K stripe, it should not cause performance problem.
>>
>> Furthermore it would read all old sectors from disk, allowing us later to do the verification before doing the writes.
>>
>> But it should really contain a more detailed explanation.
>>
>> Thanks,
>> Qu
>>>
>>> IMHO it should fail a lot of very basic writes in RAID56.
>>>
>>> Thanks,
>>> Qu
>>>
>>>>     It is invoked in finish_rmw anyway.
>>>>
>>>> Signed-off-by: Flint.Wang <hmsjwzb@...o.com>
>>>> ---
>>>>    fs/btrfs/raid56.c | 2 --
>>>>    1 file changed, 2 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
>>>> index f6395e8288d69..44266b2c5b86e 100644
>>>> --- a/fs/btrfs/raid56.c
>>>> +++ b/fs/btrfs/raid56.c
>>>> @@ -1546,8 +1546,6 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
>>>>        if (ret)
>>>>            goto cleanup;
>>>> -    index_rbio_pages(rbio);
>>>> -
>>>>        atomic_set(&rbio->error, 0);
>>>>        /* Build a list of bios to read all the missing data sectors. */
>>>>        for (total_sector_nr = 0; total_sector_nr < nr_data_sectors;