Message-ID: <3f16658c-56f9-4ec5-ba6e-4c56a0b8f5e4@gmx.com>
Date: Tue, 24 Sep 2024 10:16:13 +0930
From: Qu Wenruo <quwenruo.btrfs@....com>
To: Josef Bacik <josef@...icpanda.com>
Cc: Johannes Thumshirn <jth@...nel.org>, Chris Mason <clm@...com>,
 David Sterba <dsterba@...e.com>,
 "open list:BTRFS FILE SYSTEM" <linux-btrfs@...r.kernel.org>,
 open list <linux-kernel@...r.kernel.org>, Qu Wenruo <wqu@...e.com>,
 Naohiro Aota <naohiro.aota@....com>,
 Johannes Thumshirn <johannes.thumshirn@....com>
Subject: Re: [PATCH] btrfs: also add stripe entries for NOCOW writes



On 2024/9/24 08:02, Qu Wenruo wrote:
>
>
> On 2024/9/24 00:50, Josef Bacik wrote:
>> On Mon, Sep 23, 2024 at 04:58:34PM +0930, Qu Wenruo wrote:
>>>
>>>
>>> On 2024/9/23 16:15, Johannes Thumshirn wrote:
>>>> From: Johannes Thumshirn <johannes.thumshirn@....com>
>>>>
>>>> NOCOW writes do not generate stripe_extent entries in the RAID stripe
>>>> tree, as the RAID stripe-tree feature initially was designed with a
>>>> zoned filesystem in mind and on a zoned filesystem, we do not allow
>>>> NOCOW writes. But the RAID stripe-tree feature is independent from the
>>>> zoned feature, so we must also allow NOCOW writes for zoned filesystems.
>>>>
>>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@....com>
>>>
>>> Sorry I'm going to repeat myself again, I still believe if we insert an
>>> RST entry at falloc() time, it will be more consistent with the non-RST
>>> code.
>>>
>>> Yes, I know preallocated space does not need to read or search any RST
>>> entry, and we just fill the page cache with zeros at read time.
>>>
>>> But the point of a proper (not just dummy) RST entry for the whole
>>> preallocated space is that we never need to touch the RST entry again
>>> for NOCOW/PREALLOCATED writes.
>>>
>>> This aligns the RST NOCOW/PREALLOC write behavior with the non-RST
>>> code, which doesn't update any extent item, but only modifies the
>>> file extent for PREALLOC writes.
>>
>> I see what you're getting at here, but it creates a huge amount of problems
>> later down the line.
>>
>> I prealloc an extent, I map that logical extent to a physical extent and then I
>> insert a RST entry for that mapping.  Now I rip out one of my disks, and now I
>> have to update RST entries for extents I'm not going to rewrite because they're
>> prealloc.
>
> Why do we even need to update the RST entries at all?
>
> RST is just an extra layer for logical bytenr mapping, and have you ever
> seen non-RST btrfs do relocation just because one device went missing?
>
> Can you explain more about the "have to update RST entries" part?
> That does not match my understanding of RST.
>
>>
>> RST is a logical->physical mapping.  We do not need to update or insert anything
>> until we create that logical->physical mapping.
>
> Just consider the fallocate of non-RST as an example.
>
> We DO allocate real data extents, they have real location on the disk.
>
> Then add the RST layer. Now preallocated extents suddenly do not have an
> RST mapping, but still have extents allocated for them.
>
> I do not think this is any more consistent.
>
>>  Keeping the rules consistent
>> across the different layers will make it easier to reason about and easier to
>> maintain.
>
> I think all data extents should have RST mapping, that's way more
> consistent than two different handling for different data extents.
>
> Just like we do not bother if a data extent is preallocated or not in
> scrub.
>
>
>>  Adding an index at endio time for NOCOW makes sense, we now have
>> created a thing on disk that we need to have a mapping for.  The same goes for
>> prealloc, adding an entry at prealloc time doesn't make logical sense as we
>> haven't yet instantiated that space on disk.  Thanks,
>
> But we allocated data extents. Even if we won't really utilize them for
> now, we should have RST entries for them.
>
> In fact, I do not even think it's correct to insert/update RST at
> endio/ordered extent time.
>
> It will be way more consistent to update/insert RST entries at data
> extent allocation time.

My bad, this is impossible for zoned devices, and that's why RST is here
to help the situation.

So this indeed means we have to do the work at endio/finish_ordered_io()
time.

And in that case, it looks like we really have to handle preallocated
extents differently (thanks to these zoned devices)
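
As a rough illustration of the point above, here is a toy userspace model
in C. It is not btrfs code: every name in it (toy_rst, toy_zone,
toy_zone_append, toy_finish_write, toy_demo) is hypothetical, and the
"zone append" is reduced to a write pointer that the device advances on
its own. Under those assumptions it shows why the logical->physical entry
can only be recorded once the write has completed, and why a preallocated
range has no entry until something is written into it.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_ENTRIES 16

/* Hypothetical toy types; these are NOT the kernel's btrfs structures. */
struct toy_stripe_entry {
	uint64_t logical;   /* logical start of the written range */
	uint64_t length;    /* length of the written range in bytes */
	uint64_t physical;  /* where the data actually landed on the device */
};

struct toy_rst {
	struct toy_stripe_entry entries[TOY_MAX_ENTRIES];
	int nr;
};

struct toy_zone {
	uint64_t write_pointer;  /* next physical offset the device will use */
};

/* Toy zone append: the device picks the location, the caller learns it
 * only from the return value, i.e. at completion time. */
uint64_t toy_zone_append(struct toy_zone *zone, uint64_t length)
{
	uint64_t physical = zone->write_pointer;

	zone->write_pointer += length;
	return physical;
}

int toy_rst_insert(struct toy_rst *rst, uint64_t logical,
		   uint64_t length, uint64_t physical)
{
	if (rst->nr >= TOY_MAX_ENTRIES)
		return -1;
	rst->entries[rst->nr].logical = logical;
	rst->entries[rst->nr].length = length;
	rst->entries[rst->nr].physical = physical;
	rst->nr++;
	return 0;
}

/* Returns 0 and fills *physical if a mapping covers @logical, else -1. */
int toy_rst_lookup(const struct toy_rst *rst, uint64_t logical,
		   uint64_t *physical)
{
	for (int i = 0; i < rst->nr; i++) {
		const struct toy_stripe_entry *e = &rst->entries[i];

		if (logical >= e->logical && logical < e->logical + e->length) {
			*physical = e->physical + (logical - e->logical);
			return 0;
		}
	}
	return -1;
}

/* Stand-in for the endio/finish_ordered_io() step: the mapping is
 * inserted only after the device has reported the physical location. */
int toy_finish_write(struct toy_rst *rst, struct toy_zone *zone,
		     uint64_t logical, uint64_t length)
{
	uint64_t physical = toy_zone_append(zone, length);

	return toy_rst_insert(rst, logical, length, physical);
}

/* Scenario: logical range [0, 8192) is preallocated, then half of it is
 * written after an unrelated write has already moved the write pointer. */
int toy_demo(void)
{
	struct toy_rst rst = { .nr = 0 };
	struct toy_zone zone = { .write_pointer = 1048576 };
	uint64_t physical = 0;

	/* After "fallocate" nothing is on disk, so there is no mapping. */
	assert(toy_rst_lookup(&rst, 0, &physical) == -1);

	/* An unrelated write completes first and advances the zone. */
	assert(toy_finish_write(&rst, &zone, 65536, 4096) == 0);

	/* NOCOW write into the first half of the preallocated range. */
	assert(toy_finish_write(&rst, &zone, 0, 4096) == 0);
	assert(toy_rst_lookup(&rst, 0, &physical) == 0);
	assert(physical == 1048576 + 4096);

	/* The unwritten second half still has no mapping. */
	assert(toy_rst_lookup(&rst, 4096, &physical) == -1);
	return 0;
}
```

In this model the physical offset of the NOCOW write depends on what
else reached the zone first, so an entry inserted at preallocation time
would have had nothing correct to record.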

Thanks,
Qu
>
> Thanks,
> Qu
>
>>
>> Josef
>>
>
>

