lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <47d0f003-ebc8-4959-a22c-ccf9ea7ef35a@gmx.com>
Date: Mon, 23 Sep 2024 18:23:58 +0930
From: Qu Wenruo <quwenruo.btrfs@....com>
To: Johannes Thumshirn <Johannes.Thumshirn@....com>,
 Johannes Thumshirn <jth@...nel.org>, Chris Mason <clm@...com>,
 Josef Bacik <josef@...icpanda.com>, David Sterba <dsterba@...e.com>,
 "open list:BTRFS FILE SYSTEM" <linux-btrfs@...r.kernel.org>,
 open list <linux-kernel@...r.kernel.org>
Cc: WenRuo Qu <wqu@...e.com>, Naohiro Aota <Naohiro.Aota@....com>
Subject: Re: [PATCH] btrfs: also add stripe entries for NOCOW writes



在 2024/9/23 17:45, Johannes Thumshirn 写道:
> On 23.09.24 09:56, Qu Wenruo wrote:
>>>> 在 2024/9/23 16:15, Johannes Thumshirn 写道:
>>>>> From: Johannes Thumshirn <johannes.thumshirn@....com>
>>>>>
>>>>> NOCOW writes do not generate stripe_extent entries in the RAID stripe
>>>>> tree, as the RAID stripe-tree feature initially was designed with a
>>>>> zoned filesystem in mind and on a zoned filesystem, we do not allow NOCOW
>>>>> writes. But the RAID stripe-tree feature is independent from the zoned
>>>>> feature, so we must also allow NOCOW writes for zoned filesystems.
>>>>>
>>>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@....com>
>>>>
>>>> Sorry I'm going to repeat myself again, I still believe if we insert an
>>>> RST entry at falloc() time, it will be more consistent with the non-RST
>>>> code.
>>>>
>>>> Yes, I known preallocated space will not need any read nor search RST
>>>> entry, and we just fill the page cache with zero at read time.
>>>>
>>>> But the point of proper (not just dummy) RST entry for the whole
>>>> preallocated space is, we do not need to touch the RST entry anymore for
>>>> NOCOW/PREALLOCATED write at all.
>>>>
>>>> This makes the RST NOCOW/PREALLOC writes behavior to align with the
>>>> non-RST code, which doesn't update any extent item, but only modify the
>>>> file extent for PREALLOC writes.
>>>
>>> Please re-read the patch. This is not a dummy RST entry but a real RST
>>> entry for NOCOW writes.
>>>
>> I know, but my point is, if the RST entry for preallocated range is
>> already a regular one, you won't even need to insert/update the RST tree
>> at all.
>>
>> Just like we do not need to update the extent tree for
>> NOCOW/PREALLOCATED writes.
>
> But as long as there is no data on disk there is no point of having a
> logical to N-disk physical mapping. We haven't even called
> btrfs_map_block() before, so where do we know which disks will get the
> blocks at which address? Look at this example:
>
> Fallocate [0, 1M]
> virtme-scsi:/mnt # xfs_io -fc "falloc 0 1M" -c sync test
[...]
>
>
> [1] we preallocate the data for [0, 1M] @ 298844160
> [2] we have the actual written data for [64k, 128k] @ 298844160
>
> What should I do to insert the RST entry there as we get:

Do the needed write map and insert the RST entries, just as if we're
doing a write, but without any actual IO.


Currently the handling of RST is not consistent with non-RST, thus
that's the reason causing problems with scrub on preallocated extents.

I known preallocated range won't trigger any read thus it makes no sense
to do the proper RST setup.
But let's also take the example of non-RST scrub:

Do we spend our time checking if a data extent is preallocated or not?
No, we do not. We just read the content, and check against its csum.
For preallocated extents, it just has no csum.

You can argue that this is wasting IO, but I can also counter-argue that
we need to make sure the read on that device range is fine, even if we
know we will not really read the content (but that's only for now).


Furthermore, from the logical aspect, at least to me, non-RST case is
just a special case where logical address is 1:1 mapped already.

This also means, even for preallocated extents, they should have RST
entries.


Finally, I do not think it's a good idea to insert RST entries for NOCOW.
If a file is set NOCOW, it means we'll doing a lot of overwrite for it.
Then why waste our time updating the RST entries again and again?

Isn't such behavior going to cause more write amplification? Meanwhile
for non-RST cases, NOCOW should cause the least amount of write
amplification.



So again, YES, I know doing proper write map and inserting RST entries
for preallocated range makes no sense for READ.
But preallocation and NOCOW is utilized for contents we frequently
over-writes, and such RST entries save us more for those frequently
over-writes.

Thanks,
Qu
>
> [3] the physical copies starting at 298909696 on devid 1 and 277938176
> on devid 2
>
>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ