Message-ID: <f6ae39fd-ee30-4e22-8d0d-6dec5c3bd192@gmx.com>
Date: Tue, 24 Sep 2024 07:53:50 +0930
From: Qu Wenruo <quwenruo.btrfs@....com>
To: Johannes Thumshirn <Johannes.Thumshirn@....com>,
Johannes Thumshirn <jth@...nel.org>, Chris Mason <clm@...com>,
Josef Bacik <josef@...icpanda.com>, David Sterba <dsterba@...e.com>,
"open list:BTRFS FILE SYSTEM" <linux-btrfs@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>
Cc: WenRuo Qu <wqu@...e.com>, Naohiro Aota <Naohiro.Aota@....com>
Subject: Re: [PATCH] btrfs: also add stripe entries for NOCOW writes
On 2024/9/24 00:11, Johannes Thumshirn wrote:
> On 23.09.24 10:54, Qu Wenruo wrote:
>>
>>
[...]
>> Finally, I do not think it's a good idea to insert RST entries for NOCOW.
>> If a file is set NOCOW, it means we'll be doing a lot of overwrites to it.
>> Then why waste our time updating the RST entries again and again?
>>
>> Isn't such behavior going to cause more write amplification? Meanwhile
>> for non-RST cases, NOCOW should cause the least amount of write
>> amplification.
>
> The whole idea behind the RST was to write the RST entries _after_ the
> data has been persisted to disk. Otherwise we're back at the write hole
> problem. See for example this imaginary sequence:
>
> Preallocate a range. This will then also preallocate the RST entries
> with the mapping as you describe. Write to it, and while you write you
> have a power loss. The copy/stripe on disk 1 is correctly written, but
> disk 2 didn't report back before the power loss happened. After we have
> power again, a read to disk 2 comes in; as we have an RST entry, the
> read will be directed to the broken entry and garbage is returned. And
> this is the good case, as we can repair it.
> If it was an overwrite of a block and the same happens, we have an RST
> entry pointing to a good and a bad copy.
Nope, that will not happen.
Our metadata is still COW-protected, so after such a power loss the
file extent still shows that range as PREALLOCATED, and we won't even
trigger a read.
This is exactly the same as a non-RST PREALLOCATED write.
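To make the ordering argument concrete, here is a toy model in Python. It is
only a sketch and not btrfs code: ToyExtent, its fields and the
power_loss_after_mirror parameter are all made up for illustration. The only
assumption it encodes is the one above: the PREALLOCATED -> REGULAR switch is
a COW metadata update that becomes visible only after all data writes have
finished.

```python
from enum import Enum

BLOCK = 4  # toy block size in bytes (arbitrary)


class ExtentState(Enum):
    PREALLOCATED = 1
    REGULAR = 2


class ToyExtent:
    """Hypothetical model of one preallocated, two-mirror extent."""

    def __init__(self):
        # Committed (COW-protected) metadata: this is what survives a crash.
        self.committed_state = ExtentState.PREALLOCATED
        # Stale contents of the two mirrors before anything is written.
        self.mirrors = [b"\xff" * BLOCK, b"\xff" * BLOCK]

    def write(self, data, power_loss_after_mirror=None):
        """Write data to each mirror, then commit the metadata.

        If power_loss_after_mirror is set, stop right after that mirror
        has been written, i.e. before the metadata commit.
        """
        for idx in range(len(self.mirrors)):
            self.mirrors[idx] = data
            if power_loss_after_mirror == idx:
                return False  # crash: metadata was never committed
        self.committed_state = ExtentState.REGULAR
        return True

    def read(self, mirror):
        # A PREALLOCATED extent reads back as zeros; no device is touched.
        if self.committed_state is ExtentState.PREALLOCATED:
            return b"\x00" * BLOCK
        return self.mirrors[mirror]


# Power loss after mirror 0 was written, before mirror 1 and the commit.
crashed = ToyExtent()
crashed.write(b"data", power_loss_after_mirror=0)
print(crashed.read(1))

# Normal case: both mirrors written, then metadata committed.
ok = ToyExtent()
ok.write(b"data")
print(ok.read(1))
```

In the crashed case, the stale contents of mirror 1 are never returned,
because the read path never gets past the PREALLOCATED check. Whether a
stripe mapping exists for that range at that point makes no difference in
this model.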
>
> Once we add the RST entries only after both writes succeed, the problem
> isn't there. So for preallocated extents it is even harmful to add an RST
> entry.
You just forgot the metadata part, which prevents the problem from
happening in the first place.
Thanks,
Qu