Message-ID: <717a2c83-0678-9310-4c75-9ad5da0472f6@samsung.com>
Date: Wed, 18 May 2022 11:15:52 +0200
From: Pankaj Raghav <p.raghav@...sung.com>
To: <dsterba@...e.cz>
CC: <axboe@...nel.dk>, <damien.lemoal@...nsource.wdc.com>,
<pankydev8@...il.com>, <dsterba@...e.com>, <hch@....de>,
<linux-nvme@...ts.infradead.org>, <linux-fsdevel@...r.kernel.org>,
<linux-btrfs@...r.kernel.org>, <jiangbo.365@...edance.com>,
<linux-block@...r.kernel.org>, <gost.dev@...sung.com>,
<linux-kernel@...r.kernel.org>, <dm-devel@...hat.com>
Subject: Re: [PATCH v4 08/13] btrfs:zoned: make sb for npo2 zone devices
align with sb log offsets
On 2022-05-17 14:42, David Sterba wrote:
> On Mon, May 16, 2022 at 06:54:11PM +0200, Pankaj Raghav wrote:
>> Superblocks for zoned devices are fixed as 2 zones at 0, 512GB and 4TB.
>> These are fixed at these locations so that recovery tools can reliably
>> retrieve the superblocks even if one of the mirrors gets corrupted.
>>
>> Power of 2 zone sizes align at these offsets irrespective of their
>> value but non power of 2 zone sizes will not align.
>>
>> To make sure the first zone at mirror 1 and mirror 2 align, a write zeroes
>> operation is performed to move the write pointer of the first zone to
>> the expected offset. This operation is performed only after a zone reset
>> of the first zone, i.e., when the second zone that contains the sb is FULL.
>
> Is it a good idea to do the "write zeros", instead of a plain "set write
> pointer"? I assume setting write pointer is instant, while writing
> potentially hundreds of megabytes may take significant time. As the
> functions may be called from random contexts, the increased time may
> become a problem.
>
Unfortunately, it is not possible to simply move the WP on zoned devices.
The only alternative I could find is write zeroes, which some devices
such as ZNS support natively. If anyone knows a better way to do this
than issuing write zeroes on zoned devices, I would be glad to hear it.
>> Signed-off-by: Pankaj Raghav <p.raghav@...sung.com>
>> ---
>> fs/btrfs/zoned.c | 68 ++++++++++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 63 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
>> index 3023c871e..805aeaa76 100644
>> --- a/fs/btrfs/zoned.c
>> +++ b/fs/btrfs/zoned.c
>> @@ -760,11 +760,44 @@ int btrfs_check_mountopts_zoned(struct btrfs_fs_info *info)
>> return 0;
>> }
>>
>> +static int fill_sb_wp_offset(struct block_device *bdev, struct blk_zone *zone,
>> + int mirror, u64 *wp_ret)
>> +{
>> + u64 offset = 0;
>> + int ret = 0;
>> +
>> + ASSERT(!is_power_of_two_u64(zone->len));
>> + ASSERT(zone->wp == zone->start);
>> + ASSERT(mirror != 0);
>
> This could simply accept 0 as the mirror offset too, the calculation is
> trivial.
>
Ok. I will fix it up!
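
For reference, a minimal userspace sketch of that calculation with mirror 0
accepted. The helper name sb_wp_fill_sectors() is hypothetical, the two
offset constants are assumed to be the 512GB and 4TB values from the patch
description, and a plain % stands in for div64_u64_rem(); this is not the
code that will go into the next revision.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9
/* Assumed values, taken from the "512GB and 4TB" in the patch description. */
#define SB_LOG_FIRST_OFFSET  (512ULL << 30)
#define SB_LOG_SECOND_OFFSET (4096ULL << 30)

/*
 * Hypothetical helper: number of sectors between the start of the zone
 * that contains the superblock mirror and the mirror's fixed offset.
 * For power-of-2 zone sizes this is always 0; for other zone sizes it
 * is the distance the write pointer would have to be advanced.
 */
static uint64_t sb_wp_fill_sectors(int mirror, uint64_t zone_len_sectors)
{
	uint64_t sb_offset_bytes;

	switch (mirror) {
	case 0:
		/* Mirror 0 sits at offset 0, which is aligned for any zone size. */
		return 0;
	case 1:
		sb_offset_bytes = SB_LOG_FIRST_OFFSET;
		break;
	case 2:
		sb_offset_bytes = SB_LOG_SECOND_OFFSET;
		break;
	default:
		return 0;
	}

	/* Remainder of the mirror offset (in sectors) within its zone. */
	return (sb_offset_bytes >> SECTOR_SHIFT) % zone_len_sectors;
}
```

With a 96MiB zone (196608 sectors), mirror 1 would need 65536 sectors (32MiB)
zeroed and mirror 2 would need 131072 sectors (64MiB); with a 256MiB zone
both are 0.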
>> +
>> + switch (mirror) {
>> + case 1:
>> + div64_u64_rem(BTRFS_SB_LOG_FIRST_OFFSET >> SECTOR_SHIFT,
>> + zone->len, &offset);
>> + break;
>> + case 2:
>> + div64_u64_rem(BTRFS_SB_LOG_SECOND_OFFSET >> SECTOR_SHIFT,
>> + zone->len, &offset);
>> + break;
>> + }
>> +
>> + ret = blkdev_issue_zeroout(bdev, zone->start, offset, GFP_NOFS, 0);
>> + if (ret)
>> + return ret;
>> +
>> + /*
>> + * Non po2 zone sizes will not align naturally at
>> + * mirror 1 (512GB) and mirror 2 (4TB). The wp of the
>> + * 1st zone in those superblock mirrors needs to be
>> + * moved to align at those offsets.
>> + */
>
> Please move this comment to the helper fill_sb_wp_offset itself, there
> it's more discoverable.
>
Ok.
>> + is_sb_offset_write_req =
>> + (zones_empty || (reset_zone_nr == 0)) && mirror &&
>> + !is_power_of_2(zones[0].len);
>
> Accepting 0 as the mirror number would also get rid of this wild
> expression substituting an 'if'.
>
>>
>> if (reset && reset->cond != BLK_ZONE_COND_EMPTY) {
>> ASSERT(sb_zone_is_full(reset));
>> @@ -795,6 +846,13 @@ static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
>> reset->cond = BLK_ZONE_COND_EMPTY;
>> reset->wp = reset->start;
>> }
>> +
>> + if (is_sb_offset_write_req) {
>
> And get rid of the conditional. The point of supporting both po2 and
> nonpo2 is to hide any implementation details in wrappers as much as
> possible.
>
Alright. I will move the logic to the wrapper instead of having the
conditional in this function.
>> + ret = fill_sb_wp_offset(bdev, &zones[0], mirror, &wp);
>> + if (ret)
>> + return ret;
>> + }
>> +
>> } else if (ret != -ENOENT) {
>> /*
>> * For READ, we want the previous one. Move write pointer to
Thanks for your comments.