linux-kernel - Re: [PATCH RFC PATCH 2/3] btrfs: zoned: reserve relocation zone on mount

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ozmewhhflwko2z75luj33futfnkhoryyhk7vvb76suuagqse7o@gdwyk3dj3yw7>
Date: Tue, 2 Apr 2024 06:03:35 +0000
From: Naohiro Aota <Naohiro.Aota@....com>
To: Damien Le Moal <dlemoal@...nel.org>
CC: Johannes Thumshirn <jth@...nel.org>, Chris Mason <clm@...com>, Josef Bacik
	<josef@...icpanda.com>, David Sterba <dsterba@...e.com>,
	"linux-btrfs@...r.kernel.org" <linux-btrfs@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Hans Holmberg
	<Hans.Holmberg@....com>, "hch@....de" <hch@....de>, Boris Burkov
	<boris@....io>, Johannes Thumshirn <Johannes.Thumshirn@....com>
Subject: Re: [PATCH RFC PATCH 2/3] btrfs: zoned: reserve relocation zone on
 mount

On Fri, Mar 29, 2024 at 08:05:34AM +0900, Damien Le Moal wrote:
> On 3/28/24 22:56, Johannes Thumshirn wrote:
> > From: Johannes Thumshirn <johannes.thumshirn@....com>
> > 
> > Reserve one zone as a data relocation target on each mount. If we already
> > find one empty block group, there's no need to force a chunk allocation,
> > but we can use this empty data block group as our relocation target.
> > 
> > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@....com>
> > ---
> >  fs/btrfs/disk-io.c |  2 ++
> >  fs/btrfs/zoned.c   | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/btrfs/zoned.h   |  3 +++
> >  3 files changed, 51 insertions(+)
> > 
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index 5a35c2c0bbc9..83b56f109d29 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -3550,6 +3550,8 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
> >  	}
> >  	btrfs_discard_resume(fs_info);
> >  
> > +	btrfs_reserve_relocation_zone(fs_info);
> > +
> >  	if (fs_info->uuid_root &&
> >  	    (btrfs_test_opt(fs_info, RESCAN_UUID_TREE) ||
> >  	     fs_info->generation != btrfs_super_uuid_tree_generation(disk_super))) {
> > diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> > index d51faf7f4162..fb8707f4cab5 100644
> > --- a/fs/btrfs/zoned.c
> > +++ b/fs/btrfs/zoned.c
> > @@ -17,6 +17,7 @@
> >  #include "fs.h"
> >  #include "accessors.h"
> >  #include "bio.h"
> > +#include "transaction.h"
> >  
> >  /* Maximum number of zones to report per blkdev_report_zones() call */
> >  #define BTRFS_REPORT_NR_ZONES   4096
> > @@ -2634,3 +2635,48 @@ void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info)
> >  	}
> >  	spin_unlock(&fs_info->zone_active_bgs_lock);
> >  }
> > +
> > +static u64 find_empty_block_group(struct btrfs_space_info *sinfo)
> > +{
> > +	struct btrfs_block_group *bg;
> > +
> > +	for (int i = 0; i < BTRFS_NR_RAID_TYPES; i++) {
> > +		list_for_each_entry(bg, &sinfo->block_groups[i], list) {
> > +			if (bg->used == 0)
> > +				return bg->start;
> > +		}
> > +	}
> > +
> > +	return 0;
> 
> The first block group does not start at offset 0 ? If it does, then this is not
> correct. Maybe turn this function into returning a bool and add a pointer
> argument to return the bg start value ?

No, it does not. The bg->start (logical address) increases monotonically as
a new block group is created. And, the first block group created by
mkfs.btrfs has a non-zero bg->start address.

> > +}
> > +
> > +void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info)
> > +{
> > +	struct btrfs_root *tree_root = fs_info->tree_root;
> > +	struct btrfs_space_info *sinfo = fs_info->data_sinfo;
> > +	struct btrfs_trans_handle *trans;
> > +	u64 flags = btrfs_get_alloc_profile(fs_info, sinfo->flags);
> > +	u64 bytenr = 0;
> > +
> > +	if (!btrfs_is_zoned(fs_info))
> > +		return;
> > +
> > +	bytenr = find_empty_block_group(sinfo);
> > +	if (!bytenr) {
> > +		int ret;
> > +
> > +		trans = btrfs_join_transaction(tree_root);
> > +		if (IS_ERR(trans))
> > +			return;
> > +
> > +		ret = btrfs_chunk_alloc(trans, flags, CHUNK_ALLOC_FORCE);
> > +		btrfs_end_transaction(trans);
> > +
> > +		if (!ret)
> > +			bytenr = find_empty_block_group(sinfo);
> 
> What if this fail again ?

That (almost) means we don't have any free space to create a new block
group. Then, we don't have a way to deal with it. Maybe, we can reclaim
directly here?

Anyway, in that case, we will set fs_info->data_reloc_bg = 0, which is the
same behavior as the current code.

> > +	}
> > +
> > +	spin_lock(&fs_info->relocation_bg_lock);
> > +	fs_info->data_reloc_bg = bytenr;
> > +	spin_unlock(&fs_info->relocation_bg_lock);
> > +}
> > diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
> > index 77c4321e331f..048ffada4549 100644
> > --- a/fs/btrfs/zoned.h
> > +++ b/fs/btrfs/zoned.h
> > @@ -97,6 +97,7 @@ int btrfs_zone_finish_one_bg(struct btrfs_fs_info *fs_info);
> >  int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info,
> >  				struct btrfs_space_info *space_info, bool do_finish);
> >  void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info);
> > +void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info);
> >  #else /* CONFIG_BLK_DEV_ZONED */
> >  static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
> >  				     struct blk_zone *zone)
> > @@ -271,6 +272,8 @@ static inline int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info,
> >  
> >  static inline void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info) { }
> >  
> > +static inline void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info) { }
> > +
> >  #endif
> >  
> >  static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
> > 
> 
> -- 
> Damien Le Moal
> Western Digital Research
>