Message-ID: <ZXeRkAoQTtTpf1dg@google.com>
Date:   Mon, 11 Dec 2023 14:47:44 -0800
From:   Jaegeuk Kim <jaegeuk@...nel.org>
To:     Yonggil Song <yonggil.song@...sung.com>
Cc:     "chao@...nel.org" <chao@...nel.org>,
        "linux-f2fs-devel@...ts.sourceforge.net" 
        <linux-f2fs-devel@...ts.sourceforge.net>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Seokhwan Kim <sukka.kim@...sung.com>,
        Daejun Park <daejun7.park@...sung.com>,
        Siwoo Jung <siu.jung@...sung.com>
Subject: Re: [PATCH v2] f2fs: New victim selection for GC

On 12/08, Yonggil Song wrote:
> Overview
> ========
> 
> This patch introduces a new way to prefer data sections when selecting
> GC victims. Migrating data blocks invalidates the node blocks that
> point to them. Therefore, in situations where GC is frequent, selecting
> data blocks as victims can reduce unnecessary block migration, since
> node blocks are invalidated along the way rather than migrated
> themselves. In exceptional situations where free sections are
> insufficient, node blocks are selected as victims instead of data
> blocks to reclaim extra free sections.
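> 
> In rough pseudocode, the policy is (a condensed sketch with
> illustrative helper names, not the actual code in the patch below):
> 
> 	/* prefer data victims; fall back to nodes only when space is low */
> 	if (is_node_section(victim) && !need_node_clean)
> 		cost = max_cost;	/* effectively never chosen */
> 	else
> 		cost = valid_blocks(victim);	/* greedy: fewest valid blocks wins */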
> 
> Problem
> =======
> 
> If the total amount of node blocks is larger than the size of one
> section, nodes occupy multiple sections, and node victims are often
> selected because data block migration during GC lowers the gc cost of
> node sections. Since moving data sections keeps triggering node victim
> selection, victim thrashing occurs in the node sections. This results
> in an increased WAF.
> 
> Experiment
> ==========
> 
> The test environment is as follows.
> 
>         System info
>           - 3.6GHz, 16 core CPU
>           - 36GiB Memory
>         Device info
>           - a conventional null_blk with 228MiB
>           - a sequential null_blk with 4068 zones of 8MiB
>         Format
>           - mkfs.f2fs <conv null_blk> -c <seq null_blk> -m -Z 8 -o 3.89
>         Mount
>           - mount <conv null_blk> <mount point>
>         Fio script
>           - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap --overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
>         WAF calculation
>           - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs
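> 
> For example (the device-side total is hypothetical, chosen only to
> illustrate the formula): if fio issues 128 GiB of random writes
> (io_size=128g) and the two null_blk devices absorb 2400 GiB in total,
> then WAF = 2400 / 128 = 18.75, matching the baseline below.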
> 
> Conclusion
> ==========
> 
> This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3)
> when data sections were preferred as GC victims. This was achieved by
> reducing node block migration by 69.4% (253,131,743 blks ->
> 77,463,278 blks). A low WAF can thus be achieved with this GC victim
> selection method in environments where the section size is relatively
> small.
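> 
> That is, (18.75 - 13.3) / 18.75 ~ 29% for WAF, and
> (253,131,743 - 77,463,278) / 253,131,743 ~ 69.4% for node block
> migration.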
> 
> Signed-off-by: Yonggil Song <yonggil.song@...sung.com>
> ---
>  fs/f2fs/f2fs.h |  1 +
>  fs/f2fs/gc.c   | 98 ++++++++++++++++++++++++++++++++++++++------------
>  2 files changed, 77 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 9043cedfa12b..578d57f6022f 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1649,6 +1649,7 @@ struct f2fs_sb_info {
>  	struct f2fs_mount_info mount_opt;	/* mount options */
>  
>  	/* for cleaning operations */
> +	bool need_node_clean;			/* need to clean dirty nodes */
>  	struct f2fs_rwsem gc_lock;		/*
>  						 * semaphore for GC, avoid
>  						 * race between GC and GC or CP
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index f550cdeaa663..682dcf0de59e 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -368,6 +368,14 @@ static inline unsigned int get_gc_cost(struct f2fs_sb_info *sbi,
>  	if (p->alloc_mode == SSR)
>  		return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
>  
> +	/*
> +	 * If we don't need to clean dirty nodes,
> +	 * we just skip node victims.
> +	 */
> +	if (IS_NODESEG(get_seg_entry(sbi, segno)->type) &&
> +		!sbi->need_node_clean)
> +		return get_max_cost(sbi, p);

How about differentiating the gc cost between data and node by adding
some weights? By default, data would be preferred, while node would win
in the worst case.
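
Something like the following in get_gc_cost(), for instance (an
untested sketch; NODE_COST_WEIGHT is a made-up constant, just to
illustrate the idea):

	/* alloc_mode == LFS, gc_mode == GC_GREEDY */
	unsigned int cost = get_valid_blocks(sbi, segno, true);

	/*
	 * Penalize node sections instead of skipping them outright,
	 * so a node victim can still win when data victims are costly.
	 */
	if (IS_NODESEG(get_seg_entry(sbi, segno)->type))
		cost += NODE_COST_WEIGHT;

	return cost;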

> +
>  	/* alloc_mode == LFS */
>  	if (p->gc_mode == GC_GREEDY)
>  		return get_valid_blocks(sbi, segno, true);
> @@ -557,6 +565,14 @@ static void atgc_lookup_victim(struct f2fs_sb_info *sbi,
>  	if (ve->mtime >= max_mtime || ve->mtime < min_mtime)
>  		goto skip;
>  
> +	/*
> +	 * If we don't need to clean dirty nodes,
> +	 * we just skip node victims.
> +	 */
> +	if (IS_NODESEG(get_seg_entry(sbi, ve->segno)->type) &&
> +	    !sbi->need_node_clean)
> +		goto skip;
> +
>  	/* age = 10000 * x% * 60 */
>  	age = div64_u64(accu * (max_mtime - ve->mtime), total_time) *
>  								age_weight;
> @@ -913,7 +929,21 @@ int f2fs_get_victim(struct f2fs_sb_info *sbi, unsigned int *result,
>  		goto retry;
>  	}
>  
> +
>  	if (p.min_segno != NULL_SEGNO) {
> +		if (sbi->need_node_clean &&
> +		    IS_DATASEG(get_seg_entry(sbi, p.min_segno)->type)) {
> +			 /*
> +			  * We need to clean node sections, but the data
> +			  * victim cost is the lowest. If free sections
> +			  * are enough, stop cleaning node victims;
> +			  * if not, keep GCing data victims.
> +			  */
> +			if (has_enough_free_secs(sbi, prefree_segments(sbi), 0)) {
> +				p.min_segno = NULL_SEGNO;
> +				goto out;
> +			}
> +		}
>  got_it:
>  		*result = (p.min_segno / p.ofs_unit) * p.ofs_unit;
>  got_result:
> @@ -1830,8 +1860,27 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>  		goto stop;
>  	}
>  
> +	__get_secs_required(sbi, NULL, &upper_secs, NULL);
> +
> +	/*
> +	 * Write checkpoint to reclaim prefree segments.
> +	 * We need three extra sections for writer's data/node/dentry.
> +	 */
> +	if (free_sections(sbi) <= upper_secs + NR_GC_CHECKPOINT_SECS) {
> +		sbi->need_node_clean = true;
> +
> +		if (prefree_segments(sbi)) {
> +			stat_inc_cp_call_count(sbi, TOTAL_CALL);
> +			ret = f2fs_write_checkpoint(sbi, &cpc);
> +			if (ret)
> +				goto stop;
> +			/* Reset due to checkpoint */
> +			sec_freed = 0;
> +		}
> +	}
> +
>  	/* Let's run FG_GC, if we don't have enough space. */
> -	if (has_not_enough_free_secs(sbi, 0, 0)) {
> +	if (gc_type == BG_GC && has_not_enough_free_secs(sbi, 0, 0)) {
>  		gc_type = FG_GC;
>  
>  		/*
> @@ -1858,10 +1907,22 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>  	ret = __get_victim(sbi, &segno, gc_type);
>  	if (ret) {
>  		/* allow to search victim from sections has pinned data */
> -		if (ret == -ENODATA && gc_type == FG_GC &&
> -				f2fs_pinned_section_exists(DIRTY_I(sbi))) {
> -			f2fs_unpin_all_sections(sbi, false);
> -			goto retry;
> +		if (ret == -ENODATA && gc_type == FG_GC) {
> +			if (f2fs_pinned_section_exists(DIRTY_I(sbi))) {
> +				f2fs_unpin_all_sections(sbi, false);
> +				goto retry;
> +			}
> +			/*
> +			 * If we have no more data victims, let's start to
> +			 * clean dirty nodes.
> +			 */
> +			if (!sbi->need_node_clean) {
> +				sbi->need_node_clean = true;
> +				goto retry;
> +			}
> +			/* node cleaning is over */
> +			else if (sbi->need_node_clean)
> +				goto stop;
>  		}
>  		goto stop;
>  	}
> @@ -1882,7 +1943,13 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>  			if (!gc_control->no_bg_gc &&
>  			    total_sec_freed < gc_control->nr_free_secs)
>  				goto go_gc_more;
> -			goto stop;
> +			/*
> +			 * If the need_node_clean flag is set
> +			 * even though there are enough free
> +			 * sections, node cleaning will continue.
> +			 */
> +			if (!sbi->need_node_clean)
> +				goto stop;
>  		}
>  		if (sbi->skipped_gc_rwsem)
>  			skipped_round++;
> @@ -1897,21 +1964,6 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>  		goto stop;
>  	}
>  
> -	__get_secs_required(sbi, NULL, &upper_secs, NULL);
> -
> -	/*
> -	 * Write checkpoint to reclaim prefree segments.
> -	 * We need three extra sections for writer's data/node/dentry.
> -	 */
> -	if (free_sections(sbi) <= upper_secs + NR_GC_CHECKPOINT_SECS &&
> -				prefree_segments(sbi)) {
> -		stat_inc_cp_call_count(sbi, TOTAL_CALL);
> -		ret = f2fs_write_checkpoint(sbi, &cpc);
> -		if (ret)
> -			goto stop;
> -		/* Reset due to checkpoint */
> -		sec_freed = 0;
> -	}
>  go_gc_more:
>  	segno = NULL_SEGNO;
>  	goto gc_more;
> @@ -1920,8 +1972,10 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>  	SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0;
>  	SIT_I(sbi)->last_victim[FLUSH_DEVICE] = gc_control->victim_segno;
>  
> -	if (gc_type == FG_GC)
> +	if (gc_type == FG_GC) {
>  		f2fs_unpin_all_sections(sbi, true);
> +		sbi->need_node_clean = false;
> +	}
>  
>  	trace_f2fs_gc_end(sbi->sb, ret, total_freed, total_sec_freed,
>  				get_pages(sbi, F2FS_DIRTY_NODES),
> -- 
> 2.34.1
