lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20240221013119epcms2p28bf15af46e7a330b0f9049abb4a2cc32@epcms2p2>
Date: Wed, 21 Feb 2024 10:31:19 +0900
From: Yonggil Song <yonggil.song@...sung.com>
To: Daeho Jeong <daeho43@...il.com>
CC: "jaegeuk@...nel.org" <jaegeuk@...nel.org>, "chao@...nel.org"
	<chao@...nel.org>, "linux-f2fs-devel@...ts.sourceforge.net"
	<linux-f2fs-devel@...ts.sourceforge.net>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, Dongjin Kim <dongjin_.kim@...sung.com>,
	Daejun Park <daejun7.park@...sung.com>, Siwoo Jung <siu.jung@...sung.com>
Subject: RE:(2) [f2fs-dev] [PATCH v6] f2fs: New victim selection for GC

> On Tue, Feb 13, 2024 at 5:36 PM Yonggil Song <yonggil.song@...sung.com> wrote:
> >
> >
> > Overview
> > ========
> >
> > This patch introduces a new way to preference data sections when selecting
> > GC victims. Migration of data blocks causes invalidation of node blocks.
> > Therefore, in situations where GC is frequent, selecting data blocks as
> > victims can reduce unnecessary block migration by invalidating node blocks.
> 
> Your approach will allocate new node blocks despite invalidating
> current node blocks while moving data blocks, though. While your
> approach may work well relating to WAF in a specific scenario, such as
> randomly overwriting an entire storage space with a huge file, it is
> important to consider its general applicability. For example, how
> about the test performance? Performance optimization should encompass
> a wide range of user scenarios. However, I am not convinced that this
> is the most efficient solution for most users. Can you provide more
> information about how your approach addresses the performance needs of
> a broader spectrum of user scenarios?
> 

Thank you for your review and feedback. I agree with your opinion.
I'll research and develop this approach for the user scenario.

> > For exceptional situations where free sections are insufficient, node blocks
> > are selected as victims instead of data blocks to get extra free sections.
> >
> > Problem
> > =======
> >
> > If the total amount of nodes is larger than the size of one section, nodes
> > occupy multiple sections, and node victims are often selected because the
> > gc cost is lowered by data block migration in GC. Since moving the data
> > section causes frequent node victim selection, victim threshing occurs in
> > the node section. This results in an increase in WAF.
> >
> > Experiment
> > ==========
> >
> > Test environment is as follows.
> >
> >         System info
> >           - 3.6GHz, 16 core CPU
> >           - 36GiB Memory
> >         Device info
> >           - a conventional null_blk with 228MiB
> >           - a sequential null_blk with 4068 zones of 8MiB
> >         Format
> >           - mkfs.f2fs <conv null_blk> -c <seq null_blk> -m -Z 8 -o 3.89
> >         Mount
> >           - mount <conv null_blk> <mount point>
> >         Fio script
> >           - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap --overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
> >         WAF calculation
> >           - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs
> >
> > Conclusion
> > ==========
> >
> > This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) when
> > the data section was selected first when selecting GC victims. This was
> > achieved by reducing the migration of the node blocks by 69.4%
> > (253,131,743 blks -> 77,463,278 blks). It is possible to achieve low WAF
> > performance with the GC victim selection method in environments where the
> > section size is relatively small.
> >
> > Signed-off-by: Yonggil Song <yonggil.song@...sung.com>
> > ---
> >  fs/f2fs/f2fs.h |  1 +
> >  fs/f2fs/gc.c   | 96 +++++++++++++++++++++++++++++++++++++++-----------
> >  fs/f2fs/gc.h   |  6 ++++
> >  3 files changed, 82 insertions(+), 21 deletions(-)
> >
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 65294e3b0bef..b129f62ba541 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -1654,6 +1654,7 @@ struct f2fs_sb_info {
> >         struct f2fs_mount_info mount_opt;       /* mount options */
> >
> >         /* for cleaning operations */
> > +       bool require_node_gc;                   /* flag for node GC */
> >         struct f2fs_rwsem gc_lock;              /*
> >                                                  * semaphore for GC, avoid
> >                                                  * race between GC and GC or CP
> > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > index a079eebfb080..53a51a668567 100644
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -341,6 +341,14 @@ static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, unsigned int segno)
> >         unsigned int i;
> >         unsigned int usable_segs_per_sec = f2fs_usable_segs_in_sec(sbi, segno);
> >
> > +       /*
> > +        * When BG_GC selects victims based on age, it prevents node victims
> > +        * from being selected. This is because node blocks can be invalidated
> > +        * by moving data blocks.
> > +        */
> > +       if (__skip_node_gc(sbi, segno))
> > +               return UINT_MAX;
> > +
> >         for (i = 0; i < usable_segs_per_sec; i++)
> >                 mtime += get_seg_entry(sbi, start + i)->mtime;
> >         vblocks = get_valid_blocks(sbi, segno, true);
> > @@ -369,10 +377,24 @@ static inline unsigned int get_gc_cost(struct f2fs_sb_info *sbi,
> >                 return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
> >
> >         /* alloc_mode == LFS */
> > -       if (p->gc_mode == GC_GREEDY)
> > -               return get_valid_blocks(sbi, segno, true);
> > -       else if (p->gc_mode == GC_CB)
> > +       if (p->gc_mode == GC_GREEDY) {
> > +               /*
> > +                * If the data block that the node block pointed to is GCed,
> > +                * the node block is invalidated. For this reason, we add a
> > +                * weight to cost of node victims to give priority to data
> > +                * victims during the gc process. However, in a situation
> > +                * where we run out of free sections, we remove the weight
> > +                * because we need to clean up node blocks.
> > +                */
> > +               unsigned int weight = 0;
> > +
> > +               if (__skip_node_gc(sbi, segno))
> > +                       weight = BLKS_PER_SEC(sbi);
> > +
> > +               return get_valid_blocks(sbi, segno, true) + weight;
> > +       } else if (p->gc_mode == GC_CB) {
> >                 return get_cb_cost(sbi, segno);
> > +       }
> >
> >         f2fs_bug_on(sbi, 1);
> >         return 0;
> > @@ -557,6 +579,14 @@ static void atgc_lookup_victim(struct f2fs_sb_info *sbi,
> >         if (ve->mtime >= max_mtime || ve->mtime < min_mtime)
> >                 goto skip;
> >
> > +       /*
> > +        * When BG_GC selects victims based on age, it prevents node victims
> > +        * from being selected. This is because node blocks can be invalidated
> > +        * by moving data blocks.
> > +        */
> > +       if (__skip_node_gc(sbi, ve->segno))
> > +               goto skip;
> > +
> >         /* age = 10000 * x% * 60 */
> >         age = div64_u64(accu * (max_mtime - ve->mtime), total_time) *
> >                                                                 age_weight;
> > @@ -1827,8 +1857,27 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
> >                 goto stop;
> >         }
> >
> > +       __get_secs_required(sbi, NULL, &upper_secs, NULL);
> > +
> > +       /*
> > +        * Write checkpoint to reclaim prefree segments.
> > +        * We need more three extra sections for writer's data/node/dentry.
> > +        */
> > +       if (free_sections(sbi) <= upper_secs + NR_GC_CHECKPOINT_SECS) {
> > +               sbi->require_node_gc = true;
> > +
> > +               if (prefree_segments(sbi)) {
> > +                       stat_inc_cp_call_count(sbi, TOTAL_CALL);
> > +                       ret = f2fs_write_checkpoint(sbi, &cpc);
> > +                       if (ret)
> > +                               goto stop;
> > +                       /* Reset due to checkpoint */
> > +                       sec_freed = 0;
> > +               }
> > +       }
> > +
> >         /* Let's run FG_GC, if we don't have enough space. */
> > -       if (has_not_enough_free_secs(sbi, 0, 0)) {
> > +       if (gc_type == BG_GC && has_not_enough_free_secs(sbi, 0, 0)) {
> >                 gc_type = FG_GC;
> >
> >                 /*
> > @@ -1863,6 +1912,18 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
> >                 goto stop;
> >         }
> >
> > +       if (sbi->require_node_gc &&
> > +           IS_DATASEG(get_seg_entry(sbi, segno)->type)) {
> > +               /*
> > +                * We need to clean node sections. but, data victim
> > +                * cost is the lowest. If free sections are enough,
> > +                * stop cleaning node victim. If not, it goes on
> > +                * by GCing data victims.
> > +                */
> > +               if (has_enough_free_secs(sbi, sec_freed, 0))
> > +                       goto stop;
> > +       }
> > +
> >         seg_freed = do_garbage_collect(sbi, segno, &gc_list, gc_type,
> >                                 gc_control->should_migrate_blocks);
> >         if (seg_freed < 0)
> > @@ -1882,7 +1943,13 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
> >                         if (!gc_control->no_bg_gc &&
> >                             total_sec_freed < gc_control->nr_free_secs)
> >                                 goto go_gc_more;
> > -                       goto stop;
> > +                       /*
> > +                        * If require_node_gc flag is set even though there
> > +                        * are enough free sections, node cleaning will
> > +                        * continue.
> > +                        */
> > +                       if (!sbi->require_node_gc)
> > +                               goto stop;
> >                 }
> >                 if (sbi->skipped_gc_rwsem)
> >                         skipped_round++;
> > @@ -1897,21 +1964,6 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
> >                 goto stop;
> >         }
> >
> > -       __get_secs_required(sbi, NULL, &upper_secs, NULL);
> > -
> > -       /*
> > -        * Write checkpoint to reclaim prefree segments.
> > -        * We need more three extra sections for writer's data/node/dentry.
> > -        */
> > -       if (free_sections(sbi) <= upper_secs + NR_GC_CHECKPOINT_SECS &&
> > -                               prefree_segments(sbi)) {
> > -               stat_inc_cp_call_count(sbi, TOTAL_CALL);
> > -               ret = f2fs_write_checkpoint(sbi, &cpc);
> > -               if (ret)
> > -                       goto stop;
> > -               /* Reset due to checkpoint */
> > -               sec_freed = 0;
> > -       }
> >  go_gc_more:
> >         segno = NULL_SEGNO;
> >         goto gc_more;
> > @@ -1920,8 +1972,10 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
> >         SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0;
> >         SIT_I(sbi)->last_victim[FLUSH_DEVICE] = gc_control->victim_segno;
> >
> > -       if (gc_type == FG_GC)
> > +       if (gc_type == FG_GC) {
> >                 f2fs_unpin_all_sections(sbi, true);
> > +               sbi->require_node_gc = false;
> > +       }
> >
> >         trace_f2fs_gc_end(sbi->sb, ret, total_freed, total_sec_freed,
> >                                 get_pages(sbi, F2FS_DIRTY_NODES),
> > diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
> > index 28a00942802c..cd07bf125177 100644
> > --- a/fs/f2fs/gc.h
> > +++ b/fs/f2fs/gc.h
> > @@ -166,3 +166,9 @@ static inline bool has_enough_invalid_blocks(struct f2fs_sb_info *sbi)
> >                 free_user_blocks(sbi) <
> >                         limit_free_user_blocks(invalid_user_blocks));
> >  }
> > +
> > +static inline bool __skip_node_gc(struct f2fs_sb_info *sbi, unsigned int segno)
> > +{
> > +       return (IS_NODESEG(get_seg_entry(sbi, segno)->type) &&
> > +               !sbi->require_node_gc);
> > +}
> > --
> > 2.34.1
> >
> >
> >
> > _______________________________________________
> > Linux-f2fs-devel mailing list
> > Linux-f2fs-devel@...ts.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ