[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250401074710.2657-1-yohan.joung@sk.com>
Date: Tue, 1 Apr 2025 16:47:07 +0900
From: "yohan.joung" <yohan.joung@...com>
To: linux-f2fs-devel@...ts.sourceforge.net
Cc: chao@...nel.org,
jaegeuk@...nel.org,
jyh429@...il.com,
linux-kernel@...r.kernel.org,
pilhyun.kim@...com,
yohan.joung@...com
Subject: [External Mail] Re: [f2fs-dev] [External Mail] Re: [External Mail] Re: [PATCH] f2fs: prevent the current section from being selected as a victim during garbage collection
>-----Original Message-----
>From: Chao Yu <chao@...nel.org>
>Sent: Tuesday, April 1, 2025 2:42 PM
>To: 정요한(JOUNG YOHAN) Mobile AE <yohan.joung@...com>; linux-f2fs-
>devel@...ts.sourceforge.net
>Cc: chao@...nel.org; jaegeuk@...nel.org; jyh429@...il.com; linux-
>kernel@...r.kernel.org; 김필현(KIM PILHYUN) Mobile AE <pilhyun.kim@...com>
>Subject: [External Mail] Re: [External Mail] Re: [f2fs-dev] [External Mail]
>Re: [External Mail] Re: [PATCH] f2fs: prevent the current section from
>being selected as a victim during garbage collection
>
>On 4/1/25 09:51, yohan.joung wrote:
>>> From: Chao Yu <chao@...nel.org>
>>> Sent: Monday, March 31, 2025 8:36 PM
>>> To: 정요한(JOUNG YOHAN) Mobile AE <yohan.joung@...com>; linux-f2fs-
>>> devel@...ts.sourceforge.net
>>> Cc: chao@...nel.org; jaegeuk@...nel.org; jyh429@...il.com; linux-
>>> kernel@...r.kernel.org; 김필현(KIM PILHYUN) Mobile AE
>>> <pilhyun.kim@...com>
>>> Subject: [External Mail] Re: [External Mail] Re: [f2fs-dev] [External
>>> Mail]
>>> Re: [External Mail] Re: [PATCH] f2fs: prevent the current section
>>> from being selected as a victim during garbage collection
>>>
>>> On 3/31/25 13:13, yohan.joung wrote:
>>>>> On 2025/3/28 15:25, yohan.joung wrote:
>>>>>>> On 2025/3/28 11:40, yohan.joung wrote:
>>>>>>>>> From: Chao Yu <chao@...nel.org>
>>>>>>>>> Sent: Thursday, March 27, 2025 10:48 PM
>>>>>>>>> To: 정요한(JOUNG YOHAN) Mobile AE <yohan.joung@...com>; Yohan
>>>>>>>>> Joung <jyh429@...il.com>; jaegeuk@...nel.org; daeho43@...il.com
>>>>>>>>> Cc: chao@...nel.org; linux-f2fs-devel@...ts.sourceforge.net;
>>>>>>>>> linux- kernel@...r.kernel.org; 김필현(KIM PILHYUN) Mobile AE
>>>>>>>>> <pilhyun.kim@...com>
>>>>>>>>> Subject: [External Mail] Re: [External Mail] Re: [External Mail]
>Re:
>>>>>>>>> [PATCH] f2fs: prevent the current section from being selected
>>>>>>>>> as a victim during garbage collection
>>>>>>>>>
>>>>>>>>> On 2025/3/27 16:00, yohan.joung@...com wrote:
>>>>>>>>>>> From: Chao Yu <chao@...nel.org>
>>>>>>>>>>> Sent: Thursday, March 27, 2025 4:30 PM
>>>>>>>>>>> To: 정요한(JOUNG YOHAN) Mobile AE <yohan.joung@...com>; Yohan
>>>>>>>>>>> Joung <jyh429@...il.com>; jaegeuk@...nel.org;
>>>>>>>>>>> daeho43@...il.com
>>>>>>>>>>> Cc: chao@...nel.org; linux-f2fs-devel@...ts.sourceforge.net;
>>>>>>>>>>> linux- kernel@...r.kernel.org; 김필현(KIM PILHYUN) Mobile AE
>>>>>>>>>>> <pilhyun.kim@...com>
>>>>>>>>>>> Subject: [External Mail] Re: [External Mail] Re: [PATCH] f2fs:
>>>>>>>>>>> prevent the current section from being selected as a victim
>>>>>>>>>>> during garbage collection
>>>>>>>>>>>
>>>>>>>>>>> On 3/27/25 14:43, yohan.joung@...com wrote:
>>>>>>>>>>>>> From: Chao Yu <chao@...nel.org>
>>>>>>>>>>>>> Sent: Thursday, March 27, 2025 3:02 PM
>>>>>>>>>>>>> To: Yohan Joung <jyh429@...il.com>; jaegeuk@...nel.org;
>>>>>>>>>>>>> daeho43@...il.com
>>>>>>>>>>>>> Cc: chao@...nel.org;
>>>>>>>>>>>>> linux-f2fs-devel@...ts.sourceforge.net;
>>>>>>>>>>>>> linux- kernel@...r.kernel.org; 정요한(JOUNG YOHAN) Mobile AE
>>>>>>>>>>>>> <yohan.joung@...com>
>>>>>>>>>>>>> Subject: [External Mail] Re: [PATCH] f2fs: prevent the
>>>>>>>>>>>>> current section from being selected as a victim during
>>>>>>>>>>>>> garbage collection
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 3/26/25 22:14, Yohan Joung wrote:
>>>>>>>>>>>>>> When selecting a victim using next_victim_seg in a large
>>>>>>>>>>>>>> section, the selected section might already have been
>>>>>>>>>>>>>> cleared and designated as the new current section, making
>>>>>>>>>>>>>> it actively in
>>>>>>> use.
>>>>>>>>>>>>>> This behavior causes inconsistency between the SIT and SSA.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi, does this fix your issue?
>>>>>>>>>>>>
>>>>>>>>>>>> This is an issue that arises when dividing a large section
>>>>>>>>>>>> into segments for garbage collection.
>>>>>>>>>>>> caused by the background GC (garbage collection) thread in
>>>>>>>>>>>> large section
>>>>>>>>>>>> f2fs_gc(victim_section) ->
>>>>>>>>>>>> f2fs_clear_prefree_segments(victim_section)->
>>>>>>>>>>>> cursec(victim_section) -> f2fs_gc(victim_section by
>>>>>>>>>>>> next_victim_seg)
>>>>>>>>>>>
>>>>>>>>>>> I didn't get it, why f2fs_get_victim() will return section
>>>>>>>>>>> which is used by curseg? It should be avoided by checking w/
>>>>> sec_usage_check().
>>>>>>>>>>>
>>>>>>>>>>> Or we missed to check gcing section which next_victim_seg
>>>>>>>>>>> points to during get_new_segment()?
>>>>>>>>>>>
>>>>>>>>>>> Can this happen?
>>>>>>>>>>>
>>>>>>>>>>> e.g.
>>>>>>>>>>> - bggc selects sec #0
>>>>>>>>>>> - next_victim_seg: seg #0
>>>>>>>>>>> - migrate seg #0 and stop
>>>>>>>>>>> - next_victim_seg: seg #1
>>>>>>>>>>> - checkpoint, set sec #0 free if sec #0 has no valid blocks
>>>>>>>>>>> - allocate seg #0 in sec #0 for curseg
>>>>>>>>>>> - curseg moves to seg #1 after allocation
>>>>>>>>>>> - bggc tries to migrate seg #1
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>> That's correct
>>>>>>>>>> In f2fs_get_victim, we use next_victim_seg to directly jump to
>>>>>>>>>> got_result, thereby bypassing sec_usage_check What do you
>>>>>>>>>> think about this change?
>>>>>>>>>>
>>>>>>>>>> @@ -850,15 +850,20 @@ int f2fs_get_victim(struct f2fs_sb_info
>>>>>>>>>> *sbi,
>>>>>>>>> unsigned int *result,
>>>>>>>>>> p.min_segno = sbi->next_victim_seg[BG_GC];
>>>>>>>>>> *result = p.min_segno;
>>>>>>>>>> sbi->next_victim_seg[BG_GC] = NULL_SEGNO;
>>>>>>>>>> - goto got_result;
>>>>>>>>>> }
>>>>>>>>>> if (gc_type == FG_GC &&
>>>>>>>>>> sbi->next_victim_seg[FG_GC]
>>>>>>>>>> != NULL_SEGNO)
>>>>> {
>>>>>>>>>> p.min_segno = sbi->next_victim_seg[FG_GC];
>>>>>>>>>> *result = p.min_segno;
>>>>>>>>>> sbi->next_victim_seg[FG_GC] = NULL_SEGNO;
>>>>>>>>>> - goto got_result;
>>>>>>>>>> }
>>>>>>>>>> +
>>>>>>>>>> + secno = GET_SEC_FROM_SEG(sbi, segno);
>>>>>>>>>> +
>>>>>>>>>> + if (sec_usage_check(sbi, secno))
>>>>>>>>>> + goto next;
>>>>>>>>>> +
>>>>>>>>>> + goto got_result;
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> But still allocator can assign this segment after
>>>>>>>>> sec_usage_check() in race condition, right?
>>>>>>>> Since the BG GC using next_victim takes place after the SIT
>>>>>>>> update in do_checkpoint, it seems unlikely that a race condition
>>>>>>>> with
>>>>>>> sec_usage_check will occur.
>>>>>>>
>>>>>>> I mean this:
>>>>>>>
>>>>>>> - gc_thread
>>>>>>> - f2fs_gc
>>>>>>> - f2fs_get_victim
>>>>>>> - sec_usage_check --- segno #1 is not used in any cursegs
>>>>>>> - f2fs_allocate_data_block
>>>>>>> - new_curseg
>>>>>>> - get_new_segment find segno #1
>>>>>>>
>>>>>>> - do_garbage_collect
>>>>>>>
>>>>>>> Thanks,
>>>>>>
>>>>>> do_checkpoint sec0 free
>>>>>> If sec0 is not freed, then
>>>>> segno1 within sec0 cannot be
>>>>>> allocated
>>>>>> - gc_thread
>>>>>> - f2fs_gc
>>>>>> - f2fs_get_victim
>>>>>> - sec_usage_check --- segno #1 is not used in any cursegs
>>>>>> (but
>>>>>> sec0
>>>>> is already used)
>>>>>> - f2fs_allocate_data_block
>>>>>> - new_curseg
>>>>>> - get_new_segment find
>>>>> segno #1
>>>>>>
>>>>>> - do_garbage_collect
>>>>>>
>>>>>> I appreciate your patch, it is under testing.
>>>>>> but I'm wondering if there's a risk of a race condition in this
>>>>>> situation
>>>>>
>>>>> Oh, yes, I may missed that get_new_segment can return a free
>>>>> segment in partial used section.
>>>>>
>>>>> So what do you think of this?
>>>>> - check CURSEG() in do_garbage_collect() and get_victim()
>>>>> - reset next_victim_seg[] in get_new_segment() and
>>>>> __set_test_and_free() during checkpoint.
>>>>>
>>>>> Thanks,
>>>>
>>>> How about using victim_secmap?
>>>> gc_thread
>>>> mutex_lock(&DIRTY_I(sbi)->seglist_lock);
>>>> __set_test_and_free
>>>> check cur section next_victim clear
>>>> mutex_unlock(&dirty_i->seglist_lock);
>>>>
>>>> mutex_lock(&dirty->seglist_lock);
>>>> f2fs_get_victim
>>>> mutex_unlock(&dirty_i->seglist_lock);
>>>>
>>>> static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
>>>> if (next >= start_segno + usable_segs) {
>>>> if (test_and_clear_bit(secno, free_i->free_secmap))
>>>> free_i->free_sections++;
>>>> +
>>>> + if (test_and_clear_bit(secno, dirty_i-
>>victim_secmap))
>>>> + sbi->next_victim_seg[BG_GC] =
>>>> + NULL_SEGNO;
>>>
>>> Can this happen?
>>>
>>> segs_per_sec=2
>>>
>>> - seg#0 and seg#1 are all dirty
>>> - all valid blocks are removed in seg#1
>>> - checkpoint -> seg#1 becomes free
>>> - gc select this sec and next_victim_seg=seg#0
>>> - migrate seg#0, next_victim_seg=seg#1
>>> - allocator assigns seg#1 to curseg
>>> - gc tries to migrate seg#1
>
>I meant for above case, below change still can not catch it, right? since
>next_victim_seg[] was assigned after checkpoint.
>
>+ if (test_and_clear_bit(secno, dirty_i->victim_secmap))
>+ sbi->next_victim_seg[BG_GC] = NULL_SEGNO;
>
>Thanks,
>
I understood what you said.
It seems that without a checkpoint,
we won't be able to allocate seg#1 in sec0
because when requesting a segment allocation,
if it's not within the same section, it checks in a new secmap.
please let me know, if there's anything I've missed
Thanks
static int get_new_segment(struct f2fs_sb_info *sbi,
unsigned int *newseg, bool new_sec, bool pinning)
if (!new_sec && ((*newseg + 1) % SEGS_PER_SEC(sbi))) {
segno = find_next_zero_bit(free_i->free_segmap,
GET_SEG_FROM_SEC(sbi, hint + 1), *newseg + 1);
if (segno < GET_SEG_FROM_SEC(sbi, hint + 1))
goto got_it;
}
find_other_zone:
secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint);
>>>
>>> Thanks,
>> The detailed scenario
>> segs_per_sec=2
>> - seg#0 and seg#1 are all dirty
>> - all valid blocks are removed in seg#1
>> - gc select this sec and next_victim_seg=seg#0
>> - migrate seg#0, next_victim_seg=seg#1
>> - checkpoint -> sec(seg#0, seg#1) becomes free
>> - allocator assigns sec(seg#0, seg#1) to curseg
>> - gc tries to migrate seg#1
>>>
>>>> }
>>>> }
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> IMO, we can clear next_victim_seg[] once section is free in
>>>>>>>>> __set_test_and_free()? something like this:
>>>>>>>> I will test it according to your suggestion.
>>>>>>>> If there are no issues, can I submit it again with the patch?
>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> ---
>>>>>>>>> fs/f2fs/segment.h | 13 ++++++++++---
>>>>>>>>> 1 file changed, 10 insertions(+), 3 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index
>>>>>>>>> 0465dc00b349..826e37999085 100644
>>>>>>>>> --- a/fs/f2fs/segment.h
>>>>>>>>> +++ b/fs/f2fs/segment.h
>>>>>>>>> @@ -473,9 +473,16 @@ static inline void
>>>>>>>>> __set_test_and_free(struct f2fs_sb_info *sbi,
>>>>>>>>> goto skip_free;
>>>>>>>>> next = find_next_bit(free_i->free_segmap,
>>>>>>>>> start_segno + SEGS_PER_SEC(sbi),
>>>>> start_segno);
>>>>>>>>> - if (next >= start_segno + usable_segs) {
>>>>>>>>> - if (test_and_clear_bit(secno, free_i-
>>>> free_secmap))
>>>>>>>>> - free_i->free_sections++;
>>>>>>>>> + if ((next >= start_segno + usable_segs) &&
>>>>>>>>> + test_and_clear_bit(secno, free_i->free_secmap))
>>> {
>>>>>>>>> + free_i->free_sections++;
>>>>>>>>> +
>>>>>>>>> + if (GET_SEC_FROM_SEG(sbi->next_victim_seg[BG_GC])
>>> ==
>>>>>>>>> + secno)
>>>>>>>>> + sbi->next_victim_seg[BG_GC] = NULL_SEGNO;
>>>>>>>>> + if (GET_SEC_FROM_SEG(sbi->next_victim_seg[FG_GC])
>>> ==
>>>>>>>>> + secno)
>>>>>>>>> + sbi->next_victim_seg[FG_GC] = NULL_SEGNO;
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>> skip_free:
>>>>>>>>> --
>>>>>>>>> 2.40.1
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Because the call stack is different, I think that in order
>>>>>>>>>>>> to handle everything at once, we need to address it within
>>>>>>>>>>>> do_garbage_collect, or otherwise include it on both sides.
>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>
>>>>>>>>>>>> [30146.337471][ T1300] F2FS-fs (dm-54): Inconsistent segment
>>>>>>>>>>>> (70961) type [0, 1] in SSA and SIT [30146.346151][ T1300]
>>>>>>>>>>>> Call
>>>>> trace:
>>>>>>>>>>>> [30146.346152][ T1300] dump_backtrace+0xe8/0x10c
>>>>>>>>>>>> [30146.346157][ T1300] show_stack+0x18/0x28 [30146.346158][
>>>>>>>>>>>> T1300] dump_stack_lvl+0x50/0x6c [30146.346161][ T1300]
>>>>>>>>>>>> dump_stack+0x18/0x28 [30146.346162][ T1300]
>>>>>>>>>>>> f2fs_stop_checkpoint+0x1c/0x3c [30146.346165][ T1300]
>>>>>>>>>>>> do_garbage_collect+0x41c/0x271c [30146.346167][ T1300]
>>>>>>>>>>>> f2fs_gc+0x27c/0x828 [30146.346168][ T1300]
>>>>>>>>>>>> gc_thread_func+0x290/0x88c [30146.346169][ T1300]
>>>>>>>>>>>> kthread+0x11c/0x164 [30146.346172][ T1300]
>>>>>>>>>>>> ret_from_fork+0x10/0x20
>>>>>>>>>>>>
>>>>>>>>>>>> struct curseg_info : 0xffffff803f95e800 {
>>>>>>>>>>>> segno : 0x11531 : 70961
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> struct f2fs_sb_info : 0xffffff8811d12000 {
>>>>>>>>>>>> next_victim_seg[0] : 0x11531 : 70961 }
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://lore.kernel.org/linux-f2fs-devel/20250325080646.329
>>>>>>>>>>>>> 19
>>>>>>>>>>>>> 47
>>>>>>>>>>>>> -2
>>>>>>>>>>>>> -
>>>>>>>>>>>>> chao@...nel.org
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Yohan Joung <yohan.joung@...com>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>> fs/f2fs/gc.c | 4 ++++
>>>>>>>>>>>>>> 1 file changed, 4 insertions(+)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index
>>>>>>>>>>>>>> 2b8f9239bede..4b5d18e395eb 100644
>>>>>>>>>>>>>> --- a/fs/f2fs/gc.c
>>>>>>>>>>>>>> +++ b/fs/f2fs/gc.c
>>>>>>>>>>>>>> @@ -1926,6 +1926,10 @@ int f2fs_gc(struct f2fs_sb_info
>>>>>>>>>>>>>> *sbi, struct
>>>>>>>>>>>>> f2fs_gc_control *gc_control)
>>>>>>>>>>>>>> goto stop;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> + if (__is_large_section(sbi) &&
>>>>>>>>>>>>>> + IS_CURSEC(sbi, GET_SEC_FROM_SEG(sbi,
>segno)))
>>>>>>>>>>>>>> + goto stop;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> seg_freed = do_garbage_collect(sbi, segno, &gc_list,
>>> gc_type,
>>>>>>>>>>>>>> gc_control->should_migrate_blocks,
>>>>>>>>>>>>>> gc_control->one_time);
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>
>>
Powered by blists - more mailing lists