linux-kernel - Re: [f2fs-dev] [PATCH] f2fs: Fix indefinite loop in f2fs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <609a502b-1e7f-c9b2-e864-421ffeda298b@huawei.com>
Date:   Tue, 30 Jul 2019 20:35:46 +0800
From:   Chao Yu <yuchao0@...wei.com>
To:     Sahitya Tummala <stummala@...eaurora.org>,
        Chao Yu <chao@...nel.org>
CC:     Jaegeuk Kim <jaegeuk@...nel.org>,
        <linux-f2fs-devel@...ts.sourceforge.net>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [f2fs-dev] [PATCH] f2fs: Fix indefinite loop in f2fs_gc()

Hi Sahitya,

On 2019/7/30 12:36, Sahitya Tummala wrote:
> Hi Chao,
> 
> On Tue, Jul 30, 2019 at 12:00:45AM +0800, Chao Yu wrote:
>> Hi Sahitya,
>>
>> On 2019-7-29 13:20, Sahitya Tummala wrote:
>>> Policy - foreground GC, LFS mode and greedy GC mode.
>>>
>>> Under this policy, f2fs_gc() loops forever to GC as it doesn't have
>>> enough free segements to proceed and thus it keeps calling gc_more
>>> for the same victim segment.  This can happen if the selected victim
>>> segment could not be GC'd due to failed blkaddr validity check i.e.
>>> is_alive() returns false for the blocks set in current validity map.
>>>
>>> Fix this by not resetting the sbi->cur_victim_sec to NULL_SEGNO, when
>>> the segment selected could not be GC'd. This helps to select another
>>> segment for GC and thus helps to proceed forward with GC.
>>>
>>> Signed-off-by: Sahitya Tummala <stummala@...eaurora.org>
>>> ---
>>>  fs/f2fs/gc.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>> index 8974672..7bbcc4a 100644
>>> --- a/fs/f2fs/gc.c
>>> +++ b/fs/f2fs/gc.c
>>> @@ -1303,7 +1303,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
>>>  		round++;
>>>  	}
>>>  
>>> -	if (gc_type == FG_GC)
>>> +	if (gc_type == FG_GC && seg_freed)
>>>  		sbi->cur_victim_sec = NULL_SEGNO;
>>
>> In some cases, we may remain last victim in sbi->cur_victim_sec, and jump out of
>> GC cycle, then SSR can skip the last victim due to sec_usage_check()...
>>
> 
> I see. I have a few questions on how to fix this issue. Please share your
> comments.
> 
> 1. Do you think the scenario described is valid? It happens rarely, not very

IIRC, we suffered endless gc loop due to there is valid block belong to an
opened atomic write file. (because we will skip directly once we hit atomic file)

For your case, I'm not sure that would happen, did you look into is_alive(), why
will it fail? block address not match? If so, it looks like summary info and
dnode block and nat entry are inconsistent.

> easy to reproduce.  From the dumps, I see that only block is set as valid in
> the sentry->cur_valid_map for which I see that summary block check is_alive()
> could return false. As only one block is set as valid, chances are there it
> can be always selected as the victim by get_victim_by_default() under FG_GC.
> 
> 2. What are the possible scenarios where summary block check is_alive() could
> fail for a segment?

I guess, maybe after check_valid_map(), the block is been truncated before
is_alive(). If so the victim should be prefree directly instead of being
selected again...

> 
> 3. How does GC handle such segments?

I think that's not a normal case, or I'm missing something.

Thanks,

> 
> Thanks,
> 
>> Thanks,
>>
>>>  
>>>  	if (sync)
>>>
>