[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <39c4ded0-09c0-3e38-85cb-5535099b177d@tcl.com>
Date: Thu, 14 Apr 2022 09:54:58 +0800
From: Wu Yan <wu-yan@....com>
To: Jaegeuk Kim <jaegeuk@...nel.org>
CC: <linux-f2fs-devel@...ts.sourceforge.net>,
<linux-kernel@...r.kernel.org>, <tang.ding@....com>
Subject: Re: [PATCH] f2fs: avoid deadlock in gc thread under low memory
On 4/14/22 01:00, Jaegeuk Kim wrote:
> On 04/13, Rokudo Yan wrote:
>> There is a potential deadlock in gc thread may happen
>> under low memory as below:
>>
>> gc_thread_func
>> -f2fs_gc
>> -do_garbage_collect
>> -gc_data_segment
>> -move_data_block
>> -set_page_writeback(fio.encrypted_page);
>> -f2fs_submit_page_write
>> as f2fs_submit_page_write try to do io merge when possible, so the
>> encrypted_page is marked PG_writeback but may not submit to block
>> layer immediately, if system enter low memory when gc thread try
>> to move next data block, it may do direct reclaim and enter fs layer
>> as below:
>> -move_data_block
>> -f2fs_grab_cache_page(index=?, for_write=false)
>> -grab_cache_page
>> -find_or_create_page
>> -pagecache_get_page
>> -__page_cache_alloc -- __GFP_FS is set
>> -alloc_pages_node
>> -__alloc_pages
>> -__alloc_pages_slowpath
>> -__alloc_pages_direct_reclaim
>> -__perform_reclaim
>> -try_to_free_pages
>> -do_try_to_free_pages
>> -shrink_zones
>> -mem_cgroup_soft_limit_reclaim
>> -mem_cgroup_soft_reclaim
>> -mem_cgroup_shrink_node
>> -shrink_node_memcg
>> -shrink_list
>> -shrink_inactive_list
>> -shrink_page_list
>> -wait_on_page_writeback -- the page is marked
>> writeback during previous move_data_block call
>>
>> the gc thread wait for the encrypted_page writeback complete,
>> but as gc thread held sbi->gc_lock, the writeback & sync thread
>> may blocked waiting for sbi->gc_lock, so the bio contain the
>> encrypted_page may nerver submit to block layer and complete the
>> writeback, which cause deadlock. To avoid this deadlock condition,
>> we mark the gc thread with PF_MEMALLOC_NOFS flag, then it will nerver
>> enter fs layer when try to alloc cache page during move_data_block.
>>
>> Signed-off-by: Rokudo Yan <wu-yan@....com>
>> ---
>> fs/f2fs/gc.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>> index e020804f7b07..cc71f77b98c8 100644
>> --- a/fs/f2fs/gc.c
>> +++ b/fs/f2fs/gc.c
>> @@ -38,6 +38,12 @@ static int gc_thread_func(void *data)
>>
>> wait_ms = gc_th->min_sleep_time;
>>
>> + /*
>> + * Make sure that no allocations from gc thread will ever
>> + * recurse to the fs layer to avoid deadlock as it will
>> + * hold sbi->gc_lock during garbage collection
>> + */
>> + memalloc_nofs_save();
>
> I think this cannot cover all the f2fs_gc() call cases. Can we just avoid by:
>
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -1233,7 +1233,7 @@ static int move_data_block(struct inode *inode, block_t bidx,
> CURSEG_ALL_DATA_ATGC : CURSEG_COLD_DATA;
>
> /* do not read out */
> - page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
> + page = f2fs_grab_cache_page(inode->i_mapping, bidx, true);
> if (!page)
> return -ENOMEM;
>
> Thanks,
>
>> set_freezable();
>> do {
>> bool sync_mode, foreground = false;
>> --
>> 2.25.1
Hi, Jaegeuk
I'm not sure if any other case may trigger the issue, but the stack
traces I have caught so far are all the same as below:
f2fs_gc-253:12 D 226966.808196 572 302561 150976 0x1200840 0x0 572
237207473347056
<ffffff889d88668c> __switch_to+0x134/0x150
<ffffff889e764b6c> __schedule+0xd5c/0x1100
<ffffff889e76554c> io_schedule+0x90/0xc0
<ffffff889d9fb880> wait_on_page_bit+0x194/0x208
<ffffff889da167b4> shrink_page_list+0x62c/0xe74
<ffffff889da1d354> shrink_inactive_list+0x2c0/0x698
<ffffff889da181f4> shrink_node_memcg+0x3dc/0x97c
<ffffff889da17d44> mem_cgroup_shrink_node+0x144/0x218
<ffffff889da6610c> mem_cgroup_soft_limit_reclaim+0x188/0x47c
<ffffff889da17a40> do_try_to_free_pages+0x204/0x3a0
<ffffff889da176c8> try_to_free_pages+0x35c/0x4d0
<ffffff889da05d60> __alloc_pages_nodemask+0x7a4/0x10d0
<ffffff889d9fc82c> pagecache_get_page+0x184/0x2ec
<ffffff889dbf8860> do_garbage_collect+0xfe0/0x2828
<ffffff889dbf7434> f2fs_gc+0x4a0/0x8ec
<ffffff889dbf6bf4> gc_thread_func+0x240/0x4d4
<ffffff889d8de9b0> kthread+0x17c/0x18c
<ffffff889d88567c> ret_from_fork+0x10/0x18
Thanks
yanwu
Powered by blists - more mailing lists