linux-kernel - Re: [PATCH] f2fs: avoid deadlock in gc thread under low memory


Message-ID: <Yla1Z8Ze0iJvXRFT@dhcp22.suse.cz>
Date:   Wed, 13 Apr 2022 13:35:03 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     Wu Yan <wu-yan@....com>
Cc:     jaegeuk@...nel.org, linux-f2fs-devel@...ts.sourceforge.net,
        linux-kernel@...r.kernel.org, tang.ding@....com
Subject: Re: [PATCH] f2fs: avoid deadlock in gc thread under low memory

On Wed 13-04-22 19:20:06, Wu Yan wrote:
> On 4/13/22 17:46, Michal Hocko wrote:
> > On Wed 13-04-22 16:44:32, Rokudo Yan wrote:
> > > A potential deadlock may happen in the gc thread
> > > under low memory, as shown below:
> > > 
> > > gc_thread_func
> > >   -f2fs_gc
> > >    -do_garbage_collect
> > >     -gc_data_segment
> > >      -move_data_block
> > >       -set_page_writeback(fio.encrypted_page);
> > >       -f2fs_submit_page_write
> > > f2fs_submit_page_write tries to merge IO when possible, so the
> > > encrypted_page is marked PG_writeback but may not be submitted to the
> > > block layer immediately. If the system runs low on memory when the gc
> > > thread tries to move the next data block, it may do direct reclaim and
> > > enter the fs layer as below:
> > >     -move_data_block
> > >      -f2fs_grab_cache_page(index=?, for_write=false)
> > >       -grab_cache_page
> > >        -find_or_create_page
> > >         -pagecache_get_page
> > >          -__page_cache_alloc --  __GFP_FS is set
> > >           -alloc_pages_node
> > >            -__alloc_pages
> > >             -__alloc_pages_slowpath
> > >              -__alloc_pages_direct_reclaim
> > >               -__perform_reclaim
> > >                -try_to_free_pages
> > >                 -do_try_to_free_pages
> > >                  -shrink_zones
> > >                   -mem_cgroup_soft_limit_reclaim
> > >                    -mem_cgroup_soft_reclaim
> > >                     -mem_cgroup_shrink_node
> > >                      -shrink_node_memcg
> > >                       -shrink_list
> > >                        -shrink_inactive_list
> > >                         -shrink_page_list
> > >                          -wait_on_page_writeback -- the page is marked
> > >                         writeback during previous move_data_block call
> > 
> > This is a memcg reclaim path and you would have to have __GFP_ACCOUNT in
> > the gfp mask to hit it from the page allocator. I am not really familiar
> > with f2fs but I doubt it is using this flag.
> > 
> > On the other hand the memory is charged to a memcg when the newly
> > allocated page is added to the page cache. That wouldn't trigger the
> > soft reclaim path but that is not really necessary because even the
> > regular memcg reclaim would trigger wait_on_page_writeback for cgroup
> > v1.
> > 
> > > Also, are you sure that the mapping's gfp mask has __GFP_FS set for this
> > > allocation? f2fs_iget uses a GFP_NOFS-like mask for some inode types.
> > > 
> > > All that being said, you will need to change the above call chain, but it
> > > would be worth double checking that the deadlock is real.
> 
> Hi, Michal
> 
> 1. The issue occurs when running a monkey test on an Android device with
> 4GB RAM + 3GB zram, and memory cgroup v1 enabled.
> 
> 2. A full memory dump was captured when the issue occurred, and the deadlock
> was confirmed from the dump. We can see that mapping->gfp_mask is 0x14200ca,
> so both __GFP_ACCOUNT (0x1000000) and __GFP_FS (0x80) are set.

This is rather surprising, I have to say, because page cache is charged
explicitly (__filemap_add_folio). Are you testing with the upstream
kernel, or could this come from a non-upstream change?

> crash-arm64> struct inode.i_mapping 0xFFFFFFDFD578EEA0
>   i_mapping = 0xffffffdfd578f028,
> crash-arm64> struct address_space.host,gfp_mask -x 0xffffffdfd578f028
>   host = 0xffffffdfd578eea0,
>   gfp_mask = 0x14200ca,

Anyway, if __GFP_FS is set then the deadlock is possible even
without __GFP_ACCOUNT.
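
For reference, a minimal userspace sketch of the bit test on the dumped
mask. It uses only the two bit values quoted above, which depend on the
kernel version, so treat them as assumptions rather than as the upstream
definitions. The THREAD_* names and both helpers are made up for
illustration and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit values exactly as quoted in this thread. They differ between
 * kernel versions, so they are assumptions, not upstream constants.
 */
#define THREAD_GFP_FS      0x80u
#define THREAD_GFP_ACCOUNT 0x1000000u

/* Hypothetical helper: true if every bit of `flag` is set in `mask`. */
static bool gfp_has(uint32_t mask, uint32_t flag)
{
	return (mask & flag) == flag;
}

/* Hypothetical helper: the same mask with the FS bit cleared (NOFS-like). */
static uint32_t gfp_without_fs(uint32_t mask)
{
	return mask & ~THREAD_GFP_FS;
}
```

With the dumped value, gfp_has(0x14200ca, THREAD_GFP_FS) is true, so
reclaim entered from this allocation is allowed to go into the fs layer.
A mask with that bit cleared would not allow that.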
-- 
Michal Hocko
SUSE Labs