linux-kernel - Re: [patch] mm, thp: do not cause memcg oom for thp

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Wed, 21 Mar 2018 09:22:28 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     David Rientjes <rientjes@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Vlastimil Babka <vbabka@...e.cz>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: [patch] mm, thp: do not cause memcg oom for thp

On Tue 20-03-18 13:25:23, David Rientjes wrote:
> On Tue, 20 Mar 2018, Michal Hocko wrote:
> 
> > > Commit 2516035499b9 ("mm, thp: remove __GFP_NORETRY from khugepaged and
> > > madvised allocations") changed the page allocator to no longer detect thp
> > > allocations based on __GFP_NORETRY.
> > > 
> > > It did not, however, modify the mem cgroup try_charge() path to avoid oom
> > > kill for either khugepaged collapsing or thp faulting.  It is never
> > > expected to oom kill a process to allocate a hugepage for thp; reclaim is
> > > governed by the thp defrag mode and MADV_HUGEPAGE, but allocations (and
> > > charging) should fallback instead of oom killing processes.
> > 
> > For some reason I thought that the charging path simply bails out for
> > costly orders - effectively the same thing as for the global OOM killer.
> > But we do not. Is there any reason to not do that though? Why don't we
> > simply do
> > 
> 
> I'm not sure of the expectation of high-order memcg charging without 
> __GFP_NORETRY,

It should be semantically compatible with the allocation path.

> I only know that khugepaged can now cause memcg oom kills 
> when trying to collapse memory, and then subsequently found that the same 
> situation exists for faulting instead of falling back to small pages.

And that is clearly a bug because page allocator doesn't oom kill while
the memcg charge does for the same gfp flag. That should be fixed.

> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index d1a917b5b7b7..08accbcd1a18 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1493,7 +1493,7 @@ static void memcg_oom_recover(struct mem_cgroup *memcg)
> >  
> >  static void mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
> >  {
> > -	if (!current->memcg_may_oom)
> > +	if (!current->memcg_may_oom || order > PAGE_ALLOC_COSTLY_ORDER)
> >  		return;
> >  	/*
> >  	 * We are in the middle of the charge context here, so we
> 
> That may make sense as an additional patch, but for thp allocations we 
> don't want to retry reclaim nr_retries times anyway; we want the old 
> behavior of __GFP_NORETRY before commit 2516035499b9.

Why? Allocation and the charge path should use the same gfp mask unless
there is a strong reason for it. If you have one then please mention it
in the changelog.

> So the above would be a follow-up patch that wouldn't replace mine.

Unless there is a strong reason to use different gfp mask for the
allocation and the charge then your fix is actually wrong.
-- 
Michal Hocko
SUSE Labs