Message-ID: <20170612073922.GA7476@dhcp22.suse.cz>
Date:   Mon, 12 Jun 2017 09:39:22 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Cc:     hannes@...xchg.org, akpm@...ux-foundation.org, guro@...com,
        vdavydov.dev@...il.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 2/2] mm, oom: do not trigger out_of_memory from the #PF

On Sat 10-06-17 20:57:46, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > And just to clarify a bit. The OOM killer should be invoked whenever
> > appropriate from the allocation context. If we decide to fail the
> > allocation in the PF path then we can safely roll back and retry the
> > whole PF. This has an advantage that any locks held while doing the
> > allocation will be released and that alone can help to make a further
> > progress. Moreover we can relax retry-for-ever _inside_ the allocator
> > semantic for the PF path and fail allocations when we cannot make
> > further progress even after we hit the OOM condition or we do stall for
> > too long.
> 
> What!? Are you saying that leave the allocator loop rather than invoke
> the OOM killer if it is from page fault event without __GFP_FS set?
> With below patch applied (i.e. ignore __GFP_FS for emulation purpose),
> I can trivially observe systemwide lockup where the OOM killer is
> never called.

Because you have ruled the OOM killer out of the game completely from the PF
path AFAICS. So that is clearly _not_ what I meant (read the second
sentence). What I meant was that page fault allocations _could_ fail
_after_ we have used _all_ the reclaim opportunities. Without this patch
this would be impossible. Note that I am not proposing that change now
because that would require a deeper audit but it sounds like a viable
way to go long term.
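
To make the ordering concrete, here is a toy userspace model in C. It is
only a sketch and not kernel code: every name in it (toy_mm,
toy_alloc_page, toy_handle_fault, ...) is hypothetical and none of them
is a kernel API. It assumes a simplified world where reclaim and the OOM
killer each either free something or report no progress. The point it
shows is that the OOM killer is still invoked from the allocation
context, and the page fault path only fails _after_ both reclaim and the
OOM killer made no progress, at which point the fault handler drops its
locks and asks for the whole fault to be retried.

```c
#include <stdbool.h>

/* Toy memory state. All names are hypothetical, not kernel structures. */
struct toy_mm {
	int free_pages;        /* pages that can be handed out right now */
	int reclaimable_pages; /* pages reclaim can still free */
	int oom_victims;       /* tasks the toy OOM killer may still kill */
	int pages_per_victim;  /* pages freed by killing one victim */
};

enum toy_ctx { TOY_CTX_NORMAL, TOY_CTX_PAGE_FAULT };
enum toy_fault_result { TOY_FAULT_DONE, TOY_FAULT_RETRY };

static bool toy_try_alloc(struct toy_mm *mm)
{
	if (mm->free_pages == 0)
		return false;
	mm->free_pages--;
	return true;
}

/* Returns true if reclaim made progress. */
static bool toy_reclaim(struct toy_mm *mm)
{
	if (mm->reclaimable_pages == 0)
		return false;
	mm->reclaimable_pages--;
	mm->free_pages++;
	return true;
}

/* Returns true if a victim was killed and its memory freed. */
static bool toy_oom_kill(struct toy_mm *mm)
{
	if (mm->oom_victims == 0)
		return false;
	mm->oom_victims--;
	mm->free_pages += mm->pages_per_victim;
	return true;
}

/*
 * Allocate one page. Reclaim and the OOM killer are tried in every
 * context. Only when neither makes progress does the page fault
 * context give up; the normal context keeps looping (bounded by
 * max_loops here so that the toy always terminates).
 */
static bool toy_alloc_page(struct toy_mm *mm, enum toy_ctx ctx, int max_loops)
{
	for (int i = 0; i < max_loops; i++) {
		if (toy_try_alloc(mm))
			return true;
		if (toy_reclaim(mm))
			continue;
		if (toy_oom_kill(mm))
			continue;
		if (ctx == TOY_CTX_PAGE_FAULT)
			return false; /* all opportunities used, fail the PF */
	}
	return false;
}

/*
 * Toy fault handler: the lock is released on both paths, so a failed
 * allocation rolls back and lets the caller retry the whole fault.
 */
static enum toy_fault_result toy_handle_fault(struct toy_mm *mm, bool *lock_held)
{
	bool ok;

	*lock_held = true;
	ok = toy_alloc_page(mm, TOY_CTX_PAGE_FAULT, 16);
	*lock_held = false;
	return ok ? TOY_FAULT_DONE : TOY_FAULT_RETRY;
}
```

In this model a fault only returns TOY_FAULT_RETRY once reclaimable_pages
and oom_victims are both exhausted, which is the difference from a change
that bails out of the OOM path before the killer is ever considered.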

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b896897..c79dfd5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3255,6 +3255,9 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
>  
>  	*did_some_progress = 0;
>  
> +	if (current->in_pagefault)
> +		return NULL;
> +
>  	/*
>  	 * Acquire the oom lock.  If that fails, somebody else is
>  	 * making progress for us.
-- 
Michal Hocko
SUSE Labs