linux-kernel - Re: [RFC 1/4] mm, oom: do not rely on TIF

Date:   Fri, 9 Sep 2016 16:00:21 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Cc:     linux-mm@...ck.org, rientjes@...gle.com, hannes@...xchg.org,
        akpm@...ux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC 1/4] mm, oom: do not rely on TIF_MEMDIE for memory reserves
 access

On Sun 04-09-16 10:49:42, Tetsuo Handa wrote:
> Michal Hocko wrote:
[...]
> > @@ -3309,6 +3318,22 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> >  	return alloc_flags;
> >  }
> >  
> > +static bool oom_reserves_allowed(struct task_struct *tsk)
> > +{
> > +	if (!tsk_is_oom_victim(tsk))
> > +		return false;
> > +
> > +	/*
> > +	 * !MMU doesn't have oom reaper so we shouldn't risk the memory reserves
> > +	 * depletion and shouldn't give access to memory reserves passed the
> > +	 * exit_mm
> > +	 */
> > +	if (!IS_ENABLED(CONFIG_MMU) && !tsk->mm)
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> 
> Are you aware that you are trying to make !MMU kernel's allocations not only
> after returning exit_mm() but also from __mmput() from mmput() from exit_mm()
> fail without allowing access to memory reserves?

Do we allocate from that path on !MMU, and would that be more broken than
the current code, which clears TIF_MEMDIE after mmput() even when
__mmput() is not called (i.e. somebody is holding a reference to the mm,
e.g. via a proc file)?

> The comment says only after returning exit_mm(), but this change is
> not.

I can see that the comment is not ideal. Any suggestions on how to
improve it?
 
> > @@ -3558,8 +3593,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  		goto nopage;
> >  	}
> >  
> > -	/* Avoid allocations with no watermarks from looping endlessly */
> > -	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> > +	/* Avoid allocations for oom victims from looping endlessly */
> > +	if (tsk_is_oom_victim(current) && !(gfp_mask & __GFP_NOFAIL))
> >  		goto nopage;
> 
> This change increases possibility of giving up without trying ALLOC_OOM
> (more allocation failure messages), for currently only one thread which
> remotely got TIF_MEMDIE when it was between gfp_to_alloc_flags() and
> test_thread_flag(TIF_MEMDIE) will give up without trying ALLOC_NO_WATERMARKS
> while all threads which remotely got current->signal->oom_mm when they were
> between gfp_to_alloc_flags() and test_thread_flag(TIF_MEMDIE) will give up
> without trying ALLOC_OOM. I think we should make sure that ALLOC_OOM is
> tried (by using a variable which remembers whether
> get_page_from_freelist(ALLOC_OOM) was tried).

Technically speaking you are right, but I am not really sure that this
matters all that much. This code has always been racy. If we ever
consider the race harmful we can reorganize the alloc slow path in a way
that guarantees at least one allocation attempt with ALLOC_OOM. I am just
not sure it is necessary right now. If this ever shows up as a problem
we would see a flood of allocation failures followed by the OOM report,
so it would be quite easy to notice.
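
For illustration only, here is a minimal userspace sketch of what
"guarantee at least one allocation attempt with ALLOC_OOM" could look
like, using the "remember whether it was tried" variable suggested above.
It is not kernel code and not part of the patch: every model_* name, the
MODEL_ALLOC_OOM value and the two-counter zone are hypothetical stand-ins,
and the __GFP_NOFAIL handling is left out to keep the model small.

```c
#include <stdbool.h>

/* Hypothetical stand-in for ALLOC_OOM; the value is not the kernel's. */
#define MODEL_ALLOC_OOM 0x1u

/* Toy zone: ordinary free pages plus a reserve only victims may use. */
struct model_zone {
	int free_pages;
	int reserve_pages;
};

/* Toy task: only tracks whether it has been selected as an OOM victim. */
struct model_task {
	bool oom_victim;
};

/* Models get_page_from_freelist(): reserves need MODEL_ALLOC_OOM. */
static bool model_get_page(struct model_zone *z, unsigned int alloc_flags)
{
	if (z->free_pages > 0) {
		z->free_pages--;
		return true;
	}
	if ((alloc_flags & MODEL_ALLOC_OOM) && z->reserve_pages > 0) {
		z->reserve_pages--;
		return true;
	}
	return false;
}

/*
 * Models the tail of the slow path. mark_victim_midway simulates the
 * race: the task becomes a victim after alloc_flags were computed but
 * before the "is current an oom victim" check.
 */
static bool model_slowpath(struct model_zone *z, struct model_task *tsk,
			   bool mark_victim_midway)
{
	bool tried_oom = false;
	unsigned int alloc_flags;

retry:
	/* Models gfp_to_alloc_flags() granting reserve access to victims. */
	alloc_flags = tsk->oom_victim ? MODEL_ALLOC_OOM : 0;
	if (alloc_flags & MODEL_ALLOC_OOM)
		tried_oom = true;

	if (mark_victim_midway) {
		tsk->oom_victim = true;	/* remote OOM kill lands here */
		mark_victim_midway = false;
	}

	if (model_get_page(z, alloc_flags))
		return true;

	/* A victim only gives up after one attempt with reserve access. */
	if (tsk->oom_victim && !tried_oom)
		goto retry;

	return false;	/* the model does no reclaim and no OOM kill */
}
```

In this model a task that is marked as a victim in the middle of the slow
path still gets one attempt against the reserves before it fails, while a
victim that has already tried the reserves gives up immediately.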

> We are currently allowing TIF_MEMDIE threads try ALLOC_NO_WATERMARKS for
> once and give up without invoking the OOM killer. This change makes
> current->signal->oom_mm threads try ALLOC_OOM for once and give up without
> invoking the OOM killer. This means that allocations for cleanly cleaning
> up by oom victims might fail prematurely, but we don't want to scatter
> around __GFP_NOFAIL. Since there are reasonable chances of the parallel
> memory freeing, we don't need to give up without invoking the OOM killer
> again. I think that
> 
> -	/* Avoid allocations with no watermarks from looping endlessly */
> -	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> +#ifndef CONFIG_MMU
> +	/* Avoid allocations for oom victims from looping endlessly */
> +	if (tsk_is_oom_victim(current) && !(gfp_mask & __GFP_NOFAIL))
> +		goto nopage;
> +#endif
> 
> is possible.

I would prefer not to spread MMU ifdefs all over the place.
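
For comparison, the same kind of configuration-dependent condition can be
written as an ordinary C expression, in the IS_ENABLED(CONFIG_MMU) style
that oom_reserves_allowed() in this patch already uses. The following is
a userspace sketch of that coding style only, not something proposed in
this thread: SKETCH_MMU, SKETCH_GFP_NOFAIL and the sketch_* names are
hypothetical stand-ins for the kernel symbols.

```c
#include <stdbool.h>

/* Hypothetical compile-time switch standing in for IS_ENABLED(CONFIG_MMU). */
#ifndef SKETCH_MMU
#define SKETCH_MMU 0
#endif

/* Hypothetical stand-in for __GFP_NOFAIL; the value is not the kernel's. */
#define SKETCH_GFP_NOFAIL 0x1u

struct sketch_task {
	bool oom_victim;
};

/*
 * The configuration is a parameter here so that both variants can be
 * exercised from one build; with a constant argument the compiler can
 * drop the dead branch, which is what makes this usable in place of an
 * #ifdef block.
 */
static bool sketch_victim_gives_up_cfg(bool mmu,
				       const struct sketch_task *tsk,
				       unsigned int gfp_mask)
{
	if (mmu)
		return false;	/* MMU build: do not bail out early */

	return tsk->oom_victim && !(gfp_mask & SKETCH_GFP_NOFAIL);
}

/* Variant bound to the build-time setting. */
static bool sketch_victim_gives_up(const struct sketch_task *tsk,
				   unsigned int gfp_mask)
{
	return sketch_victim_gives_up_cfg(SKETCH_MMU, tsk, gfp_mask);
}
```

Both branches stay visible to the compiler in every configuration, so a
build break in the rarely built variant is caught earlier than with a
preprocessor block.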

-- 
Michal Hocko
SUSE Labs