Message-ID: <20170622165858.GA30035@castle>
Date:   Thu, 22 Jun 2017 17:58:58 +0100
From:   Roman Gushchin <guro@...com>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
CC:     <linux-mm@...ck.org>, Michal Hocko <mhocko@...nel.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Tejun Heo <tj@...nel.org>, <kernel-team@...com>,
        <cgroups@...r.kernel.org>, <linux-doc@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [v3 1/6] mm, oom: use oom_victims counter to synchronize oom
 victim selection

On Thu, Jun 22, 2017 at 09:40:28AM +0900, Tetsuo Handa wrote:
> Roman Gushchin wrote:
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -992,6 +992,13 @@ bool out_of_memory(struct oom_control *oc)
> >  	if (oom_killer_disabled)
> >  		return false;
> >  
> > +	/*
> > +	 * If there are oom victims in flight, we don't need to select
> > +	 * a new victim.
> > +	 */
> > +	if (atomic_read(&oom_victims) > 0)
> > +		return true;
> > +
> >  	if (!is_memcg_oom(oc)) {
> >  		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
> >  		if (freed > 0)
> 
> Above in this patch and below in patch 5 are wrong.
> 
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -665,7 +672,13 @@ static void mark_oom_victim(struct task_struct *tsk)
> >  	 * that TIF_MEMDIE tasks should be ignored.
> >  	 */
> >  	__thaw_task(tsk);
> > -	atomic_inc(&oom_victims);
> > +
> > +	/*
> > +	 * If there are no oom victims in flight,
> > +	 * give the task an access to the memory reserves.
> > +	 */
> > +	if (atomic_inc_return(&oom_victims) == 1)
> > +		set_tsk_thread_flag(tsk, TIF_MEMDIE);
> >  }
> >  
> >  /**
> 
> The OOM reaper is not available for CONFIG_MMU=n kernels, and timeout-based
> giveup is not permitted, yet a multithreaded process might be selected as
> an OOM victim. Not setting TIF_MEMDIE on all threads sharing the OOM victim's
> mm increases the chance that some OOM victim thread is prevented from terminating
> (e.g. a thread holding mmap_sem for write cannot leave __alloc_pages_slowpath()
> because it is waiting for the TIF_MEMDIE thread to call exit_oom_victim(), while
> the TIF_MEMDIE thread is waiting for the thread holding mmap_sem for write).

I agree that CONFIG_MMU=n is a special case and the proposed approach can't
be used there directly. But can you please explain why you find the first
chunk wrong? Basically, it's exactly what we do now: we increment oom_victims
for every oom victim, and every process decrements this counter on its exit
path. If there is at least one existing victim, we will select it again, so
this is just an optimization. Am I missing something? Why should we start a
new victim selection if there are processes that will likely quit and release
memory soon?
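
For context, the decrement side I'm referring to is exit_oom_victim() in
mm/oom_kill.c; roughly something like this (a simplified sketch from memory,
not the exact mainline source):

	void exit_oom_victim(void)
	{
		/* The victim is done using the memory reserves. */
		clear_thread_flag(TIF_MEMDIE);

		/* Drop our count; wake oom_killer_disable() waiters on the last one. */
		if (!atomic_dec_return(&oom_victims))
			wake_up_all(&oom_victims_wait);
	}

So having out_of_memory() return true while oom_victims > 0 just means we keep
relying on victims that are already on their way out instead of picking a new one.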

Thanks!
