linux-kernel - Re: [patch] mm, oom: prevent soft lockup on memcg oom for UP systems


Message-ID: <alpine.DEB.2.21.2003121101030.158939@chino.kir.corp.google.com>
Date:   Thu, 12 Mar 2020 11:07:15 -0700 (PDT)
From:   David Rientjes <rientjes@...gle.com>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Michal Hocko <mhocko@...nel.org>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: [patch] mm, oom: prevent soft lockup on memcg oom for UP
 systems

On Thu, 12 Mar 2020, Tetsuo Handa wrote:

> > On Thu, 12 Mar 2020, Tetsuo Handa wrote:
> > > > If you have an alternate patch to try, we can test it.  But since this 
> > > > cond_resched() is needed anyway, I'm not sure it will change the result.
> > > 
> > > schedule_timeout_killable(1) is an alternate patch to try; I don't think
> > > that this cond_resched() is needed anyway.
> > > 
> > 
> > You are suggesting schedule_timeout_killable(1) in shrink_node_memcgs()?
> > 
> 
> Andrew Morton also mentioned whether cond_resched() in shrink_node_memcgs()
> is enough. But like you mentioned,
> 

It passes our testing because this is where the allocator loops while the 
victim is trying to exit, if only it could be scheduled.

> you can try re-adding sleep outside of oom_lock:
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d09776cd6e10..3aee7e0eca4e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1576,6 +1576,7 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	 */
>  	ret = should_force_charge() || out_of_memory(&oc);
>  	mutex_unlock(&oom_lock);
> +	schedule_timeout_killable(1);
>  	return ret;
>  }
>  

If current was the process chosen for oom kill, this would actually induce 
the problem, not fix it.

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3c4eb750a199..e80158049651 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3797,7 +3797,6 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  	 */
>  	if (!mutex_trylock(&oom_lock)) {
>  		*did_some_progress = 1;
> -		schedule_timeout_uninterruptible(1);
>  		return NULL;
>  	}
>  
> @@ -4590,6 +4589,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  
>  	/* Retry as long as the OOM killer is making progress */
>  	if (did_some_progress) {
> +		schedule_timeout_uninterruptible(1);
>  		no_progress_loops = 0;
>  		goto retry;
>  	}
> 
> By the way, will you share the reproducer (and how to use the reproducer) ?
> 

On a UP kernel with swap disabled, you limit a memcg to 100MB and start 
three processes attached to it that each fault 40MB.  It is the same 
reproducer as for the "mm, oom: make a last minute check to prevent 
unnecessary memcg oom kills" patch, except that in that case there are two 
cores.
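
A minimal sketch of what each of the three faulting processes might do, 
assuming an anonymous mmap() that is touched one byte per page.  The helper 
name fault_pages() is hypothetical and not from the thread; creating the 
memcg, setting its 100MB limit, and attaching the processes happen outside 
this code and are not shown.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption, not the actual reproducer):
 * map `len` bytes of anonymous memory and write one byte to every page
 * so that each page is really faulted in and charged to the memcg.
 *
 * Returns the number of pages touched, or 0 if `len` is 0 or the
 * mapping could not be created.  The mapping is deliberately left in
 * place so the memory stays charged until the process exits.
 */
size_t fault_pages(size_t len)
{
	long ps = sysconf(_SC_PAGESIZE);
	size_t page_size, off, touched = 0;
	unsigned char *p;

	if (ps <= 0 || len == 0)
		return 0;
	page_size = (size_t)ps;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 0;

	/* One write per page is enough to trigger the page fault. */
	for (off = 0; off < len; off += page_size) {
		p[off] = 1;
		touched++;
	}
	return touched;
}
```

Each process would call something like fault_pages(40UL << 20) and then 
block, for example in pause(), so that three of them together exceed the 
100MB limit.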