Message-ID: <alpine.DEB.2.21.2003101724010.197777@chino.kir.corp.google.com>
Date: Tue, 10 Mar 2020 17:34:53 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
cc: Vlastimil Babka <vbabka@...e.cz>, Michal Hocko <mhocko@...nel.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [patch] mm, oom: prevent soft lockup on memcg oom for UP systems

On Tue, 10 Mar 2020, Andrew Morton wrote:
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2637,6 +2637,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
> >  		unsigned long reclaimed;
> >  		unsigned long scanned;
> >  
> > +		cond_resched();
> > +
> >  		switch (mem_cgroup_protected(target_memcg, memcg)) {
> >  		case MEMCG_PROT_MIN:
> >  			/*
>
>
> Obviously better, but this will still spin wheels until this task's
> timeslice expires, and we might want to do something to help ensure
> that the victim runs next (or soon)?
>
We used to have a schedule_timeout_killable(1) to address exactly that
scenario, but it was removed in 4.19:
commit 9bfe5ded054b8e28a94c78580f233d6879a00146
Author: Michal Hocko <mhocko@...e.com>
Date:   Fri Aug 17 15:49:04 2018 -0700

    mm, oom: remove sleep from under oom_lock
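
As a rough, paraphrased sketch (not the verbatim pre-4.19 source), the
sleep that commit removed sat at the end of out_of_memory(), something
like:

	/* under oom_lock, after a victim has been selected */
	oom_kill_process(oc, message);

	/*
	 * Give the victim a chance to exit and free memory before the
	 * caller retries the allocation; on UP this also yields the
	 * CPU to the victim.
	 */
	schedule_timeout_killable(1);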
This is why we see this issue on 4.19 guests but not on 4.14. I had
assumed the problem Tetsuo reported, which motivated that patch, is
still an issue, and I preferred to fix the weird UP problem by adding a
cond_resched() that is likely needed for the iteration in
shrink_node_memcgs() anyway. Do we care to optimize for UP systems
encountering memcg oom kills? Eh, maybe, but I'm not very interested in
opening up a centithread about this.
> (And why is shrink_node_memcgs compiled in when CONFIG_MEMCG=n?)
>
This guest does have CONFIG_MEMCG enabled; this is a memcg oom
condition. But unrelated to this patch, I think the function is simply
oddly named: the do-while loop in shrink_node_memcgs() actually runs
with memcg == NULL for the non-memcg case and is responsible for
calling into both page and slab reclaim.
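
For reference, a condensed, paraphrased sketch of that loop (based on
mm/vmscan.c around v5.6, with the memcg protection checks omitted):

static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
{
	struct mem_cgroup *target_memcg = sc->target_mem_cgroup;
	struct mem_cgroup *memcg;

	/*
	 * With CONFIG_MEMCG=n, mem_cgroup_iter() is a stub that
	 * returns NULL, so the body runs exactly once with
	 * memcg == NULL and this same loop drives plain (non-memcg)
	 * reclaim.
	 */
	memcg = mem_cgroup_iter(target_memcg, NULL, NULL);
	do {
		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);

		cond_resched();		/* the addition from this patch */

		/* page (lru) reclaim for this memcg, or the whole node */
		shrink_lruvec(lruvec, sc);

		/* slab reclaim for the same memcg */
		shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
			    sc->priority);
	} while ((memcg = mem_cgroup_iter(target_memcg, memcg, NULL)));
}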