linux-kernel - Re: [patch] mm, oom: prevent soft lockup on memcg oom for UP systems

Message-ID: <20200316093152.GE11482@dhcp22.suse.cz>
Date:   Mon, 16 Mar 2020 10:31:52 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     David Rientjes <rientjes@...gle.com>,
        Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        Vlastimil Babka <vbabka@...e.cz>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: [patch] mm, oom: prevent soft lockup on memcg oom for UP systems

On Thu 12-03-20 15:32:38, Andrew Morton wrote:
> On Thu, 12 Mar 2020 11:07:15 -0700 (PDT) David Rientjes <rientjes@...gle.com> wrote:
> 
> > On Thu, 12 Mar 2020, Tetsuo Handa wrote:
> > 
> > > > On Thu, 12 Mar 2020, Tetsuo Handa wrote:
> > > > > > If you have an alternate patch to try, we can test it.  But since this 
> > > > > > cond_resched() is needed anyway, I'm not sure it will change the result.
> > > > > 
> > > > > schedule_timeout_killable(1) is an alternate patch to try; I don't think
> > > > > that this cond_resched() is needed anyway.
> > > > > 
> > > > 
> > > > You are suggesting schedule_timeout_killable(1) in shrink_node_memcgs()?
> > > > 
> > > 
> > > Andrew Morton also mentioned whether cond_resched() in shrink_node_memcgs()
> > > is enough. But like you mentioned,
> > > 
> > 
> > It passes our testing because this is where the allocator is looping while 
> > the victim is trying to exit if only it could be scheduled.
> 
> What happens if the allocator has SCHED_FIFO?

The same thing as a SCHED_FIFO task running in a tight loop in userspace.

As long as a high priority context depends on a resource held by a low
priority task, we have a priority inversion problem, and the page
allocator is no real exception here. But I do not see that the allocator
is much different from any other code in the kernel. We do not add
random sleeps here and there to push high priority FIFO or RT tasks
out of the execution context. We do cond_resched() to help !PREEMPT
kernels, but priority related issues are really out of scope of that
facility.
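
To illustrate the point, here is a toy userspace model, not kernel code.
The function name, the tick-based loop and the alternation rule for equal
priorities are all assumptions made for the sketch. It only shows that a
voluntary reschedule point lets the victim run when priorities are equal,
and does nothing once the looping task strictly outranks the victim.

```c
#include <assert.h>

/* Toy model only; names and tick-based scheduling are illustrative assumptions. */
enum { ALLOCATOR, VICTIM };

/*
 * Two runnable tasks share one CPU. The allocator loops and offers to
 * reschedule on every tick (standing in for cond_resched()); the victim
 * needs victim_work ticks to exit. The higher prio always wins; equal
 * prios alternate. Returns the tick count at which the victim finishes,
 * or -1 if it is still starved after max_ticks.
 */
int ticks_until_victim_exits(int alloc_prio, int victim_prio,
                             int victim_work, int max_ticks)
{
    int done = 0;   /* ticks of work the victim has completed */
    int last = -1;  /* task that ran on the previous tick */

    assert(victim_work > 0);

    for (int tick = 0; tick < max_ticks; tick++) {
        int cur;

        if (alloc_prio > victim_prio)
            cur = ALLOCATOR;    /* reschedule point picks the allocator again */
        else if (victim_prio > alloc_prio)
            cur = VICTIM;
        else
            cur = (last == ALLOCATOR) ? VICTIM : ALLOCATOR;

        if (cur == VICTIM && ++done == victim_work)
            return tick + 1;    /* victim exited */
        last = cur;
    }
    return -1;                  /* allocator kept the CPU; victim starved */
}
```

With equal priorities the victim finishes after a bounded number of ticks.
With a higher allocator priority the function returns -1 however large
max_ticks is, which is the priority inversion described above. The model
ignores everything a real scheduler does beyond strict priority.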
-- 
Michal Hocko
SUSE Labs