Date:   Fri, 6 Nov 2020 08:43:46 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     "Huang, Ying" <ying.huang@...el.com>
Cc:     Feng Tang <feng.tang@...el.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Matthew Wilcox <willy@...radead.org>,
        Mel Gorman <mgorman@...e.de>, dave.hansen@...el.com,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable
 zone only node

On Fri 06-11-20 12:32:44, Huang, Ying wrote:
> Michal Hocko <mhocko@...e.com> writes:
> 
> > On Thu 05-11-20 09:40:28, Feng Tang wrote:
> >> On Wed, Nov 04, 2020 at 09:53:43AM +0100, Michal Hocko wrote:
> >>  
> >> > > > As I've said in reply to your second patch, I think we can make the oom
> >> > > > killer behavior more sensible in these misconfigured cases, but I do not
> >> > > > think we want to break the cpuset isolation for such a configuration.
> >> > > 
> >> > > Do you mean we should skip the killing and just let the allocation fail?
> >> > > We checked the oom killer code first: when the oom happens, both the
> >> > > DRAM node and the unmovable node have lots of free memory, so killing a
> >> > > process won't improve the situation.
> >> > 
> >> > We already skip the oom killer and fail the allocation for lowmem requests.
> >> > This is similar in some sense. Another option would be to kill the
> >> > allocating context, which would potentially have fewer corner cases,
> >> > because some allocation failures might be unexpected.
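
(For context on the lowmem bail-out mentioned above: the check sits in the
page allocator's OOM path. Approximately, from mm/page_alloc.c,
__alloc_pages_may_oom() in v5.x kernels, quoted here only for illustration:)

	/* The OOM killer will not help higher-order allocs */
	if (order > PAGE_ALLOC_COSTLY_ORDER)
		goto out;

	/* The OOM killer does not needlessly kill tasks for lowmem */
	if (ac->highest_zoneidx < ZONE_NORMAL)
		goto out;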
> >> 
> >> Yes, this can avoid the oom killer helplessly killing an innocent
> >> process (one under no memory pressure at all).
> >> 
> >> And I think the important thing is to judge whether this usage (binding
> >> a docker-like workload to an unmovable node) is a valid case :)
> >
> > I am confused. Why would an unmovable node be a problem? Movable
> > allocations can be satisfied from ZONE_NORMAL just fine. It is the
> > other way around that is a problem.
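
(To restate that asymmetry compactly: gfp_zone() caps the highest zone an
allocation may use, and the zonelist only falls back downward from there.
The helper below is a hypothetical sketch of the rule, not actual kernel code:)

	/*
	 * GFP_HIGHUSER_MOVABLE is capped at ZONE_MOVABLE, so it may fall
	 * back through ZONE_NORMAL, ZONE_DMA32, ...; GFP_KERNEL is capped
	 * at ZONE_NORMAL and can never be served from ZONE_MOVABLE.
	 */
	static inline bool zone_allowed(gfp_t gfp, enum zone_type zidx)
	{
		return zidx <= gfp_zone(gfp);	/* fallback only goes down */
	}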
> >
> >> Initially, I thought it invalid too, but later realized it still makes
> >> some sense for these 2 cases:
> >>     * a user wants to bind his workload to one node (most of the user
> >>       space memory) to avoid cross-node traffic, and that node happens
> >>       to be configured as unmovable
> >
> > See above
> >
> >>     * one small DRAM node + one big PMEM node, where a workload that is
> >>       insensitive to memory latency could be bound to the cheaper
> >>       unmovable PMEM node
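
(A minimal userspace sketch of the binding both cases describe, via
set_mempolicy(2); the node number is an assumption for illustration, and
the snippet links with -lnuma:)

	#include <numaif.h>

	/* Bind all future allocations of the calling thread to one node.
	 * MPOL_BIND allows no fallback, which is what turns a
	 * ZONE_MOVABLE-only node into an OOM trap for any unmovable
	 * allocation made on the task's behalf. */
	static int bind_to_node(int node)	/* e.g. 1 = assumed PMEM node */
	{
		unsigned long nodemask = 1UL << node;

		return set_mempolicy(MPOL_BIND, &nodemask,
				     sizeof(nodemask) * 8);
	}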
> >
> > Please elaborate some more. As long as you have both movable and normal
> > nodes then this should be possible with a good deal of care - most
> > notably the movable:kernel memory ratio shouldn't be too big.
> >
> > Besides that, why does the PMEM node have to be MOVABLE only in the
> > first place?
> 
> The performance of PMEM is much worse than that of DRAM.  If we find
> that some pages on PMEM are accessed frequently (hot), we may want to
> move them to DRAM to optimize system performance.  If unmovable pages
> are allocated on PMEM and become hot, we may be unable to move them to
> DRAM without rebooting the system.  So we think we should make the PMEM
> nodes MOVABLE only.
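
(What "move them to DRAM" can look like from userspace, as one hedged
illustration: move_pages(2) migrates already-allocated pages between
nodes, which only succeeds if the pages sit in a migratable zone. The
node numbers are assumptions; build with -lnuma:)

	#include <numaif.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		long psz = sysconf(_SC_PAGESIZE);
		void *page;

		/* Allocate and touch one page so it is physically present. */
		if (posix_memalign(&page, psz, psz))
			return 1;
		memset(page, 0xaa, psz);

		void *pages[1] = { page };
		int nodes[1] = { 0 };	/* 0 = assumed DRAM node */
		int status[1];

		/* pid 0 means the calling process; MPOL_MF_MOVE migrates
		 * only pages that are not shared with another process. */
		if (move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE))
			perror("move_pages");
		else
			printf("page now on node %d\n", status[0]);
		return 0;
	}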

That is fair, but then you really need a fallback node too. So this is a
mere optimization rather than a fundamental restriction.
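
(At the policy level, the fallback asked for above is what MPOL_PREFERRED
already provides, in contrast to the MPOL_BIND sketch earlier; node 1 as
the PMEM node is again an assumption:)

	#include <numaif.h>

	/* Prefer the (assumed) PMEM node but keep a fallback: when it is
	 * exhausted, allocations spill over to the remaining nodes instead
	 * of failing or invoking the OOM killer. */
	static int prefer_pmem_node(void)
	{
		unsigned long nodemask = 1UL << 1;	/* assumed PMEM node */

		return set_mempolicy(MPOL_PREFERRED, &nodemask,
				     sizeof(nodemask) * 8);
	}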
-- 
Michal Hocko
SUSE Labs
