Message-ID: <20201104084021.GB15700@shbuild999.sh.intel.com>
Date:   Wed, 4 Nov 2020 16:40:21 +0800
From:   Feng Tang <feng.tang@...el.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Matthew Wilcox <willy@...radead.org>,
        Mel Gorman <mgorman@...e.de>, dave.hansen@...el.com,
        ying.huang@...el.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable
 zone only node

On Wed, Nov 04, 2020 at 08:58:19AM +0100, Michal Hocko wrote:
> On Wed 04-11-20 15:38:26, Feng Tang wrote:
> [...]
> > > Could you be more specific about the use case here? Why do you need a
> > > binding to a pure movable node?
> > 
> > One common configuration for a platform is a small amount of DRAM plus a
> > huge amount of PMEM (which is slower but cheaper), and my guess is that
> > the intent is to steer the bulk of user space allocations
> > (GFP_HIGHUSER_MOVABLE) to the PMEM node, and to use DRAM as little as
> > possible.
> 
> While this is possible, it is a tricky configuration. It essentially
> gets us back to 32b and highmem...
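
(As a rough user-space sketch of the binding described above, assuming node 1
is the movable-only PMEM node; a real container setup would typically use
cpuset.mems rather than a per-process mempolicy:)

	#include <numaif.h>	/* set_mempolicy(); link with -lnuma */
	#include <stdio.h>
	#include <stdlib.h>

	int main(void)
	{
		/* Assumption: node 1 is the movable-only PMEM node. */
		unsigned long nodemask = 1UL << 1;

		/* Restrict all future allocations of this process to node 1. */
		if (set_mempolicy(MPOL_BIND, &nodemask, 8 * sizeof(nodemask))) {
			perror("set_mempolicy");
			return EXIT_FAILURE;
		}

		/* User-space (movable) allocations now come from node 1, but
		 * kernel (unmovable) allocations made on the process's behalf
		 * cannot be served from a movable-only zone. */
		return EXIT_SUCCESS;
	}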

:) Another possible case is a similar binding on a memory hotpluggable
platform, which has one unpluggable node and several other nodes configured
as movable-only so that they can be hot-removed when needed.

> As I've said in reply to your second patch, I think we can make the oom
> killer behavior more sensible in these misconfigured cases, but I do not
> think we want to break the cpuset isolation for such a configuration.

Do you mean we skip the killing and just let the allocation fail? We
checked the oom killer code first: when the oom happens, both the DRAM
node and the unmovable node have lots of free memory, and killing
processes won't improve the situation.

(The following is copied from your comments on patch 2/2)
> This allows memory allocations to spill over to any other node which
> has Normal (or other lower) zones, and as such it breaks cpuset isolation.
> As I've pointed out in the reply to your cover letter, it seems that
> this is more of a misconfiguration than a bug.

For the use case (running a docker container), the spilling is already
happening. I traced its memory allocation requests: many of them are
movable and fall back to the normal node naturally with the current code;
only a few got blocked, since many of the __alloc_pages_nodemask() calls
are made with a NULL nodemask parameter.
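
(To illustrate the two call shapes, a simplified sketch rather than real
trace output; gfp_mask, pmem_nid and bound_nodes are placeholders:)

	struct page *page;

	/* NULL nodemask: no explicit binding at the allocator interface;
	 * the request may fall back to other nodes, subject only to the
	 * cpuset checks (the majority of the traced calls). */
	page = __alloc_pages_nodemask(gfp_mask, 0, pmem_nid, NULL);

	/* Explicit nodemask: the request is confined to the given nodes. */
	page = __alloc_pages_nodemask(gfp_mask, 0, pmem_nid, &bound_nodes);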

And I made this RFC patch inspired by the following code in __alloc_pages_may_oom():

	/*
	 * Help non-failing allocations by giving them access to memory
	 * reserves.
	 */
	if (gfp_mask & __GFP_NOFAIL)
		page = __alloc_pages_cpuset_fallback(gfp_mask, order,
				ALLOC_NO_WATERMARKS, ac);
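
(And, purely as a sketch of the direction rather than the actual patch;
cpuset_only_has_movable_nodes() below is a made-up helper:)

	/* Hypothetical: the request needs an unmovable zone but the cpuset
	 * contains only movable-only nodes, so killing tasks cannot help;
	 * fall back outside the cpuset (honouring normal watermarks)
	 * instead of invoking the OOM killer. */
	if (gfp_zone(gfp_mask) != ZONE_MOVABLE &&
	    cpuset_only_has_movable_nodes(ac))	/* made-up helper */
		page = __alloc_pages_cpuset_fallback(gfp_mask, order,
				ALLOC_WMARK_MIN, ac);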

Thanks,
Feng

> -- 
> Michal Hocko
> SUSE Labs
