lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 12 Oct 2016 13:01:58 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Anshuman Khandual <khandual@...ux.vnet.ibm.com>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...e.de>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Balbir Singh <bsingharora@...il.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Minchan Kim <minchan@...nel.org>
Subject: Re: MPOL_BIND on memory only nodes

On Wed 12-10-16 16:08:48, Anshuman Khandual wrote:
> On 10/12/2016 03:13 PM, Michal Hocko wrote:
> > On Wed 12-10-16 14:55:24, Anshuman Khandual wrote:
> >> Hi,
> >>
> >> We have the following function policy_zonelist() which selects a zonelist
> >> during various allocation paths. With this, general user space allocations
> >> (IIUC might not have __GFP_THISNODE) fails while trying to get memory from
> >> a memory only node without CPUs as the application runs some where else
> >> and that node is not part of the nodemask.
> 
> My bad. Was playing with some changes to the zonelists rebuild after
> a memory node hotplug and the order of various zones in them.
> 
> > 
> > I am not sure I understand. So you have a task with MPOL_BIND without a
> > cpu less node in the mask and you are wondering why the memory is not
> > allocated from that node?
> 
> In my experiment, there is a MPOL_BIND call with a CPU less node in
> the node mask and the memory is not allocated from that CPU less node.
> Thats because the zone of the CPU less node was absent from the
> FALLBACK zonelist of the local node.

So do I understand this correctly that the issue was caused by
non-upstream changes?

> >> Why we insist on __GFP_THISNODE ?
> > 
> > AFAIU __GFP_THISNODE just overrides the given node to the policy
> > nodemask in case the current node is not part of that node mask. In
> > other words we are ignoring the given node and use what the policy says. 
> 
> Right but provided the gfp flag has __GFP_THISNODE in it. In absence
> of __GFP_THISNODE, the node from the nodemask will not be selected.

In absence of __GFP_THISNODE we will use the zonelist for the given node
and that should contain even memoryless nodes for the fallback. The
nodemask from policy_nodemask() will then make sure that only nodes
relevant to the used policy is used.

> I still wonder why ? Can we always go to the first node in the
> nodemask for MPOL_BIND interface calls ? Just curious to know why
> preference is given to the local node and it's FALLBACK zonelist.

It is not always a local node. Look at how do_huge_pmd_wp_page_fallback
tries to make all the pages into the same node. Also we have
alloc_pages_current() which tries to allocate from the local node which
should not fallback to the firs node in the policy nodemask.

-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ