Message-ID: <20161012110158.GK17128@dhcp22.suse.cz>
Date: Wed, 12 Oct 2016 13:01:58 +0200
From: Michal Hocko <mhocko@...nel.org>
To: Anshuman Khandual <khandual@...ux.vnet.ibm.com>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux Memory Management List <linux-mm@...ck.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Mel Gorman <mgorman@...e.de>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
Balbir Singh <bsingharora@...il.com>,
Vlastimil Babka <vbabka@...e.cz>,
Minchan Kim <minchan@...nel.org>
Subject: Re: MPOL_BIND on memory only nodes

On Wed 12-10-16 16:08:48, Anshuman Khandual wrote:
> On 10/12/2016 03:13 PM, Michal Hocko wrote:
> > On Wed 12-10-16 14:55:24, Anshuman Khandual wrote:
> >> Hi,
> >>
> >> We have the following function policy_zonelist() which selects a zonelist
> >> during various allocation paths. With this, general user space allocations
> >> (which IIUC might not have __GFP_THISNODE) fail while trying to get memory
> >> from a memory-only node without CPUs, as the application runs somewhere
> >> else and that node is not part of the nodemask.
>
> My bad. Was playing with some changes to the zonelists rebuild after
> a memory node hotplug and the order of various zones in them.
>
> >
> > I am not sure I understand. So you have a task with MPOL_BIND without a
> > cpu less node in the mask and you are wondering why the memory is not
> > allocated from that node?
>
> In my experiment, there is an MPOL_BIND call with a CPU-less node in
> the node mask and the memory is not allocated from that CPU-less node.
> That's because the zone of the CPU-less node was absent from the
> FALLBACK zonelist of the local node.

So do I understand correctly that the issue was caused by non-upstream
changes?

> >> Why do we insist on __GFP_THISNODE?
> >
> > AFAIU __GFP_THISNODE just overrides the given node to the policy
> > nodemask in case the current node is not part of that node mask. In
> > other words we are ignoring the given node and use what the policy says.
>
> Right, but only provided the gfp flag has __GFP_THISNODE in it. In the
> absence of __GFP_THISNODE, the node from the nodemask will not be selected.

In the absence of __GFP_THISNODE we will use the zonelist for the given
node, and that should contain even memory-only (CPU-less) nodes for the
fallback. The nodemask from policy_nodemask() will then make sure that
only nodes relevant to the used policy are used.
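
To make the two steps above concrete, here is a minimal userspace sketch
in C. It is a toy model and not kernel code: every toy_* name, the
bitmask nodemask and the free-page array are made up for illustration,
and it ignores that __GFP_THISNODE also selects a no-fallback zonelist
in the real allocator. It only shows (1) picking whose zonelist to walk
for MPOL_BIND and (2) filtering that zonelist by the policy nodemask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 4
#define NO_NODE   (-1)

/* Hypothetical stand-in for a per-node fallback zonelist: node ids in
 * the order the allocator would try them, the node itself first. */
struct toy_zonelist {
    int nodes[MAX_NODES];
    size_t len;
};

/* Bit n set means node n is allowed by the (toy) MPOL_BIND policy. */
typedef unsigned int toy_nodemask;

static bool toy_node_isset(int nid, toy_nodemask mask)
{
    assert(nid >= 0 && nid < MAX_NODES);
    return (mask >> nid) & 1u;
}

static int toy_first_node(toy_nodemask mask)
{
    for (int nid = 0; nid < MAX_NODES; nid++)
        if (toy_node_isset(nid, mask))
            return nid;
    return NO_NODE;
}

/* Step 1: decide whose zonelist to walk. The given node is kept unless
 * the caller asked for "this node only" and that node is outside the
 * bind mask; only then is the first node of the mask used instead. */
static int toy_bind_zonelist_node(int nd, toy_nodemask mask, bool thisnode)
{
    if (thisnode && !toy_node_isset(nd, mask))
        return toy_first_node(mask);
    return nd;
}

/* Step 2: walk the chosen zonelist in order, skip nodes outside the
 * mask, and return the first allowed node that has free pages. */
static int toy_alloc_node(const struct toy_zonelist *zl, toy_nodemask mask,
                          const int free_pages[MAX_NODES])
{
    for (size_t i = 0; i < zl->len; i++) {
        int nid = zl->nodes[i];

        if (!toy_node_isset(nid, mask))
            continue;
        if (free_pages[nid] > 0)
            return nid;
    }
    return NO_NODE;
}
```

With node 0 as the local node, node 1 as a memory-only node and a bind
mask of just node 1, walking node 0's zonelist {0, 1} lands on node 1.
If node 1 is missing from that zonelist, the walk finds nothing, which
is the kind of failure described in the quoted experiment.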

> I still wonder why? Can we always go to the first node in the
> nodemask for MPOL_BIND interface calls? Just curious to know why
> preference is given to the local node and its FALLBACK zonelist.

It is not always the local node. Look at how
do_huge_pmd_wp_page_fallback() tries to keep all the pages on the same
node. Also we have alloc_pages_current(), which tries to allocate from
the local node and should not fall back to the first node in the policy
nodemask.
--
Michal Hocko
SUSE Labs