[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190328204020.GD7155@dhcp22.suse.cz>
Date: Thu, 28 Mar 2019 21:40:20 +0100
From: Michal Hocko <mhocko@...nel.org>
To: Yang Shi <yang.shi@...ux.alibaba.com>
Cc: Dan Williams <dan.j.williams@...el.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Rik van Riel <riel@...riel.com>,
Johannes Weiner <hannes@...xchg.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Dave Hansen <dave.hansen@...el.com>,
Keith Busch <keith.busch@...el.com>,
Fengguang Wu <fengguang.wu@...el.com>,
"Du, Fan" <fan.du@...el.com>, "Huang, Ying" <ying.huang@...el.com>,
Linux MM <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 0/10] Another Approach to Use PMEM as NUMA Node
On Thu 28-03-19 12:40:14, Yang Shi wrote:
>
>
> On 3/28/19 12:12 PM, Michal Hocko wrote:
> > On Thu 28-03-19 11:58:57, Yang Shi wrote:
> > >
> > > On 3/27/19 11:58 PM, Michal Hocko wrote:
> > > > On Wed 27-03-19 19:09:10, Yang Shi wrote:
> > > > > One question, when doing demote and promote we need define a path, for
> > > > > example, DRAM <-> PMEM (assume two tier memory). When determining what nodes
> > > > > are "DRAM" nodes, does it make sense to assume the nodes with both cpu and
> > > > > memory are DRAM nodes since PMEM nodes are typically cpuless nodes?
> > > > Do we really have to special case this for PMEM? Why cannot we simply go
> > > > in the zonelist order? In other words why cannot we use the same logic
> > > > for a larger NUMA machine and instead of swapping simply fallback to a
> > > > less contended NUMA node? It can be a regular DRAM, PMEM or whatever
> > > > other type of memory node.
> > > Thanks for the suggestion. It makes sense. However, if we don't specialize a
> > > pmem node, its fallback node may be a DRAM node, then the memory reclaim may
> > > move the inactive page to the DRAM node, it sounds not make too much sense
> > > since memory reclaim would prefer to move downwards (DRAM -> PMEM -> Disk).
> > There are certainly many details to sort out. One thing is how to handle
> > cpuless nodes (e.g. PMEM). Those shouldn't get any direct allocations
> > without an explicit binding, right? My first naive idea would be to only
>
> Wait a minute. I thought we were arguing about the default allocation node
> mask yesterday. And, the conclusion is PMEM node should not be excluded from
> the node mask. PMEM nodes are cpuless nodes. I think I should replace all
> "PMEM node" to "cpuless node" in the cover letter and commit logs to make it
> explicitly.
No, this is not about the default allocation mask at all. Your
allocations start from a local/mempolicy node. CPUless nodes thus cannot be a
primary node so it will always be only in a fallback zonelist without an
explicit binding.
--
Michal Hocko
SUSE Labs
Powered by blists - more mailing lists