Message-ID: <20190327203520.GU10344@bombadil.infradead.org>
Date: Wed, 27 Mar 2019 13:35:20 -0700
From: Matthew Wilcox <willy@...radead.org>
To: Dan Williams <dan.j.williams@...el.com>
Cc: Michal Hocko <mhocko@...nel.org>,
Yang Shi <yang.shi@...ux.alibaba.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Rik van Riel <riel@...riel.com>,
Johannes Weiner <hannes@...xchg.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Dave Hansen <dave.hansen@...el.com>,
Keith Busch <keith.busch@...el.com>,
Fengguang Wu <fengguang.wu@...el.com>,
"Du, Fan" <fan.du@...el.com>, "Huang, Ying" <ying.huang@...el.com>,
Linux MM <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 0/10] Another Approach to Use PMEM as NUMA Node

On Wed, Mar 27, 2019 at 10:34:11AM -0700, Dan Williams wrote:
> On Wed, Mar 27, 2019 at 2:01 AM Michal Hocko <mhocko@...nel.org> wrote:
> > No, Linux NUMA implementation makes all numa nodes available by default
> > and provides an API to opt-in for more fine tuning. What you are
> > suggesting goes against that semantic and I am asking why. How is pmem
> > NUMA node any different from any other distant node in principle?
>
> Agree. It's just another NUMA node and shouldn't be special cased.
> Userspace policy can choose to avoid it, but typical node distance
> preference should otherwise let the kernel fall back to it as
> additional memory pressure relief for "near" memory.

I think this is sort of true, but sort of different. These are
essentially CPU-less nodes; there is no CPU for which they are
fast memory. Yes, they're further from some CPUs than from others.
I have never paid attention to how Linux treats CPU-less memory nodes,
but it would make sense to me if we didn't default to allocating from
remote nodes. And treating pmem nodes as being remote from all CPUs
makes a certain amount of sense to me.
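Whether a node is CPU-less can be checked from userspace on Linux:
/sys/devices/system/node/has_memory and /sys/devices/system/node/has_cpu
list the nodes in each state, so the memory-only nodes are the set
difference. A minimal Python sketch, assuming the usual range-list format
in those files (e.g. "0-1,3"); the helper names are illustrative, not an
existing API.

```python
from pathlib import Path
from typing import Set


def parse_nodelist(text: str) -> Set[int]:
    """Parse a sysfs node list such as "0-1,3" into {0, 1, 3}."""
    nodes: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no nodes are in this state
        lo, _, hi = part.partition("-")
        # A bare "3" has no upper bound, so the range is just that node.
        nodes.update(range(int(lo), int(hi or lo) + 1))
    return nodes


def memory_only_nodes(has_memory: str, has_cpu: str) -> Set[int]:
    """Nodes that have memory but no CPUs, i.e. CPU-less nodes."""
    return parse_nodelist(has_memory) - parse_nodelist(has_cpu)


if __name__ == "__main__":
    base = Path("/sys/devices/system/node")
    try:
        mem = (base / "has_memory").read_text()
        cpu = (base / "has_cpu").read_text()
    except OSError:
        print("node state files not available on this system")
    else:
        print("CPU-less memory nodes:", sorted(memory_only_nodes(mem, cpu)))
```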

E.g., on a four-socket system, consider this topology:

pmem1 --- node1 --- node2 --- pmem2
            |   \ /   |
            |    X    |
            |   / \   |
pmem3 --- node3 --- node4 --- pmem4

This is something I could actually see someone building with normal
DRAM, and we should probably handle it the same way as pmem: for a
process running on node3, allocate preferentially from node3, then
pmem3, then the other nodes, then the other pmems.
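The ordering proposed above can be written down as a small function.
A minimal Python sketch, assuming each CPU node has at most one directly
attached pmem node; tiered_fallback and the node names are hypothetical.
The message leaves the order within the "other nodes" and "other pmems"
tiers open, so the sketch simply keeps the input order there (a real
policy would presumably rank those by node distance).

```python
from typing import Dict, List


def tiered_fallback(cpu_node: str, cpu_nodes: List[str],
                    local_pmem: Dict[str, str]) -> List[str]:
    """Allocation order for a task running on cpu_node: the local node,
    its local pmem, the other CPU nodes, then the other pmem nodes."""
    others = [n for n in cpu_nodes if n != cpu_node]
    order = [cpu_node]
    if cpu_node in local_pmem:  # a socket may have no pmem attached
        order.append(local_pmem[cpu_node])
    order += others  # all remote DRAM comes before any remote pmem
    order += [local_pmem[n] for n in others if n in local_pmem]
    return order


# The four-socket topology from the diagram (names are illustrative).
CPU_NODES = ["node1", "node2", "node3", "node4"]
LOCAL_PMEM = {"node1": "pmem1", "node2": "pmem2",
              "node3": "pmem3", "node4": "pmem4"}

if __name__ == "__main__":
    print(tiered_fallback("node3", CPU_NODES, LOCAL_PMEM))
```

For node3 this prints ['node3', 'pmem3', 'node1', 'node2', 'node4',
'pmem1', 'pmem2', 'pmem4']. A purely distance-sorted fallback, as
described earlier in the thread, would instead depend on the distances
firmware reports for each pmem node, and could interleave remote DRAM
and remote pmem.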