linux-kernel - Re: [RFC PATCH 0/10] Another Approach to Use PMEM as NUMA Node

Message-ID: <20190327203520.GU10344@bombadil.infradead.org>
Date:   Wed, 27 Mar 2019 13:35:20 -0700
From:   Matthew Wilcox <willy@...radead.org>
To:     Dan Williams <dan.j.williams@...el.com>
Cc:     Michal Hocko <mhocko@...nel.org>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Rik van Riel <riel@...riel.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Dave Hansen <dave.hansen@...el.com>,
        Keith Busch <keith.busch@...el.com>,
        Fengguang Wu <fengguang.wu@...el.com>,
        "Du, Fan" <fan.du@...el.com>, "Huang, Ying" <ying.huang@...el.com>,
        Linux MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 0/10] Another Approach to Use PMEM as NUMA Node

On Wed, Mar 27, 2019 at 10:34:11AM -0700, Dan Williams wrote:
> On Wed, Mar 27, 2019 at 2:01 AM Michal Hocko <mhocko@...nel.org> wrote:
> > No, Linux NUMA implementation makes all numa nodes available by default
> > and provides an API to opt-in for more fine tuning. What you are
> > suggesting goes against that semantic and I am asking why. How is pmem
> > NUMA node any different from any other distant node in principle?
> 
> Agree. It's just another NUMA node and shouldn't be special cased.
> Userspace policy can choose to avoid it, but typical node distance
> preference should otherwise let the kernel fall back to it as
> additional memory pressure relief for "near" memory.

I think this is sort of true, but sort of different.  These are
essentially CPU-less nodes; there is no CPU for which they are
fast memory.  Yes, they're further from some CPUs than from others.
I have never paid attention to how Linux treats CPU-less memory nodes,
but it would make sense to me if we don't default to allocating from
remote nodes.  And treating pmem nodes as being remote from all CPUs
makes a certain amount of sense to me.

E.g., on a four CPU-socket system, consider this as being

pmem1 --- node1 --- node2 --- pmem2
            |   \ /   |
            |    X    |
            |   / \   |
pmem3 --- node3 --- node4 --- pmem4

which I could actually see someone building with normal DRAM, and we
should probably handle it the same way as pmem; for a process running on
node3, allocate preferentially from node3, then pmem3, then other nodes,
then other pmems.
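
To make that ordering concrete, here is a rough sketch of the preference
list for the example topology above.  It is only a toy model: the names
CPU_NODES, PMEM_OF and fallback_order() are made up for illustration and
are not kernel identifiers, and it hard-codes the four tiers rather than
deriving them from node distances the way the real zonelist code would.

```python
# Toy model of the four-socket example; all names here are hypothetical
# and chosen for illustration, not taken from the kernel.
CPU_NODES = ["node1", "node2", "node3", "node4"]

# In the diagram each pmem node hangs off exactly one CPU node.
PMEM_OF = {
    "node1": "pmem1",
    "node2": "pmem2",
    "node3": "pmem3",
    "node4": "pmem4",
}


def fallback_order(local):
    """Return the allocation preference list for a task running on `local`.

    Tiers, in order: the local node, the pmem attached to it, the other
    CPU nodes, then the pmem nodes attached to those other CPU nodes.
    """
    if local not in PMEM_OF:
        raise ValueError(f"{local!r} is not a CPU node in this model")
    other_cpu = [n for n in CPU_NODES if n != local]
    other_pmem = [PMEM_OF[n] for n in other_cpu]
    return [local, PMEM_OF[local]] + other_cpu + other_pmem


if __name__ == "__main__":
    print(" -> ".join(fallback_order("node3")))
```

Note that a plain hop count over the diagram would not give this order
by itself: pmem3 and the other three sockets are all one hop from node3,
so something has to break the tie in favour of the locally attached pmem.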