linux-kernel - Re: [RFC PATCH 0/10] Another Approach to Use PMEM as NUMA Node


Message-ID: <5934ed42-c512-a4c7-cbed-9062065bf276@linux.alibaba.com>
Date:   Thu, 28 Mar 2019 12:40:14 -0700
From:   Yang Shi <yang.shi@...ux.alibaba.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Dan Williams <dan.j.williams@...el.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Rik van Riel <riel@...riel.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Dave Hansen <dave.hansen@...el.com>,
        Keith Busch <keith.busch@...el.com>,
        Fengguang Wu <fengguang.wu@...el.com>,
        "Du, Fan" <fan.du@...el.com>, "Huang, Ying" <ying.huang@...el.com>,
        Linux MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 0/10] Another Approach to Use PMEM as NUMA Node



On 3/28/19 12:12 PM, Michal Hocko wrote:
> On Thu 28-03-19 11:58:57, Yang Shi wrote:
>>
>> On 3/27/19 11:58 PM, Michal Hocko wrote:
>>> On Wed 27-03-19 19:09:10, Yang Shi wrote:
>>>> One question, when doing demote and promote we need define a path, for
>>>> example, DRAM <-> PMEM (assume two tier memory). When determining what nodes
>>>> are "DRAM" nodes, does it make sense to assume the nodes with both cpu and
>>>> memory are DRAM nodes since PMEM nodes are typically cpuless nodes?
>>> Do we really have to special case this for PMEM? Why cannot we simply go
>>> in the zonelist order? In other words why cannot we use the same logic
>>> for a larger NUMA machine and instead of swapping simply fallback to a
>>> less contended NUMA node? It can be a regular DRAM, PMEM or whatever
>>> other type of memory node.
>> Thanks for the suggestion. It makes sense. However, if we don't specialize a
>> pmem node, its fallback node may be a DRAM node, then the memory reclaim may
>> move the inactive page to the DRAM node, it sounds not make too much sense
>> since memory reclaim would prefer to move downwards (DRAM -> PMEM -> Disk).
> There are certainly many details to sort out. One thing is how to handle
> cpuless nodes (e.g. PMEM). Those shouldn't get any direct allocations
> without an explicit binding, right? My first naive idea would be to only

Wait a minute. I thought we were arguing about the default allocation
node mask yesterday, and the conclusion was that PMEM nodes should not be
excluded from the node mask. PMEM nodes are cpuless nodes. I think I
should replace all "PMEM node" with "cpuless node" in the cover letter and
commit logs to make that explicit.

Quoting Dan: "For ACPI platforms the HMAT is effectively going to
enforce "cpu-less" nodes for any memory range that has differentiated 
performance from the conventional memory pool, or differentiated 
performance for a specific initiator."

I apologize for not explaining up front that PMEM nodes are cpuless
nodes. Of course, a cpuless node is not necessarily a PMEM node.

To your question, yes, I do agree. Actually, this is what I meant by
"DRAM only by default", which I should rephrase as "exclude cpuless
nodes". I thought they meant the same thing.

> migrate-on-reclaim only from the preferred node. We might need

If we exclude cpuless nodes, yes. The preferred node would then be a DRAM
node only. Actually, the patchset already follows "migrate-on-reclaim only
from the preferred node".

Thanks,
Yang

> additional heuristics but I wouldn't special case PMEM from other
> cpuless NUMA nodes.
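
The policy discussed in this thread can be sketched as a toy user-space
model in Python. This is not code from the patchset and not kernel API:
the names Node, default_alloc_nodes and demotion_target, the four-node
topology, and the distance table are all hypothetical, chosen only to
illustrate two ideas: cpuless nodes get no allocations without an explicit
binding, and migrate-on-reclaim only moves pages downwards, from a node
with CPUs to the nearest cpuless node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    nid: int
    has_cpu: bool  # assumption: a cpuless node stands in for PMEM-like memory


def default_alloc_nodes(nodes):
    """Node ids eligible for allocation when no explicit binding is given."""
    return [n.nid for n in nodes if n.has_cpu]


def demotion_target(src, nodes, distance):
    """Return the node that reclaim on `src` would migrate pages to.

    Returns None when `src` is cpuless (never migrate "upwards") or when
    no cpuless node exists, meaning ordinary reclaim would apply instead.
    """
    by_id = {n.nid: n for n in nodes}
    if not by_id[src].has_cpu:
        return None
    cpuless = [n.nid for n in nodes if not n.has_cpu]
    if not cpuless:
        return None
    # Nearest cpuless node by distance; ties broken by lowest node id.
    return min(cpuless, key=lambda nid: (distance[src][nid], nid))


# Hypothetical topology: nodes 0 and 1 have CPUs, nodes 2 and 3 are cpuless.
NODES = [Node(0, True), Node(1, True), Node(2, False), Node(3, False)]
# Made-up distance table in the style of a SLIT (10 means local).
DISTANCE = [
    [10, 21, 17, 28],
    [21, 10, 28, 17],
    [17, 28, 10, 28],
    [28, 17, 28, 10],
]

if __name__ == "__main__":
    print(default_alloc_nodes(NODES))            # [0, 1]
    print(demotion_target(0, NODES, DISTANCE))   # 2
    print(demotion_target(1, NODES, DISTANCE))   # 3
    print(demotion_target(2, NODES, DISTANCE))   # None
```

Keying both decisions on "has CPUs" rather than on a memory type matches
the point made above that PMEM should not be special-cased among cpuless
nodes. Any additional heuristics would replace the simple nearest-node
choice in this sketch.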