linux-kernel - Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190417091748.GF655@dhcp22.suse.cz>
Date:   Wed, 17 Apr 2019 11:17:48 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Yang Shi <yang.shi@...ux.alibaba.com>
Cc:     mgorman@...hsingularity.net, riel@...riel.com, hannes@...xchg.org,
        akpm@...ux-foundation.org, dave.hansen@...el.com,
        keith.busch@...el.com, dan.j.williams@...el.com,
        fengguang.wu@...el.com, fan.du@...el.com, ying.huang@...el.com,
        ziy@...dia.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

On Tue 16-04-19 12:19:21, Yang Shi wrote:
> 
> 
> On 4/16/19 12:47 AM, Michal Hocko wrote:
[...]
> > Why cannot we simply demote in the proximity order? Why do you make
> > cpuless nodes so special? If other close nodes are vacant then just use
> > them.
> 
> We could. But, this raises another question, would we prefer to just demote
> to the next fallback node (just try once), if it is contended, then just
> swap (i.e. DRAM0 -> PMEM0 -> Swap); or would we prefer to try all the nodes
> in the fallback order to find the first less contended one (i.e. DRAM0 ->
> PMEM0 -> DRAM1 -> PMEM1 -> Swap)?

I would go with the later. Why, because it is more natural. Because that
is the natural allocation path so I do not see why this shouldn't be the
natural demotion path.

> 
> |------|     |------| |------|        |------|
> |PMEM0|---|DRAM0| --- CPU0 --- CPU1 --- |DRAM1| --- |PMEM1|
> |------|     |------| |------|       |------|
> 
> The first one sounds simpler, and the current implementation does so and
> this needs find out the closest PMEM node by recognizing cpuless node.

Unless you are specifying an explicit nodemask then the allocator will
do the allocation fallback for the migration target for you.

> If we prefer go with the second option, it is definitely unnecessary to
> specialize any node.
> 
> > > > I would expect that the very first attempt wouldn't do much more than
> > > > migrate to-be-reclaimed pages (without an explicit binding) with a
> > > Do you mean respect mempolicy or cpuset when doing demotion? I was wondering
> > > this, but I didn't do so in the current implementation since it may need
> > > walk the rmap to retrieve the mempolicy in the reclaim path. Is there any
> > > easier way to do so?
> > You definitely have to follow policy. You cannot demote to a node which
> > is outside of the cpuset/mempolicy because you are breaking contract
> > expected by the userspace. That implies doing a rmap walk.
> 
> OK, however, this may prevent from demoting unmapped page cache since there
> is no way to find those pages' policy.

I do not really expect that hard numa binding for the page cache is a
usecase we really have to lose sleep over for now.

> And, we have to think about what we should do when the demotion target has
> conflict with the mempolicy.

Simply skip it.

> The easiest way is to just skip those conflict
> pages in demotion. Or we may have to do the demotion one page by one page
> instead of migrating a list of pages.

Yes one page at the time sounds reasonable to me. THis is how we do
reclaim anyway.
-- 
Michal Hocko
SUSE Labs