Message-ID: <20210414080849.GA20886@linux>
Date: Wed, 14 Apr 2021 10:08:54 +0200
From: Oscar Salvador <osalvador@...e.de>
To: Wei Xu <weixugc@...gle.com>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>,
Linux MM <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Yang Shi <shy828301@...il.com>,
David Rientjes <rientjes@...gle.com>,
Huang Ying <ying.huang@...el.com>,
Dan Williams <dan.j.williams@...el.com>,
David Hildenbrand <david@...hat.com>
Subject: Re: [PATCH 02/10] mm/numa: automatically generate node migration
order
On Fri, Apr 09, 2021 at 08:07:08PM -0700, Wei Xu wrote:
> On Thu, Apr 1, 2021 at 11:35 AM Dave Hansen <dave.hansen@...ux.intel.com> wrote:
> > + * When Node 0 fills up, its memory should be migrated to
> > + * Node 1. When Node 1 fills up, it should be migrated to
> > + * Node 2. The migration path starts on the nodes with the
> > + * processors (since allocations default to this node) and
> > + * fast memory, progresses through medium and ends with the
> > + * slow memory:
> > + *
> > + * 0 -> 1 -> 2 -> stop
> > + * 3 -> 4 -> 5 -> stop
> > + *
> > + * This is represented in the node_demotion[] like this:
> > + *
> > + * { 1, // Node 0 migrates to 1
> > + * 2, // Node 1 migrates to 2
> > + * -1, // Node 2 does not migrate
> > + * 4, // Node 3 migrates to 4
> > + * 5, // Node 4 migrates to 5
> > + * -1} // Node 5 does not migrate
> > + */
>
> In this example, if we want to support multiple nodes as the demotion
> target of a source node, we can group these nodes into three tiers
> (classes):
>
> fast class:
> 0 -> {1, 4} // 1 is the preferred
> 3 -> {4, 1} // 4 is the preferred
>
> medium class:
> 1 -> {2, 5} // 2 is the preferred
> 4 -> {5, 2} // 5 is the preferred
>
> slow class:
> 2 -> stop
> 5 -> stop
Hi Wei Xu,
I have a few questions about this.
Fast class/memory would be those nodes with CPUs, while slow class/memory
would be PMEM, right?
Then, what does medium class/memory stand for?
In Dave's example, the list is created in a way that stays local to the socket,
and we go from the fast node to the slow one.
In yours, the lists are created by taking the fastest nodes from all sockets and
working our way down, which means we end up with cross-socket nodes in the list.
How much of a penalty is that?
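Just to make the question concrete (only a sketch, not code from the
patchset): the cross-socket cost would show up as a larger SLIT
distance for the out-of-socket target, e.g. with the node numbers from
the example above:

#include <linux/printk.h>
#include <linux/topology.h>

static void compare_demotion_cost(void)
{
	int local  = node_distance(0, 1);	/* same-socket target  */
	int remote = node_distance(0, 4);	/* cross-socket target */

	pr_info("demotion distance: local %d vs cross-socket %d\n",
		local, remote);
}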
And while I get your point, I am not sure that is what we intend here.
This patchset aims to place cold pages that are about to be reclaimed in slower
nodes to give them a second chance, while your design seems to be more about
having different memory classes and being able to place applications in one of
those tiers depending on their demands, or on sysadmin demand.
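For reference, this is how I read your proposal; it is only a
hypothetical sketch to check my understanding, and the names
(node_demotion_targets, node_demotion_preferred, pick_demotion_target)
are made up, not taken from this patchset.  Each node would carry a
mask of allowed targets in the next (slower) class plus a preferred
one among them, e.g. node 0 -> {1, 4} with 1 preferred:

#include <linux/nodemask.h>
#include <linux/numa.h>

/* Made-up data layout: a set of allowed targets plus a preferred one. */
static nodemask_t node_demotion_targets[MAX_NUMNODES];
static int node_demotion_preferred[MAX_NUMNODES];

static int pick_demotion_target(int node)
{
	int target = node_demotion_preferred[node];

	if (target != NUMA_NO_NODE)
		return target;

	/* No preferred target; fall back to any allowed node, if any. */
	target = first_node(node_demotion_targets[node]);
	return target < MAX_NUMNODES ? target : NUMA_NO_NODE;
}

Is that roughly the idea?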
Could you expand some more?
--
Oscar Salvador
SUSE L3