Message-ID: <61054afa-9f18-45f1-987d-e6f242012096@linux.ibm.com>
Date: Mon, 25 Mar 2024 10:32:18 +0530
From: Donet Tom <donettom@...ux.ibm.com>
To: "Huang, Ying" <ying.huang@...el.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Aneesh Kumar <aneesh.kumar@...nel.org>,
Michal Hocko <mhocko@...nel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Mel Gorman <mgorman@...e.de>, Feng Tang <feng.tang@...el.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Rik van Riel <riel@...riel.com>, Johannes Weiner <hannes@...xchg.org>,
Matthew Wilcox <willy@...radead.org>, Vlastimil Babka <vbabka@...e.cz>,
Dan Williams <dan.j.williams@...el.com>,
Hugh Dickins <hughd@...gle.com>,
Kefeng Wang <wangkefeng.wang@...wei.com>,
Suren Baghdasaryan <surenb@...gle.com>
Subject: Re: [PATCH v3 2/2] mm/numa_balancing:Allow migrate on protnone
reference with MPOL_PREFERRED_MANY policy
On 3/25/24 08:18, Huang, Ying wrote:
> Donet Tom <donettom@...ux.ibm.com> writes:
>
>> On 3/22/24 14:02, Huang, Ying wrote:
>>> Donet Tom <donettom@...ux.ibm.com> writes:
>>>
>>>> commit bda420b98505 ("numa balancing: migrate on fault among multiple bound
>>>> nodes") added support for migrate on protnone reference with MPOL_BIND
>>>> memory policy. This allowed numa fault migration when the executing node
>>>> is part of the policy mask for MPOL_BIND. This patch extends migration
>>>> support to MPOL_PREFERRED_MANY policy.
>>>>
>>>> Currently, we cannot specify MPOL_PREFERRED_MANY with the mempolicy flag
>>>> MPOL_F_NUMA_BALANCING. This causes issues when we want to use
>>>> NUMA_BALANCING_MEMORY_TIERING. To effectively use the slow memory tier,
>>>> the kernel should not allocate pages from the slower memory tier via
>>>> allocation control zonelist fallback. Instead, we should move cold pages
>>>> from the faster memory node via memory demotion. For a page allocation,
>>>> kswapd is only woken up after we try to allocate pages from all nodes in
>>>> the allocation zone list. This implies that, without using memory
>>>> policies, we will end up allocating hot pages in the slower memory tier.
>>>>
>>>> MPOL_PREFERRED_MANY was added by commit b27abaccf8e8 ("mm/mempolicy: add
>>>> MPOL_PREFERRED_MANY for multiple preferred nodes") to allow better
>>>> allocation control when we have memory tiers in the system. With
>>>> MPOL_PREFERRED_MANY, the user can use a policy node mask consisting only
>>>> of faster memory nodes. When we fail to allocate pages from the faster
>>>> memory node, kswapd would be woken up, allowing demotion of cold pages
>>>> to slower memory nodes.
>>>>
>>>> With the current kernel, such usage of memory policies implies we can't
>>>> do page promotion from a slower memory tier to a faster memory tier
>>>> using numa fault. This patch fixes this issue.
>>>>
>>>> For MPOL_PREFERRED_MANY, if the executing node is in the policy node
>>>> mask, we allow numa migration to the executing nodes. If the executing
>>>> node is not in the policy node mask, we do not allow numa migration.
>>> Can we provide more information about this? I suggest to use an
>>> example, for instance, pages may be distributed among multiple sockets
>>> unexpectedly.
>> Thank you for your suggestion. However, this commit message explains all the scenarios.
> Yes. The commit message is correct and covers many cases. What I
> suggested is to describe why we do that. An example cannot cover every
> possibility, but it is easy to understand. For example, something like
> the below?
>
> For example, on a 2-socket system, there are N0, N1, N2 in socket 0 and
> N3 in socket 1. N0, N1, N3 have fast memory and CPUs, while N2 has slow
> memory and no CPU. For a workload, we may use MPOL_PREFERRED_MANY with
> a nodemask with N0 and N1 set because the workload runs on CPUs of
> socket 0 most of the time. Then, even if the workload runs on CPUs of N3
> occasionally, we will not try to migrate the workload pages from N2 to
> N3 because users may want to avoid cross-socket access as much as
> possible in the long term.
>
>> For example, consider a system with 3 NUMA nodes (N0, N1 and N6).
>> N0 and N1 are tier 1 DRAM nodes and N6 is a tier 2 PMEM node.
>>
>> Scenario 1: The process is executing on N1,
>> If the executing node is in the policy node mask,
>> Curr Loc Pages - The numa node where page present(folio node)
>> ====================================================================================
>> Process   Policy      Curr Loc Pages   Observations
>> ------------------------------------------------------------------------------------
>> N1        N0 N1 N6    N0               Pages Migrated from N0 to N1
>> N1        N0 N1 N6    N6               Pages Migrated from N6 to N1
>> N1        N0 N1       N1               Pages Migrated from N1 to N6
> Pages are not Migrating ?
Sorry, this is a mistake. In this case, pages are not migrating.
Thanks,
Donet.
>
>> N1        N0 N1       N6               Pages Migrated from N6 to N1
>> ------------------------------------------------------------------------------------
>> Scenario 2: The process is executing on N1,
>> If the executing node is NOT in the policy node mask,
>> Curr Loc Pages - The numa node where page present(folio node)
>> ====================================================================================
>> Process   Policy      Curr Loc Pages   Observations
>> ------------------------------------------------------------------------------------
>> N1        N0 N6       N0               Pages are not Migrating
>> N1        N0 N6       N6               Pages are not Migrating
>> N1        N0          N0               Pages are not Migrating
>> ------------------------------------------------------------------------------------
>>
>> Scenario 3: The process is executing on N1,
>> If the executing node and folio nodes are NOT in the policy node mask,
>> Curr Loc Pages - The numa node where page present (folio node)
>> ====================================================================================
>> Thread    Policy      Curr Loc Pages   Observations
>> ------------------------------------------------------------------------------------
>> N1        N0          N6               Pages are not Migrating
>> N1        N6          N0               Pages are not Migrating
>> ------------------------------------------------------------------------------------
>>
>> We can conclude that even if the pages are distributed among multiple sockets,
>> if the executing node is in the policy node mask, we allow numa migration to the
>> executing nodes. If the executing node is not in the policy node mask,
>> we do not allow numa migration.
>>
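As a rough illustration, the rule stated in the conclusion above can be
modeled in a few lines of Python. This is only a sketch of the behavior
described in the scenario tables, not the kernel implementation: the
function name numa_fault_target and the use of plain integer sets for
the policy node mask are assumptions made for this example.

```python
from typing import Optional, Set


def numa_fault_target(exec_node: int,
                      policy_nodes: Set[int],
                      folio_node: int) -> Optional[int]:
    """Model of the MPOL_PREFERRED_MANY rule described in this thread.

    Returns the node the page would be migrated to on a NUMA hint
    fault, or None when no migration is attempted. Hypothetical
    helper for illustration; it is not a kernel function.
    """
    # Executing node outside the policy node mask: never migrate.
    if exec_node not in policy_nodes:
        return None
    # Page already sits on the executing node: nothing to do.
    if folio_node == exec_node:
        return None
    # Otherwise migrate the page to the executing node.
    return exec_node


# Scenario 1: process on N1, N1 is in the policy node mask.
print(numa_fault_target(1, {0, 1, 6}, 0))
print(numa_fault_target(1, {0, 1}, 1))
# Scenario 2: process on N1, N1 is not in the policy node mask.
print(numa_fault_target(1, {0, 6}, 6))
```

The second call corresponds to the corrected table row: the page is
already on the executing node, so it is not migrated.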
> [snip]
>
> --
> Best Regards,
> Huang, Ying