Message-ID: <CAPcyv4g5RoHhXhkKQaYkqYLN1y3KavbGeM1zVus-3fY5Q+JdxA@mail.gmail.com>
Date: Sat, 23 Mar 2019 10:21:30 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Yang Shi <yang.shi@...ux.alibaba.com>
Cc: Michal Hocko <mhocko@...e.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Rik van Riel <riel@...riel.com>,
Johannes Weiner <hannes@...xchg.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Dave Hansen <dave.hansen@...el.com>,
Keith Busch <keith.busch@...el.com>,
Fengguang Wu <fengguang.wu@...el.com>,
"Du, Fan" <fan.du@...el.com>, "Huang, Ying" <ying.huang@...el.com>,
Linux MM <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 01/10] mm: control memory placement by nodemask for two
tier main memory
On Fri, Mar 22, 2019 at 9:45 PM Yang Shi <yang.shi@...ux.alibaba.com> wrote:
>
> When running applications on a machine with NVDIMM exposed as a NUMA
> node, memory allocations may end up on the NVDIMM node. This can result
> in silent performance degradation and regressions due to the difference
> in hardware properties.
>
> A DRAM-first policy should be obeyed to prevent surprising regressions.
> Any non-DRAM nodes should be excluded from default allocations. Use a
> nodemask to control the memory placement: introduce def_alloc_nodemask,
> which has only DRAM nodes set. Any non-DRAM allocation must be requested
> explicitly via NUMA policy (see the sketch below).
>
> In the future we may be able to extract the memory characteristics from
> HMAT or another source to build up the default allocation nodemask.
> However, for the time being just distinguish DRAM from PMEM (non-DRAM)
> nodes by the SRAT non-volatile flag.
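
[ For illustration, a minimal userspace sketch of what "requested
  explicitly via NUMA policy" would look like; node 1 is a hypothetical
  PMEM node id, and the program links with -lnuma for the mbind()
  wrapper. This is not part of the patch. ]

	#include <stddef.h>
	#include <numaif.h>	/* mbind(), MPOL_BIND; link with -lnuma */
	#include <sys/mman.h>

	/* Map anonymous memory and bind it to the assumed PMEM node 1,
	 * which a DRAM-only default mask would otherwise never pick. */
	static void *pmem_alloc(size_t len)
	{
		unsigned long nodemask = 1UL << 1; /* assumed PMEM node 1 */
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			return NULL;
		if (mbind(p, len, MPOL_BIND, &nodemask,
			  sizeof(nodemask) * 8, 0)) {
			munmap(p, len);
			return NULL;
		}
		return p;
	}
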
>
> Signed-off-by: Yang Shi <yang.shi@...ux.alibaba.com>
> ---
> arch/x86/mm/numa.c | 1 +
> drivers/acpi/numa.c | 8 ++++++++
> include/linux/mmzone.h | 3 +++
> mm/page_alloc.c | 18 ++++++++++++++++--
> 4 files changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index dfb6c4d..d9e0ca4 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -626,6 +626,7 @@ static int __init numa_init(int (*init_func)(void))
> nodes_clear(numa_nodes_parsed);
> nodes_clear(node_possible_map);
> nodes_clear(node_online_map);
> + nodes_clear(def_alloc_nodemask);
> memset(&numa_meminfo, 0, sizeof(numa_meminfo));
> WARN_ON(memblock_set_node(0, ULLONG_MAX, &memblock.memory,
> MAX_NUMNODES));
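
[ The include/linux/mmzone.h and mm/page_alloc.c hunks are not quoted
  here; presumably they declare and define the mask and fall back to it
  when the caller supplies no explicit nodemask. A sketch of that shape,
  with a hypothetical helper name, not the actual hunks: ]

	/* include/linux/mmzone.h (sketch) */
	extern nodemask_t def_alloc_nodemask;

	/* mm/page_alloc.c (sketch) */
	nodemask_t def_alloc_nodemask;

	/* hypothetical helper: default to DRAM-only placement unless
	 * the caller passed an explicit nodemask */
	static inline nodemask_t *default_alloc_nodemask(nodemask_t *nodemask)
	{
		return nodemask ? nodemask : &def_alloc_nodemask;
	}
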
> diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
> index 867f6e3..79dfedf 100644
> --- a/drivers/acpi/numa.c
> +++ b/drivers/acpi/numa.c
> @@ -296,6 +296,14 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
> goto out_err_bad_srat;
> }
>
> + /*
> + * Non-volatile memory is excluded from the zonelist by default.
> + * Only regular DRAM nodes are set in the default allocation
> + * node mask.
> + */
> + if (!(ma->flags & ACPI_SRAT_MEM_NON_VOLATILE))
> + node_set(node, def_alloc_nodemask);
Hmm, no, I don't think we should do this. Especially considering that
current generation NVDIMMs are energy-backed DRAM, there is no
performance difference that should be assumed from the non-volatile
flag alone.

Why isn't the default SLIT distance sufficient for ensuring a DRAM-first
default policy?
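
[ For reference, a rough sketch of the shape of the existing fallback
  ordering, not new code: build_zonelists() picks fallback nodes
  nearest-first by node_distance(), so a PMEM node advertised at a
  larger SLIT distance already sorts behind every DRAM node. ]

	/* sketch of distance-ordered fallback selection, modeled on
	 * the existing find_next_best_node() logic */
	static int nearest_unused_node(int from, const nodemask_t *used)
	{
		int n, best = NUMA_NO_NODE;
		int best_dist = INT_MAX;

		for_each_online_node(n) {
			if (node_isset(n, *used))
				continue;
			if (node_distance(from, n) < best_dist) {
				best_dist = node_distance(from, n);
				best = n;
			}
		}
		return best;
	}

With, say, a SLIT distance of 10 to local DRAM and something larger
advertised for PMEM, every DRAM node is exhausted before a PMEM node
is even considered.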