Message-ID: <fb4a6789-d461-e83e-c718-baff88026a88@intel.com>
Date:   Thu, 15 Jul 2021 11:49:01 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Andrew Morton <akpm@...ux-foundation.org>,
        Feng Tang <feng.tang@...el.com>
Cc:     linux-mm@...ck.org, Michal Hocko <mhocko@...nel.org>,
        David Rientjes <rientjes@...gle.com>,
        Ben Widawsky <ben.widawsky@...el.com>,
        linux-kernel@...r.kernel.org, linux-api@...r.kernel.org,
        Andrea Arcangeli <aarcange@...hat.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Randy Dunlap <rdunlap@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Andi Kleen <ak@...ux.intel.com>,
        Dan Williams <dan.j.williams@...el.com>, ying.huang@...el.com
Subject: Re: [PATCH v6 0/6] Introduce multi-preference mempolicy

On 7/14/21 5:15 PM, Andrew Morton wrote:
> On Mon, 12 Jul 2021 16:09:28 +0800 Feng Tang <feng.tang@...el.com> wrote:
>> This patch series introduces the concept of the MPOL_PREFERRED_MANY mempolicy.
>> This mempolicy mode can be used with either the set_mempolicy(2) or mbind(2)
>> interfaces. Like the MPOL_PREFERRED interface, it allows an application to set a
>> preference for nodes which will fulfil memory allocation requests. Unlike the
>> MPOL_PREFERRED mode, it takes a set of nodes. Like the MPOL_BIND interface, it
>> works over a set of nodes. Unlike MPOL_BIND, it will not cause a SIGSEGV or
>> invoke the OOM killer if those preferred nodes are not available.
> Do we have any real-world testing which demonstrates the benefits of
> all of this?

Yes, it's actually been quite useful in practice already.

If we take persistent memory media (PMEM) and hot-add/online it with the
DAX kmem driver, we get NUMA nodes with lots of capacity (~6TB is
typical) but lopsided performance: PMEM read speed is decent, but write
speed is low.

That write speed is *so* low that it dominates performance far more than
the NUMA distance from the CPUs does.  Folks who want PMEM really don't
care about locality.  The discussions with the testers usually go
something like this:

Tester: How do I make my test use PMEM on nodes 2 and 3?
Kernel Guys: use 'numactl --membind=2-3'
Tester: I tried that, but I'm getting allocation failures once I fill up
        PMEM.  Shouldn't it fall back to DRAM?
Kernel Guys: Fine, use 'numactl --preferred=2-3'
Tester: That worked, but it started using DRAM after it exhausted node 2.
Kernel Guys:  Dang it.  I forgot --preferred ignores everything after
              the first node.  Fine, we'll patch the kernel.

This has happened more than once.  End users want to be able to target a
specific physical medium, but don't want to have to deal with the sharp
edges of strict binding.

This has happened both with slow media like PMEM and "faster" media like
High-Bandwidth Memory.
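
For reference, here's roughly what end-user code looks like once the new
mode is in place.  This is a minimal sketch, not code from the series:
it assumes the libnuma set_mempolicy(3) wrapper and the
MPOL_PREFERRED_MANY value from the proposed uapi header, which isn't in
a released numaif.h yet.

/* Build and link with -lnuma. */
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef MPOL_PREFERRED_MANY
#define MPOL_PREFERRED_MANY 5	/* proposed value; assumption */
#endif

int main(void)
{
	/* Prefer the PMEM nodes 2 and 3 from the example above. */
	unsigned long nodemask = (1UL << 2) | (1UL << 3);

	/* maxnode is the number of bits in the mask. */
	if (set_mempolicy(MPOL_PREFERRED_MANY, &nodemask,
			  8 * sizeof(nodemask))) {
		perror("set_mempolicy(MPOL_PREFERRED_MANY)");
		return 1;
	}

	/* Pages faulted in from here on are placed on nodes 2-3 until
	 * those fill up, then quietly fall back to DRAM instead of
	 * OOM-killing or SIGSEGVing the task. */
	void *buf = malloc(64UL << 20);
	if (!buf)
		return 1;
	free(buf);
	return 0;
}

Presumably numactl grows a matching multi-node --preferred option on top
of the same syscall, which is what the testers were reaching for in the
first place.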
