linux-kernel - Re: [PATCH v5 0/5] mm: Randomize free memory

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <2153922.MoOcIFpNeT@aspire.rjw.lan>
Date:   Mon, 17 Dec 2018 11:10:57 +0100
From:   "Rafael J. Wysocki" <rjw@...ysocki.net>
To:     Dan Williams <dan.j.williams@...el.com>
Cc:     akpm@...ux-foundation.org,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Keith Busch <keith.busch@...el.com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Kees Cook <keescook@...omium.org>, x86@...nel.org,
        Michal Hocko <mhocko@...e.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andy Lutomirski <luto@...nel.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 0/5] mm: Randomize free memory

On Saturday, December 15, 2018 2:48:30 AM CET Dan Williams wrote:
> Changes since v4: [1]
> * Default the randomization to off and enable it dynamically based on
>   the detection of a memory side cache advertised by platform firmware.
>   In the case of x86 this enumeration comes from the ACPI HMAT. (Michal
>   and Mel)
> * Improve the changelog of the patch that introduces the shuffling to
>   clarify the motivation and better explain the tradeoffs. (Michal and
>   Mel)
> * Include the required HMAT enabling in the series.
> 
> [1]: https://lkml.kernel.org/r/153922180166.838512.8260339805733812034.stgit@dwillia2-desk3.amr.corp.intel.com
> 
> ---
> 
> Quote patch 3:
> 
> Randomization of the page allocator improves the average utilization of
> a direct-mapped memory-side-cache. Memory side caching is a platform
> capability that Linux has been previously exposed to in HPC
> (high-performance computing) environments on specialty platforms. In
> that instance it was a smaller pool of high-bandwidth-memory relative to
> higher-capacity / lower-bandwidth DRAM. Now, this capability is going to
> be found on general purpose server platforms where DRAM is a cache in
> front of higher latency persistent memory [2].
> 
> Robert offered an explanation of the state of the art of Linux
> interactions with memory-side-caches [3], and I copy it here:
> 
>     It's been a problem in the HPC space:
>     http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/
> 
>     A kernel module called zonesort is available to try to help:
>     https://software.intel.com/en-us/articles/xeon-phi-software
> 
>     and this abandoned patch series proposed that for the kernel:
>     https://lkml.org/lkml/2017/8/23/195
> 
>     Dan's patch series doesn't attempt to ensure buffers won't conflict, but
>     also reduces the chance that the buffers will. This will make performance
>     more consistent, albeit slower than "optimal" (which is near impossible
>     to attain in a general-purpose kernel).  That's better than forcing
>     users to deploy remedies like:
>         "To eliminate this gradual degradation, we have added a Stream
>          measurement to the Node Health Check that follows each job;
>          nodes are rebooted whenever their measured memory bandwidth
>          falls below 300 GB/s."
> 
> A replacement for zonesort was merged upstream in commit cc9aec03e58f
> "x86/numa_emulation: Introduce uniform split capability". With this
> numa_emulation capability, memory can be split into cache sized
> ("near-memory" sized) numa nodes. A bind operation to such a node, and
> disabling workloads on other nodes, enables full cache performance.
> However, once the workload exceeds the cache size then cache conflicts
> are unavoidable. While HPC environments might be able to tolerate
> time-scheduling of cache sized workloads, for general purpose server
> platforms, the oversubscribed cache case will be the common case.
> 
> The worst case scenario is that a server system owner benchmarks a
> workload at boot with an un-contended cache only to see that performance
> degrade over time, even below the average cache performance due to
> excessive conflicts. Randomization clips the peaks and fills in the
> valleys of cache utilization to yield steady average performance.
> 
> See patch 3 for more details.
> 
> [2]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
> [3]: https://lkml.org/lkml/2018/9/22/54

Has this hibernation been tested with this series applied?