Date:   Fri, 17 Nov 2017 13:19:56 -0500
From:   Pavel Tatashin <pasha.tatashin@...cle.com>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Michal Hocko <mhocko@...nel.org>,
        YASUAKI ISHIMATSU <yasu.isimatu@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        linux-kernel@...r.kernel.org, koki.sanagi@...fujitsu.com,
        Steve Sistare <steven.sistare@...cle.com>
Subject: Re: [PATCH] mm, meminit: Serially initialise deferred memory if
 trace_buf_size is specified

On Thu, Nov 16, 2017 at 5:06 AM, Mel Gorman <mgorman@...hsingularity.net> wrote:
> 4. Put a check into the page allocator slowpath that triggers serialised
>    init if the system is booting and an allocation is about to fail. It
>    would be such a cold path that it would never be noticable although it
>    would leave dead code in the kernel image once boot had completed

Hi Mel,

The fourth approach is the best: it is seamless for admins and
engineers, and it will work on any system configuration with any
parameters, without any special involvement.

This approach will also address the following problem:
reset_deferred_meminit() makes assumptions about how much memory we
will need beforehand, and those assumptions may break periodically as
kernel requirements change. For instance, I recently reduced the
amount of memory that system hash tables take on large machines [1],
so the comment in that function is already outdated:
        /*
         * Initialise at least 2G of a node but also take into account that
         * two large system hashes that can take up 1GB for 0.25TB/node.
         */

With this approach we could always init a very small number of struct
pages and allow the rest to be initialized on demand, as boot
requires, until all deferred struct pages are initialized. Since the
deferred pages feature assumes that the machine is large, there is no
drawback to a few extra bytes of dead code, especially since all the
checks can be permanently switched off via static branches once
deferred init is complete.
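
As a rough sketch (the key and helper names below are hypothetical,
not existing kernel symbols), the static-branch gate could look
something like:

static DEFINE_STATIC_KEY_TRUE(deferred_pages);

/*
 * True until the last deferred section is initialized; the allocator
 * checks this before taking the on-demand init path.
 */
static inline bool deferred_pages_enabled(void)
{
	return static_branch_unlikely(&deferred_pages);
}

/* Called once deferred page init completes; patches the checks out. */
static void deferred_init_done(void)
{
	static_branch_disable(&deferred_pages);
}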

The second benefit this approach may bring is that it could enable a
new feature that initializes struct pages on demand later, when
needed by applications. This feature would be configurable or enabled
via a kernel parameter (not sure which is better). In pseudocode:

if (allocation is failing)
  if (uninit struct pages available)
    init enough to finish the alloc
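
Concretely, in __alloc_pages_slowpath() that could look roughly like
the sketch below, where deferred_grow_zone() is a hypothetical helper
that initializes just enough deferred struct pages in the zone to
cover the request:

/* Sketch only: before failing the allocation, try to grow the zone. */
if (deferred_pages_enabled()) {
	/*
	 * Initialize enough deferred struct pages to cover a
	 * 1 << order allocation, then retry the fast path.
	 */
	if (deferred_grow_zone(zone, order))
		goto retry;
}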

Again, once all pages are initialized, the checks will be turned off
via static branching, so I think the code can be shared.

Here is the rationale for this feature:

Each physical machine may run a very large number of Linux containers.
Steve Sistare (CCed) recently studied how much memory each instance of
a Clear Container takes, and it turns out to be about 125 MB when
containers are booted with 2G of memory and 1 CPU. Out of those
125 MB, 32 MB is consumed by the struct page array, as we use 64 bytes
per page. Admins tend to be cautious about the amount of memory they
configure, and may therefore provision more memory than the container
actually requires. So, by allowing struct pages to be initialized only
on demand, we can save around 25% of the memory consumed by a fresh
container instance. Now that struct pages are not zeroed during boot
[2], if we also implement the fourth option, we can get closer to a
complete on-demand struct page initialization.
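
For reference, the arithmetic behind those numbers (assuming 4 KB
base pages):

  2 GB / 4 KB         = 524,288 pages
  524,288 * 64 bytes  = 32 MB of struct pages
  32 MB / 125 MB      = ~26%, i.e. roughly a quarter of the footprint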

I can volunteer to work on these projects.

[1] https://patchwork.kernel.org/patch/9599545/
[2] https://lwn.net/Articles/734374

Thank you,
Pavel
