lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171116100633.moui6zu33ctzpjsf@techsingularity.net>
Date:   Thu, 16 Nov 2017 10:06:33 +0000
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     YASUAKI ISHIMATSU <yasu.isimatu@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, koki.sanagi@...fujitsu.com
Subject: Re: [PATCH] mm, meminit: Serially initialise deferred memory if
 trace_buf_size is specified

On Thu, Nov 16, 2017 at 09:54:33AM +0100, Michal Hocko wrote:
> On Wed 15-11-17 14:17:52, YASUAKI ISHIMATSU wrote:
> > Hi Michal and Mel,
> > 
> > To reproduce the issue, I specified the large trace buffer. The issue also occurs with
> > trace_buf_size=12M and movable_node on 4.14.0.
> 
> This is still 10x more than the default. Why do you need it in the first
> place? You can of course find a size that will not fit into the initial
> memory but I am questioning why do you want something like that during
> early boot in the first place.
> 

This confused me as well. I couldn't think of a sensible use-case for
increasing the buffer other than stuffing trace_printk in multiple places
for debugging purposes. Even in such cases, it would be feasible to disable
the option in Kconfig just to have the large buffer.  Otherwise, just wait
until the system is booted and set if from userspace.

The lack of a sensible use-case is why I took a fairly blunt approach to
the problem. Keep it (relatively) simple and all that.

> The whole deferred struct page allocation operates under assumption
> that there are no large page allocator consumers that early during
> the boot process.

Yes.

> If this assumption is not correct then we probably
> need a generic way to describe this. Add-hoc trace specific thing is
> far from idea, imho. If anything the first approach to disable the
> deferred initialization via kernel command line option sounds much more
> appropriate and simpler to me.

So while the first approach was blunt, there are multiple other options.

1. Parse trace_buf_size in __setup to record the information before
   page alloc init starts. Take that into account in reset_deferred_meminit
   to increase the amount of memory that is serially initialised

2. Have tracing init with a small buffer and then resize it after
   page_alloc_init_late. This modifies tracing a bit but most of the
   helpers that are required are there. It would be more complex but
   it's doable

3. Add a kernel command line parameter that explicitly disables deferred
   meminit. We used to have something like this but it was never merged
   as we should be able to estimate how much memory is needed to boot.

4. Put a check into the page allocator slowpath that triggers serialised
   init if the system is booting and an allocation is about to fail. It
   would be such a cold path that it would never be noticable although it
   would leave dead code in the kernel image once boot had completed

However, it would be preferable by far to have a sensible use-case as to
why trace_buf_size= would be specified with a large buffer. It's hard to
know what level of complexity is justified without it.

-- 
Mel Gorman
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ