linux-kernel - Re: [PATCH 0/7] padata: parallelize deferred page init

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200501005026.GA104377@localhost>
Date:   Thu, 30 Apr 2020 17:50:26 -0700
From:   Josh Triplett <josh@...htriplett.org>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     Daniel Jordan <daniel.m.jordan@...cle.com>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        Steffen Klassert <steffen.klassert@...unet.com>,
        Alex Williamson <alex.williamson@...hat.com>,
        Alexander Duyck <alexander.h.duyck@...ux.intel.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        David Hildenbrand <david@...hat.com>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Jonathan Corbet <corbet@....net>,
        Kirill Tkhai <ktkhai@...tuozzo.com>,
        Michal Hocko <mhocko@...nel.org>, Pavel Machek <pavel@....cz>,
        Pavel Tatashin <pasha.tatashin@...een.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Randy Dunlap <rdunlap@...radead.org>,
        Shile Zhang <shile.zhang@...ux.alibaba.com>,
        Tejun Heo <tj@...nel.org>, Zi Yan <ziy@...dia.com>,
        linux-crypto@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/7] padata: parallelize deferred page init

On Thu, Apr 30, 2020 at 02:31:31PM -0700, Andrew Morton wrote:
> On Thu, 30 Apr 2020 16:11:18 -0400 Daniel Jordan <daniel.m.jordan@...cle.com> wrote:
> > Sometimes the kernel doesn't take full advantage of system memory
> > bandwidth, leading to a single CPU spending excessive time in
> > initialization paths where the data scales with memory size.
> > 
> > Multithreading naturally addresses this problem, and this series is the
> > first step.
> > 
> > It extends padata, a framework that handles many parallel singlethreaded
> > jobs, to handle multithreaded jobs as well by adding support for
> > splitting up the work evenly, specifying a minimum amount of work that's
> > appropriate for one helper thread to do, load balancing between helpers,
> > and coordinating them.  More documentation in patches 4 and 7.
> > 
> > The first user is deferred struct page init, a large bottleneck in
> > kernel boot--actually the largest for us and likely others too.  This
> > path doesn't require concurrency limits, resource control, or priority
> > adjustments like future users will (vfio, hugetlb fallocate, munmap)
> > because it happens during boot when the system is otherwise idle and
> > waiting on page init to finish.
> > 
> > This has been tested on a variety of x86 systems and speeds up kernel
> > boot by 6% to 49% by making deferred init 63% to 91% faster.
> 
> How long is this up-to-91% in seconds?  If it's 91% of a millisecond
> then not impressed.  If it's 91% of two weeks then better :)

Some test results on a system with 96 CPUs and 192GB of memory:

Without this patch series:
[    0.487132] node 0 initialised, 23398907 pages in 292ms
[    0.499132] node 1 initialised, 24189223 pages in 304ms
...
[    0.629376] Run /sbin/init as init process

With this patch series:
[    0.227868] node 0 initialised, 23398907 pages in 28ms
[    0.230019] node 1 initialised, 24189223 pages in 28ms
...
[    0.361069] Run /sbin/init as init process

That makes a huge difference; memory initialization is the largest
remaining component of boot time.

> Relatedly, how important is boot time on these large machines anyway? 
> They presumably have lengthy uptimes so boot time is relatively
> unimportant?

Cloud systems and other virtual machines may have a huge amount of
memory but not necessarily run for a long time; on such systems, boot
time becomes critically important. Reducing boot time on cloud systems
and VMs makes the difference between "leave running to reduce latency"
and "just boot up when needed".

- Josh Triplett