linux-kernel - Re: [PATCH 0/7] padata: parallelize deferred page init

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200501024002.h7vpf4fxu6elmddn@ca-dmjordan1.us.oracle.com>
Date:   Thu, 30 Apr 2020 22:40:02 -0400
From:   Daniel Jordan <daniel.m.jordan@...cle.com>
To:     Pavel Tatashin <pasha.tatashin@...een.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Daniel Jordan <daniel.m.jordan@...cle.com>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        Steffen Klassert <steffen.klassert@...unet.com>,
        Alex Williamson <alex.williamson@...hat.com>,
        Alexander Duyck <alexander.h.duyck@...ux.intel.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        David Hildenbrand <david@...hat.com>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Jonathan Corbet <corbet@....net>,
        Josh Triplett <josh@...htriplett.org>,
        Kirill Tkhai <ktkhai@...tuozzo.com>,
        Michal Hocko <mhocko@...nel.org>, Pavel Machek <pavel@....cz>,
        Peter Zijlstra <peterz@...radead.org>,
        Randy Dunlap <rdunlap@...radead.org>,
        Shile Zhang <shile.zhang@...ux.alibaba.com>,
        Tejun Heo <tj@...nel.org>, Zi Yan <ziy@...dia.com>,
        linux-crypto@...r.kernel.org, linux-mm <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/7] padata: parallelize deferred page init

On Thu, Apr 30, 2020 at 05:40:59PM -0400, Pavel Tatashin wrote:
> On Thu, Apr 30, 2020 at 5:31 PM Andrew Morton <akpm@...ux-foundation.org> wrote:
> > On Thu, 30 Apr 2020 16:11:18 -0400 Daniel Jordan <daniel.m.jordan@...cle.com> wrote:
> >
> > > Sometimes the kernel doesn't take full advantage of system memory
> > > bandwidth, leading to a single CPU spending excessive time in
> > > initialization paths where the data scales with memory size.
> > >
> > > Multithreading naturally addresses this problem, and this series is the
> > > first step.
> > >
> > > It extends padata, a framework that handles many parallel singlethreaded
> > > jobs, to handle multithreaded jobs as well by adding support for
> > > splitting up the work evenly, specifying a minimum amount of work that's
> > > appropriate for one helper thread to do, load balancing between helpers,
> > > and coordinating them.  More documentation in patches 4 and 7.
> > >
> > > The first user is deferred struct page init, a large bottleneck in
> > > kernel boot--actually the largest for us and likely others too.  This
> > > path doesn't require concurrency limits, resource control, or priority
> > > adjustments like future users will (vfio, hugetlb fallocate, munmap)
> > > because it happens during boot when the system is otherwise idle and
> > > waiting on page init to finish.
> > >
> > > This has been tested on a variety of x86 systems and speeds up kernel
> > > boot by 6% to 49% by making deferred init 63% to 91% faster.
> >
> > How long is this up-to-91% in seconds?  If it's 91% of a millisecond
> > then not impressed.  If it's 91% of two weeks then better :)

The largest system I could test had 384G per node and saved 1.5 out of 4
seconds.

> > Relatedly, how important is boot time on these large machines anyway?
> > They presumably have lengthy uptimes so boot time is relatively
> > unimportant?
> 
> Large machines indeed have a lengthy uptime, but they also can host a
> large number of VMs meaning that downtime of the host increases the
> downtime of VMs in cloud environments. Some VMs might be very sensible
> to downtime: game servers, traders, etc.
>
> > IOW, can you please explain more fully why this patchset is valuable to
> > our users?

I'll let the users speak for themselves, but I have a similar use case to Pavel
of limiting the downtime of VMs running on these large systems, and spinning up
instances as fast as possible is also desirable for our cloud users.