Date:   Tue, 14 Apr 2020 11:40:05 -0400
From:   Daniel Jordan <daniel.m.jordan@...cle.com>
To:     Alexander Duyck <alexander.duyck@...il.com>
Cc:     David Hildenbrand <david@...hat.com>,
        Alexander Duyck <alexander.h.duyck@...ux.intel.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        linux-mm <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Michal Hocko <mhocko@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Alex Williamson <alex.williamson@...hat.com>
Subject: Re: [RFC PATCH 0/4] mm: Add PG_zero support

On Tue, Apr 14, 2020 at 08:07:32AM -0700, Alexander Duyck wrote:
> On Tue, Apr 14, 2020 at 5:01 AM David Hildenbrand <david@...hat.com> wrote:
> > Having that said, I agree with Dave here, that there might be better
> > alternatives for this somewhat-special-case.
> 
> I wonder if it wouldn't make more sense to look at the option of
> splitting the initialization work up over multiple CPUs instead of
> leaving it all single threaded. The data above was creating a VM with
> 64GB of RAM and 32 CPUs. How fast could we zero the pages if we were
> performing the zeroing over those 32 CPUs? I wonder if we couldn't
> look at recruiting other CPUs on the same node to perform the zeroing
> like what Dan had originally proposed for ZONE_DEVICE initialization a
> couple years ago[1].

This is exactly what I've done for VFIO.  Some performance results:

    https://lore.kernel.org/linux-mm/20181105165558.11698-10-daniel.m.jordan@oracle.com/

and a semi-current branch is here if anyone wants to test it:

    https://lore.kernel.org/linux-mm/20200212224731.kmss6o6agekkg3mw@ca-dmjordan1.us.oracle.com/

One of the issues with starting extra threads for paths triggered from
userspace such as VFIO is that they need to be properly throttled by relevant
resource controls such as cgroup (CPU controller especially) and
sched_setaffinity.  This type of control for kernel threads has another use
case too, async memcg reclaim.  All this is second on my list after I post a
series that multithreads deferred page init and sets up the basic
infrastructure for multithreading other paths, which I hope will be ready soon.
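
For illustration only, here is a rough userspace sketch of the idea discussed
above: split a zeroing job into per-thread chunks, and cap the number of
helper threads at the number of CPUs the caller is allowed to run on.  It is
not code from the series linked above and does not use any kernel API.
zero_parallel(), zero_worker(), allowed_cpus() and MAX_ZERO_THREADS are
hypothetical names made up for this example, and the cgroup CPU controller
side of the throttling problem is not modelled at all.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>

#define MAX_ZERO_THREADS 64	/* hypothetical upper bound for this sketch */

struct zero_chunk {
	unsigned char *start;
	size_t len;
};

/* Each helper zeroes exactly one contiguous chunk. */
static void *zero_worker(void *arg)
{
	struct zero_chunk *c = arg;

	memset(c->start, 0, c->len);
	return NULL;
}

/* CPUs the calling thread may run on, per its affinity mask. */
static int allowed_cpus(void)
{
	cpu_set_t set;
	int n;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return 1;
	n = CPU_COUNT(&set);
	return n > 0 ? n : 1;
}

/*
 * Zero buf[0..len) using at most max_threads helpers (0 means no extra cap).
 * Returns the number of helper threads actually started.
 */
int zero_parallel(void *buf, size_t len, int max_threads)
{
	pthread_t tids[MAX_ZERO_THREADS];
	struct zero_chunk chunks[MAX_ZERO_THREADS];
	int nthreads = allowed_cpus();
	int started = 0;
	size_t per;
	int i;

	assert(buf != NULL);
	if (len == 0)
		return 0;
	if (max_threads > 0 && nthreads > max_threads)
		nthreads = max_threads;
	if (nthreads > MAX_ZERO_THREADS)
		nthreads = MAX_ZERO_THREADS;
	if ((size_t)nthreads > len)
		nthreads = (int)len;

	per = len / (size_t)nthreads;
	for (i = 0; i < nthreads; i++) {
		size_t off = (size_t)i * per;

		chunks[i].start = (unsigned char *)buf + off;
		/* The last chunk also takes the remainder. */
		chunks[i].len = (i == nthreads - 1) ? len - off : per;
		if (pthread_create(&tids[i], NULL, zero_worker, &chunks[i]) != 0) {
			/* Could not start a helper: zero the rest in the caller. */
			memset(chunks[i].start, 0, len - off);
			break;
		}
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	return started;
}
```

Build with -pthread.  A caller would do something like
zero_parallel(buf, size, 0) and get back the number of helpers used.  Threads
created this way inherit the caller's affinity mask, so they stay within the
allowed CPUs without extra work; kernel helper threads do not get that for
free, which is the throttling issue described above.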

> [1]: https://lore.kernel.org/linux-mm/153077336359.40830.13007326947037437465.stgit@dwillia2-desk3.amr.corp.intel.com/

I haven't looked closely at memmap_init_zone, though I've tried
memmap_init_zone_device.  Will take a closer look to see how well this could be
incorporated.

Daniel
