Message-ID: <153176041838.12695.3365448145295112857.stgit@dwillia2-desk3.amr.corp.intel.com>
Date:   Mon, 16 Jul 2018 10:00:19 -0700
From:   Dan Williams <dan.j.williams@...el.com>
To:     akpm@...ux-foundation.org
Cc:     Tony Luck <tony.luck@...el.com>, Huaisheng Ye <yehs1@...ovo.com>,
        Vishal Verma <vishal.l.verma@...el.com>,
        Jan Kara <jack@...e.cz>, Matthew Wilcox <willy@...radead.org>,
        Dave Jiang <dave.jiang@...el.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Pavel Tatashin <pasha.tatashin@...cle.com>,
        Rich Felker <dalias@...c.org>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Daniel Jordan <daniel.m.jordan@...cle.com>,
        Yoshinori Sato <ysato@...rs.sourceforge.jp>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Michal Hocko <mhocko@...e.com>,
        Paul Mackerras <paulus@...ba.org>,
        Christoph Hellwig <hch@....de>,
        Jérôme Glisse <jglisse@...hat.com>,
        Ingo Molnar <mingo@...hat.com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Heiko Carstens <heiko.carstens@...ibm.com>, x86@...nel.org,
        Logan Gunthorpe <logang@...tatee.com>,
        Ross Zwisler <ross.zwisler@...ux.intel.com>,
        Jeff Moyer <jmoyer@...hat.com>,
        Johannes Thumshirn <jthumshirn@...e.de>,
        Martin Schwidefsky <schwidefsky@...ibm.com>,
        linux-mm@...ck.org, jack@...e.cz, linux-nvdimm@...ts.01.org,
        linux-kernel@...r.kernel.org
Subject: [PATCH v2 00/14] mm: Asynchronous + multithreaded memmap init for
 ZONE_DEVICE

Changes since v1 [1]:
* Teach memmap_sync() to take over a subset of memmap initialization in
  the foreground. This foreground work still needs to await the
  completion of vmemmap_populate_hugepages(), but it will otherwise
  steal 1/1024th of the 'struct page' init work for the given range.
  (Jan)
* Add kernel-doc for all the new 'async' structures.
* Split foreach_order_pgoff() to its own patch.
* Add Pavel and Daniel to the cc as they have been active in the memory
  hotplug code.
* Fix a typo that prevented CONFIG_DAX_DRIVER_DEBUG=y from performing
  early pfn retrieval at dax-filesystem mount time.
* Improve some of the changelogs.

[1]: https://lwn.net/Articles/759117/

---

In order to keep pfn_to_page() a simple offset calculation, the 'struct
page' memmap needs to be mapped and initialized in advance of any usage
of a page. This poses a problem for large memory systems as it delays
full availability of memory resources for tens to hundreds of seconds.
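
To illustrate why the memmap has to exist before any page is used, here
is a minimal userspace sketch of the "simple offset calculation". It is
a model only, not kernel code: struct page_model, model_memmap,
model_start_pfn and the model_*() helpers are hypothetical names, and
the real 'struct page' layout and memory-model macros differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for 'struct page'; the real layout differs. */
struct page_model {
	unsigned long flags;
	int initialized;
};

#define MODEL_NR_PAGES 1024UL

/* The memmap: one descriptor per page frame, indexed by pfn. */
struct page_model model_memmap[MODEL_NR_PAGES];

/* First pfn covered by the array (an assumed value for illustration). */
const unsigned long model_start_pfn = 0x100000UL;

/*
 * pfn -> descriptor is pure pointer arithmetic. Nothing here can check
 * or trigger initialization, so every entry must already be valid.
 */
struct page_model *model_pfn_to_page(unsigned long pfn)
{
	assert(pfn >= model_start_pfn &&
	       pfn - model_start_pfn < MODEL_NR_PAGES);
	return &model_memmap[pfn - model_start_pfn];
}

/* The inverse is equally cheap: pointer difference plus the base pfn. */
unsigned long model_page_to_pfn(const struct page_model *page)
{
	return (unsigned long)(page - model_memmap) + model_start_pfn;
}
```

Because the lookup has no slow path, any laziness in initialization has
to be handled by the callers that first touch a range, which is the
approach described in the rest of this letter.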

For typical 'System RAM' the problem is mitigated by the fact that large
memory allocations tend to happen after the kernel has fully initialized
and userspace services / applications are launched. A small amount, 2GB
of memory, is initialized up front. The remainder is initialized in the
background and freed to the page allocator over time.

Unfortunately, that scheme is not directly reusable for persistent
memory and dax because userspace has visibility into the entire resource
pool and can access any offset directly, whenever it chooses. In
other words, there is no allocator indirection where the kernel can
satisfy requests with arbitrary pages as they become initialized.

That said, we can approximate the optimization by performing the
initialization in the background, allowing the kernel to fully boot the
platform, start up pmem block devices, mount filesystems in dax mode,
and only incur delay at the first userspace dax fault. When that initial
fault occurs, the faulting process is delegated a portion of the memmap
to initialize in the foreground so that it need not wait for
initialization of resources that it does not immediately need.
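
The split between background and foreground work can be sketched in
userspace as below. This is a simplified model under stated
assumptions, not the implementation in this series: all model_* names
and sizes are hypothetical, the range is divided into 1024 equal chunks
to mirror the 1/1024th share mentioned in the changelog, and the wait
on vmemmap_populate_hugepages() is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>

#define MODEL_NR_CHUNKS   1024u         /* each chunk is 1/1024th of the range */
#define MODEL_RANGE_PAGES (1u << 16)    /* assumed range size, in pages */
#define MODEL_CHUNK_PAGES (MODEL_RANGE_PAGES / MODEL_NR_CHUNKS)

enum { CHUNK_IDLE, CHUNK_BUSY, CHUNK_DONE };

atomic_int chunk_state[MODEL_NR_CHUNKS];        /* zero == CHUNK_IDLE */
unsigned char page_ready[MODEL_RANGE_PAGES];    /* 1 once "initialized" */

/*
 * Initialize one chunk unless another context already claimed it.
 * Returns 1 if the caller did the work, 0 if it only had to wait.
 */
int model_init_chunk(unsigned int chunk)
{
	int expected = CHUNK_IDLE;
	unsigned int i;

	if (!atomic_compare_exchange_strong(&chunk_state[chunk], &expected,
					    CHUNK_BUSY)) {
		/* Claimed elsewhere: wait only for this chunk to finish. */
		while (atomic_load(&chunk_state[chunk]) != CHUNK_DONE)
			;
		return 0;
	}

	for (i = 0; i < MODEL_CHUNK_PAGES; i++)
		page_ready[chunk * MODEL_CHUNK_PAGES + i] = 1;
	atomic_store(&chunk_state[chunk], CHUNK_DONE);
	return 1;
}

/* Foreground path: a fault takes over only the chunk it needs. */
int model_fault(unsigned int page_offset)
{
	assert(page_offset < MODEL_RANGE_PAGES);
	return model_init_chunk(page_offset / MODEL_CHUNK_PAGES);
}

/* Background path: walk the whole range; returns chunks done here. */
unsigned int model_background_pass(void)
{
	unsigned int chunk, done = 0;

	for (chunk = 0; chunk < MODEL_NR_CHUNKS; chunk++)
		done += model_init_chunk(chunk);
	return done;
}
```

In this model a fault at page offset 130 initializes only pages 128
through 191, and a later background pass picks up the remaining 1023
chunks. The point of the per-chunk claim is that a faulting process
never waits for more than the chunk covering its own offset.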

With this change, an 8-socket system was observed to initialize pmem
namespaces in ~4 seconds, whereas it previously took ~4 minutes.

These patches apply on top of the HMM + devm_memremap_pages() reworks:

https://marc.info/?l=linux-mm&m=153128668008585&w=2

---

Dan Williams (10):
      mm: Plumb dev_pagemap instead of vmem_altmap to memmap_init_zone()
      mm: Enable asynchronous __add_pages() and vmemmap_populate_hugepages()
      mm: Teach memmap_init_zone() to initialize ZONE_DEVICE pages
      mm: Multithread ZONE_DEVICE initialization
      mm, memremap: Up-level foreach_order_pgoff()
      mm: Allow an external agent to coordinate memmap initialization
      filesystem-dax: Make mount time pfn validation a debug check
      libnvdimm, pmem: Initialize the memmap in the background
      device-dax: Initialize the memmap in the background
      libnvdimm, namespace: Publish page structure init state / control

Huaisheng Ye (4):
      libnvdimm, pmem: Allow a NULL-pfn to ->direct_access()
      tools/testing/nvdimm: Allow a NULL-pfn to ->direct_access()
      s390, dcssblk: Allow a NULL-pfn to ->direct_access()
      filesystem-dax: Do not request a pfn when not required


 arch/ia64/mm/init.c             |    5 +
 arch/powerpc/mm/mem.c           |    5 +
 arch/s390/mm/init.c             |    8 +
 arch/sh/mm/init.c               |    5 +
 arch/x86/mm/init_32.c           |    8 +
 arch/x86/mm/init_64.c           |   27 ++--
 drivers/dax/Kconfig             |   10 +
 drivers/dax/dax-private.h       |    2 
 drivers/dax/device-dax.h        |    2 
 drivers/dax/device.c            |   16 ++
 drivers/dax/pmem.c              |    5 +
 drivers/dax/super.c             |   64 ++++++---
 drivers/nvdimm/nd.h             |    2 
 drivers/nvdimm/pfn_devs.c       |   50 +++++--
 drivers/nvdimm/pmem.c           |   17 ++
 drivers/nvdimm/pmem.h           |    1 
 drivers/s390/block/dcssblk.c    |    5 -
 fs/dax.c                        |   10 -
 include/linux/memmap_async.h    |  110 ++++++++++++++++
 include/linux/memory_hotplug.h  |   18 ++-
 include/linux/memremap.h        |   31 ++++
 include/linux/mm.h              |    8 +
 kernel/memremap.c               |   85 ++++++------
 mm/memory_hotplug.c             |   73 ++++++++---
 mm/page_alloc.c                 |  271 +++++++++++++++++++++++++++++++++++----
 mm/sparse-vmemmap.c             |   56 ++++++--
 tools/testing/nvdimm/pmem-dax.c |   11 +-
 27 files changed, 717 insertions(+), 188 deletions(-)
 create mode 100644 include/linux/memmap_async.h
