Message-ID: <1435355182.18090.64.camel@intel.com>
Date:	Fri, 26 Jun 2015 21:46:26 +0000
From:	"Williams, Dan J" <dan.j.williams@...el.com>
To:	"torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>
CC:	"toshi.kani@...com" <toshi.kani@...com>,
	"mingo@...nel.org" <mingo@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"nicholas.w.moulin@...ux.intel.com" 
	<nicholas.w.moulin@...ux.intel.com>,
	"Rudoff, Andy" <andy.rudoff@...el.com>,
	"jmoyer@...hat.com" <jmoyer@...hat.com>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"hch@....de" <hch@....de>, "axboe@...nel.dk" <axboe@...nel.dk>,
	"Moore, Robert" <robert.moore@...el.com>,
	"Wysocki, Rafael J" <rafael.j.wysocki@...el.com>,
	"hpa@...or.com" <hpa@...or.com>,
	"linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
	"axboe@...com" <axboe@...com>,
	"willy@...ux.intel.com" <willy@...ux.intel.com>,
	"bp@...en8.de" <bp@...en8.de>,
	"ross.zwisler@...ux.intel.com" <ross.zwisler@...ux.intel.com>,
	"Verma, Vishal L" <vishal.l.verma@...el.com>,
	"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
	"luto@...capital.net" <luto@...capital.net>,
	"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>
Subject: [GIT PULL] libnvdimm: non-volatile memory devices for 4.2

Hi Linus, please pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm tags/libnvdimm-for-4.2

...to receive the new libnvdimm sub-system, related drivers, and x86
enabling.

---

Earlier this week, in the ACPICA update, you merged the definition of
the new ACPI 6.0 table describing platform non-volatile memory
resources, NFIT (NVDIMM Firmware Interface Table).  The specification
for this table guided development of libnvdimm, a generic kernel
sub-system in support of NVDIMM devices.  The new ACPI_NFIT driver is
the primary consumer of this library, and it also supports the existing
X86_PMEM_LEGACY definition merged in v4.1.

The implementation has been out for review since ACPI 6.0 was released,
which coincided with v4.1-rc1.  We have iterated through a steady stream
of review feedback that was tough, but ultimately for the betterment of
the code base.  The code is merge ready as we have worked through all
the coarse aspects of the architecture, primarily with Christoph, and
have demonstrated a willingness and ability to quickly spin the
implementation in response to review.

It must be noted that the one aspect of this pull request about which
Christoph still has concerns, the external unit test infrastructure in
tools/testing/nvdimm/, is the primary reason we have been able to spin
the implementation with speed and confidence.  The changelog for commit
6bc756193ff6 "tools/testing/nvdimm: libnvdimm unit test infrastructure"
goes into more detail on the rationale for including it.  Suffice to say
the potential maintenance burden of carrying driver test infrastructure
in-tree is overshadowed by the benefits of demonstrating the
implementation in the absence of hardware, catching bugs (the majority
of them), and identifying incomplete enabling.

Notably missing from this merge request are some of the wider arch
cleanups (generic ioremap_cache()) and core kernel enabling (__pfn_t,
kmap_atomic_pfn_t(), memremap(), etc...) updates that were identified in
the course of development.  Those will need to wait for 4.3.  In the
meantime this does include the new pmem api which hooks up the "pcommit"
instruction that was previously merged in v4.1 and outlines what an
architecture may need to implement to reliably support pmem.

Please pull, full commit log below so you can see what has been
explicitly acked.

Thank you!

---

The following changes since commit f3b6ced236259a87829b829e8e542ff53bfb9a4f:

  ACPICA: Fix for ill-formed GUID strings for NFIT tables. (2015-05-25 23:42:34 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm tags/libnvdimm-for-4.2

for you to fetch changes up to 61031952f4c89dba1065f7a5b9419badb112554c:

  arch, x86: pmem api for ensuring durability of persistent memory updates (2015-06-26 11:23:38 -0400)

----------------------------------------------------------------
The libnvdimm sub-system introduces, in addition to the libnvdimm-core,
4 drivers / enabling modules:

NFIT:
Instantiates an "nvdimm bus" with the core and registers memory devices
(NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware Interface
Table).  After registering NVDIMMs the NFIT driver then registers
"region" devices.  A libnvdimm-region defines an access mode and the
boundaries of persistent memory media.  A region may span multiple
NVDIMMs that are interleaved by the hardware memory controller.  In
turn, a libnvdimm-region can be carved into a "namespace" device and
bound to the PMEM or BLK driver which will attach a Linux block device
(disk) interface to the memory.

PMEM:
Initially merged in v4.1 this driver for contiguous spans of persistent
memory address ranges is re-worked to drive PMEM-namespaces emitted by
the libnvdimm-core.  In this update the PMEM driver, on x86, gains the
ability to assert that writes to persistent memory have been flushed all
the way through the caches and buffers in the platform to persistent
media.  See memcpy_to_pmem() and wmb_pmem().

BLK:
This new driver enables access to persistent memory media through "Block
Data Windows" as defined by the NFIT.  The primary difference between
this driver and PMEM is that only a small window of persistent memory is
mapped into system address space at any given point in time.  Per-NVDIMM
windows are reprogrammed at run time, per-I/O, to access different
portions of the media.  BLK-mode, by definition, does not support DAX.
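
The windowed access pattern described for BLK can be illustrated with a
toy model.  This is only a sketch under stated assumptions: the class
name WindowedMedia, the 16-byte window, and the reprogram counter are
hypothetical and are not taken from the nd_blk driver.  It shows a read
that crosses a window boundary and therefore needs the window to be
reprogrammed twice.

```python
class WindowedMedia:
    """Toy model: only `window_size` bytes of the media are visible at once."""

    def __init__(self, media: bytearray, window_size: int):
        self.media = media
        self.window_size = window_size
        self.window_base = 0   # media offset currently exposed by the window
        self.reprograms = 0    # how many times the window was moved

    def _program(self, offset: int) -> None:
        # Point the window at the aligned slice that contains `offset`.
        self.window_base = offset - offset % self.window_size
        self.reprograms += 1

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray()
        while length:
            self._program(offset)  # reprogrammed for every chunk of I/O
            in_window = offset - self.window_base
            chunk = min(length, self.window_size - in_window)
            start = self.window_base + in_window
            out += self.media[start:start + chunk]
            offset += chunk
            length -= chunk
        return bytes(out)


media = bytearray(range(64))
win = WindowedMedia(media, window_size=16)
data = win.read(10, 12)  # spans the boundary between two windows
```

Because the data is only reachable through the moving window, there is
no stable mapping of the whole media to hand out, which is consistent
with BLK-mode not supporting DAX.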

BTT:
This is a library, optionally consumed by either PMEM or BLK, that
converts a byte-accessible namespace into a disk with atomic sector
update semantics (prevents sector tearing on crash or power loss).  The
sinister aspect of sector tearing is that most applications do not know
they have an atomic sector dependency.  At least today's disks rarely
ever tear sectors and if they do one almost certainly gets a CRC error
on access.  NVDIMMs will always tear and always silently.  Until an
application is audited to be robust in the presence of sector-tearing
the usage of BTT is recommended.
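
To make sector tearing concrete, here is a toy Python model.  It is a
sketch, not the BTT on-media format: the names write_in_place, MapDisk
and Crash, the 8-byte sector, and the single spare block are all
assumptions made for illustration.  It contrasts an in-place overwrite
that is interrupted part way with a write that goes to a free block and
only becomes visible through one map update.

```python
SECTOR = 8  # toy sector size in bytes


class Crash(Exception):
    """Simulated power loss."""


def write_in_place(media, lba, data, crash_after=None):
    """Byte-wise overwrite; a crash part way leaves old and new bytes mixed."""
    base = lba * SECTOR
    for i, byte in enumerate(data):
        if crash_after is not None and i == crash_after:
            raise Crash
        media[base + i] = byte


class MapDisk:
    """Indirection: write to a free block, then flip a single map entry."""

    def __init__(self, nlba):
        self.blocks = [bytearray(SECTOR) for _ in range(nlba + 1)]  # one spare
        self.map = list(range(nlba))  # lba -> physical block
        self.free = nlba              # index of the spare block

    def write(self, lba, data, crash_after=None):
        # The new data lands in the free block; the live block is untouched.
        write_in_place(self.blocks[self.free], 0, data, crash_after)
        # Commit point: one map update, modeled here as atomic.
        self.map[lba], self.free = self.free, self.map[lba]

    def read(self, lba):
        return bytes(self.blocks[self.map[lba]])


old, new = b"AAAAAAAA", b"BBBBBBBB"

raw = bytearray(old)
try:
    write_in_place(raw, 0, new, crash_after=3)
except Crash:
    pass
torn = bytes(raw)  # a mix of new and old bytes, with no error reported

disk = MapDisk(1)
disk.write(0, old)
try:
    disk.write(0, new, crash_after=3)
except Crash:
    pass
survived = disk.read(0)  # still the complete old sector
```

The in-place path returns a silently mixed sector after the simulated
crash, while the mapped path returns either the old or the new sector
in full.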

Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig,
Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox,
Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael
Wysocki, and Bob Moore.

----------------------------------------------------------------
Dan Williams (24):
      e820, efi: add ACPI 6.0 persistent memory types
      libnvdimm, nfit: initial libnvdimm infrastructure and NFIT support
      libnvdimm: control character device and nvdimm_bus sysfs attributes
      libnvdimm, nfit: dimm/memory-devices
      libnvdimm: control (ioctl) messages for nvdimm_bus and nvdimm devices
      libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure
      libnvdimm, nfit: regions (block-data-window, persistent memory, volatile memory)
      libnvdimm: support for legacy (non-aliasing) nvdimms
      libnvdimm, pmem: move pmem to drivers/nvdimm/
      libnvdimm, pmem: add libnvdimm support to the pmem driver
      libnvdimm, nfit: add interleave-set state-tracking infrastructure
      libnvdimm: namespace indices: read and validate
      libnvdimm: pmem label sets and namespace instantiation.
      libnvdimm: blk labels and namespace instantiation
      libnvdimm: write pmem label set
      libnvdimm: write blk label set
      libnvdimm: infrastructure for btt devices
      tools/testing/nvdimm: libnvdimm unit test infrastructure
      libnvdimm: Non-Volatile Devices
      libnvdimm, pmem: fix up max_hw_sectors
      pmem: make_request cleanups
      libnvdimm: enable iostat
      pmem: flag pmem block devices as non-rotational
      libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only

Ross Zwisler (2):
      libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory
      arch, x86: pmem api for ensuring durability of persistent memory updates

Toshi Kani (3):
      acpi: Add acpi_map_pxm_to_online_node()
      libnvdimm: Set numa_node to NVDIMM devices
      libnvdimm: Add sysfs numa_node to NVDIMM devices

Vishal Verma (4):
      nd_btt: atomic sector updates
      fs/block_dev.c: skip rw_page if bdev has integrity
      libnvdimm, btt: add support for blk integrity
      libnvdimm, blk: add support for blk integrity

 Documentation/nvdimm/btt.txt          |  283 +++++
 Documentation/nvdimm/nvdimm.txt       |  808 ++++++++++++++
 MAINTAINERS                           |   39 +-
 arch/arm64/kernel/efi.c               |    1 +
 arch/ia64/kernel/efi.c                |    4 +
 arch/x86/Kconfig                      |    4 +
 arch/x86/boot/compressed/eboot.c      |    4 +
 arch/x86/include/asm/cacheflush.h     |   72 ++
 arch/x86/include/asm/io.h             |    6 +
 arch/x86/include/uapi/asm/e820.h      |    1 +
 arch/x86/kernel/e820.c                |   28 +-
 arch/x86/kernel/pmem.c                |   93 +-
 arch/x86/platform/efi/efi.c           |    3 +
 drivers/Kconfig                       |    2 +
 drivers/Makefile                      |    1 +
 drivers/acpi/Kconfig                  |   26 +
 drivers/acpi/Makefile                 |    1 +
 drivers/acpi/nfit.c                   | 1587 ++++++++++++++++++++++++++++
 drivers/acpi/nfit.h                   |  158 +++
 drivers/acpi/numa.c                   |   50 +-
 drivers/block/Kconfig                 |   11 -
 drivers/block/Makefile                |    1 -
 drivers/nvdimm/Kconfig                |   68 ++
 drivers/nvdimm/Makefile               |   20 +
 drivers/nvdimm/blk.c                  |  384 +++++++
 drivers/nvdimm/btt.c                  | 1479 ++++++++++++++++++++++++++
 drivers/nvdimm/btt.h                  |  185 ++++
 drivers/nvdimm/btt_devs.c             |  425 ++++++++
 drivers/nvdimm/bus.c                  |  730 +++++++++++++
 drivers/nvdimm/core.c                 |  465 ++++++++
 drivers/nvdimm/dimm.c                 |  102 ++
 drivers/nvdimm/dimm_devs.c            |  551 ++++++++++
 drivers/nvdimm/label.c                |  927 ++++++++++++++++
 drivers/nvdimm/label.h                |  141 +++
 drivers/nvdimm/namespace_devs.c       | 1870 +++++++++++++++++++++++++++++++++
 drivers/nvdimm/nd-core.h              |   83 ++
 drivers/nvdimm/nd.h                   |  220 ++++
 drivers/{block => nvdimm}/pmem.c      |  227 ++--
 drivers/nvdimm/region.c               |  114 ++
 drivers/nvdimm/region_devs.c          |  787 ++++++++++++++
 fs/block_dev.c                        |    4 +-
 include/linux/acpi.h                  |    5 +
 include/linux/compiler.h              |    2 +
 include/linux/efi.h                   |    3 +-
 include/linux/libnvdimm.h             |  151 +++
 include/linux/nd.h                    |  151 +++
 include/linux/pmem.h                  |  153 +++
 include/uapi/linux/Kbuild             |    1 +
 include/uapi/linux/ndctl.h            |  197 ++++
 lib/Kconfig                           |    3 +
 tools/testing/nvdimm/Kbuild           |   40 +
 tools/testing/nvdimm/Makefile         |    7 +
 tools/testing/nvdimm/config_check.c   |   15 +
 tools/testing/nvdimm/test/Kbuild      |    8 +
 tools/testing/nvdimm/test/iomap.c     |  151 +++
 tools/testing/nvdimm/test/nfit.c      | 1116 ++++++++++++++++++++
 tools/testing/nvdimm/test/nfit_test.h |   29 +
 57 files changed, 13843 insertions(+), 154 deletions(-)
 create mode 100644 Documentation/nvdimm/btt.txt
 create mode 100644 Documentation/nvdimm/nvdimm.txt
 create mode 100644 drivers/acpi/nfit.c
 create mode 100644 drivers/acpi/nfit.h
 create mode 100644 drivers/nvdimm/Kconfig
 create mode 100644 drivers/nvdimm/Makefile
 create mode 100644 drivers/nvdimm/blk.c
 create mode 100644 drivers/nvdimm/btt.c
 create mode 100644 drivers/nvdimm/btt.h
 create mode 100644 drivers/nvdimm/btt_devs.c
 create mode 100644 drivers/nvdimm/bus.c
 create mode 100644 drivers/nvdimm/core.c
 create mode 100644 drivers/nvdimm/dimm.c
 create mode 100644 drivers/nvdimm/dimm_devs.c
 create mode 100644 drivers/nvdimm/label.c
 create mode 100644 drivers/nvdimm/label.h
 create mode 100644 drivers/nvdimm/namespace_devs.c
 create mode 100644 drivers/nvdimm/nd-core.h
 create mode 100644 drivers/nvdimm/nd.h
 rename drivers/{block => nvdimm}/pmem.c (50%)
 create mode 100644 drivers/nvdimm/region.c
 create mode 100644 drivers/nvdimm/region_devs.c
 create mode 100644 include/linux/libnvdimm.h
 create mode 100644 include/linux/nd.h
 create mode 100644 include/linux/pmem.h
 create mode 100644 include/uapi/linux/ndctl.h
 create mode 100644 tools/testing/nvdimm/Kbuild
 create mode 100644 tools/testing/nvdimm/Makefile
 create mode 100644 tools/testing/nvdimm/config_check.c
 create mode 100644 tools/testing/nvdimm/test/Kbuild
 create mode 100644 tools/testing/nvdimm/test/iomap.c
 create mode 100644 tools/testing/nvdimm/test/nfit.c
 create mode 100644 tools/testing/nvdimm/test/nfit_test.h


commit ad5fb870c486d932a1749d7853dd70f436a7e03f
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Fri Apr 3 12:05:28 2015 -0400

    e820, efi: add ACPI 6.0 persistent memory types
    
    ACPI 6.0 formalizes e820-type-7 and efi-type-14 as persistent memory.
    Mark it "reserved" and allow it to be claimed by a persistent memory
    device driver.
    
    This definition is in addition to the Linux kernel's existing type-12
    definition that was recently added in support of shipping platforms with
    NVDIMM support that predate ACPI 6.0 (which now classifies type-12 as
    OEM reserved).
    
    Note, /proc/iomem can be consulted for differentiating legacy
    "Persistent Memory (legacy)" E820_PRAM vs standard "Persistent Memory"
    E820_PMEM.
    
    Cc: Boaz Harrosh <boaz@...xistor.com>
    Cc: Ingo Molnar <mingo@...nel.org>
    Cc: Christoph Hellwig <hch@....de>
    Cc: Andrew Morton <akpm@...ux-foundation.org>
    Cc: Borislav Petkov <bp@...en8.de>
    Cc: H. Peter Anvin <hpa@...or.com>
    Cc: Jens Axboe <axboe@...com>
    Cc: Linus Torvalds <torvalds@...ux-foundation.org>
    Cc: Matthew Wilcox <willy@...ux.intel.com>
    Cc: Thomas Gleixner <tglx@...utronix.de>
    Acked-by: Jeff Moyer <jmoyer@...hat.com>
    Acked-by: Andy Lutomirski <luto@...capital.net>
    Reviewed-by: Ross Zwisler <ross.zwisler@...ux.intel.com>
    Acked-by: Christoph Hellwig <hch@....de>
    Tested-by: Toshi Kani <toshi.kani@...com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>
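
The /proc/iomem note in the commit above can be sketched as a small
parser.  This is a minimal illustration: the function name
classify_pmem and the sample address ranges are made up, and only the
two resource names quoted in the commit message are taken from the
source.

```python
def classify_pmem(iomem_text):
    """Split /proc/iomem-style lines into legacy and standard pmem ranges."""
    legacy, standard = [], []
    for line in iomem_text.splitlines():
        span, _, name = line.strip().partition(" : ")
        if name == "Persistent Memory (legacy)":
            legacy.append(span)    # E820_PRAM (type-12)
        elif name == "Persistent Memory":
            standard.append(span)  # E820_PMEM (type-7)
    return legacy, standard


# Hypothetical sample; real address ranges depend on the platform.
SAMPLE = """\
00000000-00000fff : reserved
100000000-1ffffffff : Persistent Memory (legacy)
200000000-2ffffffff : Persistent Memory
"""

legacy_ranges, standard_ranges = classify_pmem(SAMPLE)
```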

commit b94d5230d06eb930be82e67fb1a9a58271e78297
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Tue May 19 22:54:31 2015 -0400

    libnvdimm, nfit: initial libnvdimm infrastructure and NFIT support
    
    A struct nvdimm_bus is the anchor device for registering nvdimm
    resources and interfaces, for example, a character control device,
    nvdimm devices, and I/O region devices.  The ACPI NFIT (NVDIMM Firmware
    Interface Table) is one possible platform description for such
    non-volatile memory resources in a system.  The nfit.ko driver attaches
    to the "ACPI0012" device that indicates the presence of the NFIT and
    parses the table to register a struct nvdimm_bus instance.
    
    Cc: <linux-acpi@...r.kernel.org>
    Cc: Lv Zheng <lv.zheng@...el.com>
    Cc: Robert Moore <robert.moore@...el.com>
    Cc: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
    Acked-by: Jeff Moyer <jmoyer@...hat.com>
    Acked-by: Christoph Hellwig <hch@....de>
    Acked-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
    Tested-by: Toshi Kani <toshi.kani@...com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit 45def22c1fab85764646746ce38d45b2f3281fa5
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Sun Apr 26 19:26:48 2015 -0400

    libnvdimm: control character device and nvdimm_bus sysfs attributes
    
    The control device for a nvdimm_bus is registered as an "nd" class
    device.  The expectation is that there will usually only be one "nd" bus
    registered under /sys/class/nd.  However, we allow for the possibility
    of multiple buses and they will be listed in discovery order as
    ndctl0...ndctlN.  This character device hosts the ioctl for passing
    control messages.  The initial command set has a 1:1 correlation with
    the commands listed by the "NFIT DSM Example" document [1], but
    this scheme is extensible to future command sets.
    
    Note, nd_ioctl() and the backing ->ndctl() implementation are defined in
    a subsequent patch.  This is simply the initial registrations and sysfs
    attributes.
    
    [1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
    
    Cc: Neil Brown <neilb@...e.de>
    Cc: Greg KH <gregkh@...uxfoundation.org>
    Cc: <linux-acpi@...r.kernel.org>
    Cc: Robert Moore <robert.moore@...el.com>
    Cc: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
    Acked-by: Christoph Hellwig <hch@....de>
    Acked-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
    Tested-by: Toshi Kani <toshi.kani@...com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit e6dfb2de47768efe8cc37c9a1863d2aff81440fb
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Sat Apr 25 03:56:17 2015 -0400

    libnvdimm, nfit: dimm/memory-devices
    
    Enable nvdimm devices to be registered on a nvdimm_bus.  The kernel
    assigned device id for nvdimm devices is dynamic.  If userspace needs a
    more static identifier it should consult a provider-specific attribute.
    In the case where NFIT is the provider, the 'nmemX/nfit/handle' or
    'nmemX/nfit/serial' attributes may be used for this purpose.
    
    Cc: Neil Brown <neilb@...e.de>
    Cc: <linux-acpi@...r.kernel.org>
    Cc: Greg KH <gregkh@...uxfoundation.org>
    Cc: Robert Moore <robert.moore@...el.com>
    Cc: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
    Acked-by: Christoph Hellwig <hch@....de>
    Acked-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
    Tested-by: Toshi Kani <toshi.kani@...com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit 62232e45f4a265abb43f0acf16e58f5d0b6e1ec9
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Mon Jun 8 14:27:06 2015 -0400

    libnvdimm: control (ioctl) messages for nvdimm_bus and nvdimm devices
    
    Most discovery/configuration of the nvdimm-subsystem is done via sysfs
    attributes.  However, some nvdimm_bus instances, particularly the
    ACPI.NFIT bus, define a small set of messages that can be passed to the
    platform.  For convenience we derive the initial libnvdimm-ioctl command
    formats directly from the NFIT DSM Interface Example formats.
    
        ND_CMD_SMART: media health and diagnostics
        ND_CMD_GET_CONFIG_SIZE: size of the label space
        ND_CMD_GET_CONFIG_DATA: read label space
        ND_CMD_SET_CONFIG_DATA: write label space
        ND_CMD_VENDOR: vendor-specific command passthrough
        ND_CMD_ARS_CAP: report address-range-scrubbing capabilities
        ND_CMD_ARS_START: initiate scrubbing
        ND_CMD_ARS_STATUS: report on scrubbing state
        ND_CMD_SMART_THRESHOLD: configure alarm thresholds for smart events
    
    If a platform later defines different commands than this set it is
    straightforward to extend support to those formats.
    
    Most of the commands target a specific dimm.  However, the
    address-range-scrubbing commands target the bus.  The 'commands'
    attribute in sysfs of an nvdimm_bus, or nvdimm, enumerates the supported
    commands for that object.
    
    Cc: <linux-acpi@...r.kernel.org>
    Cc: Robert Moore <robert.moore@...el.com>
    Cc: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
    Reported-by: Nicholas Moulin <nicholas.w.moulin@...ux.intel.com>
    Acked-by: Christoph Hellwig <hch@....de>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>
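
The split between dimm-targeted and bus-targeted commands in the commit
above can be written down as a small lookup.  This is only a sketch:
the command names come from the commit message, while the helper
command_target and the two set names are hypothetical and do not mirror
any kernel data structure.

```python
# Address-range-scrubbing commands target the bus.
ARS_COMMANDS = {"ND_CMD_ARS_CAP", "ND_CMD_ARS_START", "ND_CMD_ARS_STATUS"}

# The remaining commands in the initial set target a specific dimm.
DIMM_COMMANDS = {
    "ND_CMD_SMART",
    "ND_CMD_GET_CONFIG_SIZE",
    "ND_CMD_GET_CONFIG_DATA",
    "ND_CMD_SET_CONFIG_DATA",
    "ND_CMD_VENDOR",
    "ND_CMD_SMART_THRESHOLD",
}


def command_target(name):
    """Return 'bus' or 'dimm' for a command from the initial command set."""
    if name in ARS_COMMANDS:
        return "bus"
    if name in DIMM_COMMANDS:
        return "dimm"
    raise ValueError(f"unknown command: {name}")
```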

commit 4d88a97aa9e8cfa6460aab119c5da60ad2267423
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Sun May 31 14:41:48 2015 -0400

    libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure
    
    * Implement the device-model infrastructure for loading modules and
      attaching drivers to nvdimm devices.  This is a simple association of a
      nd-device-type number with a driver that has a bitmask of supported
      device types.  To facilitate userspace bind/unbind operations 'modalias'
      and 'devtype', that also appear in the uevent, are added as generic
      sysfs attributes for all nvdimm devices.  The reason for the device-type
      number is to support sub-types within a given parent devtype, be it a
      vendor-specific sub-type or otherwise.
    
    * The first consumer of this infrastructure is the driver
      for dimm devices.  It simply uses control messages to retrieve and
      store the configuration-data image (label set) from each dimm.
    
    Note: nd_device_register() arranges for asynchronous registration of
          nvdimm bus devices by default.
    
    Cc: Greg KH <gregkh@...uxfoundation.org>
    Cc: Neil Brown <neilb@...e.de>
    Acked-by: Christoph Hellwig <hch@....de>
    Tested-by: Toshi Kani <toshi.kani@...com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>
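
The association of a device-type number with a driver's bitmask of
supported types, as described in the commit above, can be sketched as
follows.  The DevType values, the Driver class, and find_driver are
illustrative assumptions and are not the kernel's definitions.

```python
from enum import IntEnum


class DevType(IntEnum):
    """Illustrative device-type numbers (assumed, not the kernel's values)."""
    DIMM = 1
    REGION_PMEM = 2
    REGION_BLK = 3
    NAMESPACE_IO = 4


class Driver:
    def __init__(self, name, *types):
        self.name = name
        self.type_mask = 0
        for t in types:
            self.type_mask |= 1 << t  # one bit per supported device type

    def matches(self, devtype):
        return bool(self.type_mask & (1 << devtype))


def find_driver(drivers, devtype):
    """Return the first driver whose mask covers `devtype`, else None."""
    for drv in drivers:
        if drv.matches(devtype):
            return drv
    return None


drivers = [
    Driver("dimm", DevType.DIMM),
    Driver("region", DevType.REGION_PMEM, DevType.REGION_BLK),
]
```

A single driver can claim several sub-types this way, which is the
stated reason for using a type number rather than one driver per
devtype.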

commit 1f7df6f88b9245a7f2d0f8ecbc97dc88c8d0d8e1
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Tue Jun 9 20:13:14 2015 -0400

    libnvdimm, nfit: regions (block-data-window, persistent memory, volatile memory)
    
    A "region" device represents the maximum capacity of a BLK range (mmio
    block-data-window(s)), or a PMEM range (DAX-capable persistent memory or
    volatile memory), without regard for aliasing.  Aliasing, in the
    dimm-local address space (DPA), is resolved by metadata on a dimm to
    designate which exclusive interface will access the aliased DPA ranges.
    Support for the per-dimm metadata/label arrives in a subsequent
    patch.
    
    The name format of "region" devices is "regionN" where, like dimms, N is
    a global ida index assigned at discovery time.  This id is not reliable
    across reboots nor in the presence of hotplug.  Look to attributes of
    the region or static id-data of the sub-namespace to generate a
    persistent name.  However, if the platform configuration does not change
    it is reasonable to expect the same region id to be assigned at the next
    boot.
    
    "region"s have 2 generic attributes, "size" and "mapping"s, where:
    - size: the BLK accessible capacity or the span of the
      system physical address range in the case of PMEM.
    
    - mappingN: a tuple describing a dimm's contribution to the region's
      capacity in the format (<nmemX>,<dpa>,<size>).  For a PMEM-region
      there will be at least one mapping per dimm in the interleave set.  For
      a BLK-region there is only "mapping0" listing the starting DPA of the
      BLK-region and the available DPA capacity of that space (matches "size"
      above).
    
    The max number of mappings per "region" is hard coded per the
    constraints of sysfs attribute groups.  That said the number of mappings
    per region should never exceed the maximum number of possible dimms in
    the system.  If the current number turns out to not be enough then the
    "mappings" attribute clarifies how many there are supposed to be. "32
    should be enough for anybody...".
    
    Cc: Neil Brown <neilb@...e.de>
    Cc: <linux-acpi@...r.kernel.org>
    Cc: Greg KH <gregkh@...uxfoundation.org>
    Cc: Robert Moore <robert.moore@...el.com>
    Cc: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
    Acked-by: Christoph Hellwig <hch@....de>
    Acked-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
    Tested-by: Toshi Kani <toshi.kani@...com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>
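
The mappingN tuple format (<nmemX>,<dpa>,<size>) from the commit above
could be consumed by userspace roughly as follows.  This is a sketch:
the Mapping type and parse_mapping are hypothetical names, the sample
values are made up, and the numeric encoding of <dpa> and <size> is an
assumption (base prefixes such as 0x are accepted, plain decimal too).

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    dimm: str  # e.g. "nmem0"
    dpa: int   # starting dimm-physical-address
    size: int  # capacity contributed to the region


def parse_mapping(text):
    """Parse one '<nmemX>,<dpa>,<size>' tuple, with or without parentheses."""
    dimm, dpa, size = text.strip().strip("()").split(",")
    return Mapping(dimm, int(dpa, 0), int(size, 0))


# Hypothetical two-dimm interleave set.
mappings = [
    parse_mapping("nmem0,0x0,0x4000000\n"),
    parse_mapping("(nmem1,0,67108864)"),
]
total = sum(m.size for m in mappings)
```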

commit 3d88002e4a7bd40f355550284c6cd140e6fe29dc
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Sun May 31 15:02:11 2015 -0400

    libnvdimm: support for legacy (non-aliasing) nvdimms
    
    The libnvdimm region driver is an intermediary driver that translates
    non-volatile "region"s into "namespace" sub-devices that are surfaced by
    persistent memory block-device drivers (PMEM and BLK).
    
    ACPI 6 introduces the concept that a given nvdimm may simultaneously
    offer multiple access modes to its media through direct PMEM load/store
    access, or windowed BLK mode.  Existing nvdimms mostly implement a PMEM
    interface, some offer a BLK-like mode, but never both as ACPI 6 defines.
    If an nvdimm is single interfaced, then there is no need for dimm
    metadata labels.  For these devices we can take the region boundaries
    directly to create a child namespace device (nd_namespace_io).
    
    Acked-by: Christoph Hellwig <hch@....de>
    Tested-by: Toshi Kani <toshi.kani@...com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit 18da2c9ee41a036bf470dbad73c18a815725d36e
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Tue Jun 9 14:13:37 2015 -0400

    libnvdimm, pmem: move pmem to drivers/nvdimm/
    
    Prepare the pmem driver to consume PMEM namespaces emitted by regions of
    an nvdimm_bus instance.  No functional change.
    
    Acked-by: Christoph Hellwig <hch@....de>
    Tested-by: Toshi Kani <toshi.kani@...com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit 9f53f9fa4ad1d8bddd4d14359cdabc531aedffe8
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Tue Jun 9 15:33:45 2015 -0400

    libnvdimm, pmem: add libnvdimm support to the pmem driver
    
    nd_pmem attaches to persistent memory regions and namespaces emitted by
    the libnvdimm subsystem, and, same as the original pmem driver, presents
    the system-physical-address range as a block device.
    
    The existing e820-type-12 to pmem setup is converted to an nvdimm_bus
    that emits an nd_namespace_io device.
    
    Note that the X in 'pmemX' is now derived from the parent region.  This
    provides some stability to the pmem devices names from boot-to-boot.
    The minor numbers are also more predictable by passing 0 to
    alloc_disk().
    
    Cc: Andy Lutomirski <luto@...capital.net>
    Cc: Boaz Harrosh <boaz@...xistor.com>
    Cc: H. Peter Anvin <hpa@...or.com>
    Cc: Jens Axboe <axboe@...com>
    Cc: Ingo Molnar <mingo@...nel.org>
    Cc: Christoph Hellwig <hch@....de>
    Signed-off-by: Ross Zwisler <ross.zwisler@...ux.intel.com>
    Acked-by: Christoph Hellwig <hch@....de>
    Tested-by: Toshi Kani <toshi.kani@...com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit eaf961536e1622ad21247ac8d44acd48ba65566e
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Fri May 1 13:11:27 2015 -0400

    libnvdimm, nfit: add interleave-set state-tracking infrastructure
    
    On platforms that have firmware support for reading/writing per-dimm
    label space, a portion of the dimm may be accessible via an interleave
    set PMEM mapping in addition to the dimm's BLK (block-data-window
    aperture(s)) interface.  A label, stored in a "configuration data
    region" on the dimm, disambiguates which dimm addresses are accessed
    through which exclusive interface.
    
    Add infrastructure that allows the kernel to block modifications to a
    label in the set while any member dimm is active.  Note that this is
    meant only for enforcing "no modifications of active labels" via the
    coarse ioctl command.  Adding/deleting namespaces from an active
    interleave set is always possible via sysfs.
    
    Another aspect of tracking interleave sets is tracking their integrity
    when DIMMs in a set are physically re-ordered.  For this purpose we
    generate an "interleave-set cookie" that can be recorded in a label and
    validated against the current configuration.  It is the bus provider
    implementation's responsibility to calculate the interleave set cookie
    and attach it to a given region.
    
    Cc: Neil Brown <neilb@...e.de>
    Cc: <linux-acpi@...r.kernel.org>
    Cc: Greg KH <gregkh@...uxfoundation.org>
    Cc: Robert Moore <robert.moore@...el.com>
    Cc: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
    Acked-by: Christoph Hellwig <hch@....de>
    Acked-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>
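
The purpose of the interleave-set cookie in the commit above can be
illustrated with a stand-in calculation.  The commit leaves the actual
calculation to the bus provider, so everything here is an assumption:
interleave_cookie is a hypothetical helper, the serial numbers are made
up, and SHA-256 is used only as a convenient checksum.  The point is
that the value depends on which dimm sits at which position.

```python
import hashlib
import struct


def interleave_cookie(serials_in_order):
    """Stand-in cookie: checksum over (position, serial) for each dimm."""
    h = hashlib.sha256()
    for position, serial in enumerate(serials_in_order):
        h.update(struct.pack("<IQ", position, serial))
    return int.from_bytes(h.digest()[:8], "little")


recorded = interleave_cookie([0x1111, 0x2222])   # stored in the label
same_order = interleave_cookie([0x1111, 0x2222])
reordered = interleave_cookie([0x2222, 0x1111])  # dimms physically swapped
```

A label whose recorded cookie no longer matches the cookie of the
current configuration indicates that the set has changed.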

commit 4a826c83db4edc040da3a66dbefd53f0cfcf457d
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Tue Jun 9 16:09:36 2015 -0400

    libnvdimm: namespace indices: read and validate
    
    The on-media label format [1] consists of two index blocks followed by
    an array of labels.  None of these structures are ever updated in place.
    A sequence number tracks the current active index and the next one to
    write, while labels are written to free slots.
    
        +------------+
        |            |
        |  nsindex0  |
        |            |
        +------------+
        |            |
        |  nsindex1  |
        |            |
        +------------+
        |   label0   |
        +------------+
        |   label1   |
        +------------+
        |            |
         ....nslot...
        |            |
        +------------+
        |   labelN   |
        +------------+
    
    After reading valid labels, store the dpa ranges they claim into
    per-dimm resource trees.
    
    [1]: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
    
    Cc: Neil Brown <neilb@...e.de>
    Acked-by: Christoph Hellwig <hch@....de>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>
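
Selecting the current index block by sequence number, as described in
the commit above, can be sketched like this.  The commit message does
not give the sequence encoding, so this sketch assumes a small cyclic
counter (1 -> 2 -> 3 -> 1, with 0 meaning "invalid"); pick_active and
NEXT_SEQ are hypothetical names, and the handling of a tie is
arbitrary.

```python
# Assumed cyclic sequence: each valid value has exactly one successor.
NEXT_SEQ = {1: 2, 2: 3, 3: 1}


def pick_active(seq0, seq1):
    """Return 0 or 1 for the newer index block, or None if neither is valid."""
    valid0, valid1 = seq0 in NEXT_SEQ, seq1 in NEXT_SEQ
    if not valid0 and not valid1:
        return None
    if not valid1:
        return 0
    if not valid0:
        return 1
    # Both valid: the block whose number follows the other one is newer.
    if NEXT_SEQ[seq0] == seq1:
        return 1
    return 0
```

The other index block is then the "next one to write", so neither block
is ever updated in place while it is the active one.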

commit bf9bccc14c05dae8caba29df6187c731710f5380
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Wed Jun 17 17:14:46 2015 -0400

    libnvdimm: pmem label sets and namespace instantiation.
    
    A complete label set is a PMEM-label per-dimm per-interleave-set where
    all the UUIDs match and the interleave set cookie matches the hosting
    interleave set.
    
    Present sysfs attributes for manipulation of a PMEM-namespace's
    'alt_name', 'uuid', and 'size' attributes.  A later patch will make
    these settings persistent by writing back the label.
    
    Note that PMEM allocations grow forwards from the start of an interleave
    set (lowest dimm-physical-address (DPA)).  BLK-namespaces that alias
    with a PMEM interleave set will grow allocations backward from the
    highest DPA.
    
    Cc: Greg KH <gregkh@...uxfoundation.org>
    Cc: Neil Brown <neilb@...e.de>
    Acked-by: Christoph Hellwig <hch@....de>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>
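
The allocation directions in the commit above (PMEM grows forward from
the lowest DPA, aliasing BLK grows backward from the highest DPA) can
be sketched with a two-ended allocator.  This is a simplification for
one dimm's address space; DimmSpace and its methods are hypothetical
names, not libnvdimm code.

```python
class DimmSpace:
    """Toy two-ended allocator over one dimm-physical-address range."""

    def __init__(self, size):
        self.low = 0      # next free DPA for PMEM (grows upward)
        self.high = size  # end of free space for BLK (grows downward)

    def available(self):
        return self.high - self.low

    def alloc_pmem(self, n):
        if n > self.available():
            raise MemoryError("not enough free DPA space")
        start = self.low
        self.low += n
        return (start, n)

    def alloc_blk(self, n):
        if n > self.available():
            raise MemoryError("not enough free DPA space")
        self.high -= n
        return (self.high, n)


space = DimmSpace(100)
pmem_a = space.alloc_pmem(30)
blk_a = space.alloc_blk(20)
pmem_b = space.alloc_pmem(10)
```

Growing from opposite ends keeps the free space in one contiguous span
between the two kinds of allocation.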

commit 1b40e09a1232de537b193fa1b6b3ef16d3a1e397
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Fri May 1 13:34:01 2015 -0400

    libnvdimm: blk labels and namespace instantiation
    
    A blk label set describes a namespace comprised of one or more
    discontiguous dpa ranges on a single dimm.  They may alias with one or
    more pmem interleave sets that include the given dimm.
    
    This is the runtime/volatile configuration infrastructure for sysfs
    manipulation of 'alt_name', 'uuid', 'size', and 'sector_size'.  A later
    patch will make these settings persistent by writing back the label(s).
    
    Unlike pmem namespaces, multiple blk namespaces can be created per
    region.  Once a blk namespace has been created a new seed device
    (unconfigured child of a parent blk region) is instantiated.  As long as
    a region has 'available_size' != 0 new child namespaces may be created.
    
    Cc: Greg KH <gregkh@...uxfoundation.org>
    Cc: Neil Brown <neilb@...e.de>
    Acked-by: Christoph Hellwig <hch@....de>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit f524bf271a5cf12a44253194abcf8b6688ff5b9d
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Sat May 30 12:36:02 2015 -0400

    libnvdimm: write pmem label set
    
    After 'uuid', 'size', and optionally 'alt_name' have been set to valid
    values the labels on the dimms can be updated.
    
    Write procedure is:
    1/ Allocate and write new labels in the "next" index
    2/ Free the old labels in the working copy
    3/ Write the bitmap and the label space on the dimm
    4/ Write the index to make the update valid
    
    Label ranges directly mirror the dpa resource values for the given
    label_id of the namespace.
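
The ordering of the write procedure above is what makes an update atomic: nothing becomes visible until the index is written. The sketch below is a toy model under stated assumptions, not the on-media format: the `Dimm` class, the two-entry index list, the monotonically increasing `seq` field, and the `crash_before_index` flag are all hypothetical. It also keeps the free bitmap inside the index record, so steps 3 and 4 are divided slightly differently than in the list.

```python
import copy


class Dimm:
    """Toy label area: two index blocks plus a fixed number of label slots."""

    def __init__(self, nslots):
        # Each index block holds a sequence number and a free-slot bitmap.
        self.index = [
            {"seq": 1, "free": [True] * nslots},
            {"seq": 0, "free": [True] * nslots},
        ]
        self.slots = [None] * nslots

    def current(self):
        # Assumption: the index with the higher sequence number is valid.
        return max(range(2), key=lambda i: self.index[i]["seq"])


def update_labels(dimm, old_slots, new_labels, crash_before_index=False):
    cur = dimm.current()
    nxt = 1 - cur
    work = copy.deepcopy(dimm.index[cur])      # working copy of the index

    # 1/ Allocate slots for the new labels in the "next" index.
    placed = {}
    for label in new_labels:
        slot = work["free"].index(True)        # raises ValueError when full
        work["free"][slot] = False
        placed[slot] = label

    # 2/ Free the old labels in the working copy only.
    for slot in old_slots:
        work["free"][slot] = True

    # 3/ Write the label data to the dimm; old slots are left untouched.
    for slot, label in placed.items():
        dimm.slots[slot] = label

    if crash_before_index:
        return sorted(placed)                  # simulate power loss here

    # 4/ Publish: write the next index with a newer sequence number.
    work["seq"] = dimm.index[cur]["seq"] + 1
    dimm.index[nxt] = work
    return sorted(placed)


def live_labels(dimm):
    """Labels visible through the currently valid index."""
    idx = dimm.index[dimm.current()]
    return [dimm.slots[i] for i, free in enumerate(idx["free"]) if not free]


dimm = Dimm(nslots=4)
first = update_labels(dimm, old_slots=[], new_labels=["ns-v1"])
update_labels(dimm, old_slots=first, new_labels=["ns-v2"], crash_before_index=True)
after_crash = live_labels(dimm)     # interrupted update is invisible
update_labels(dimm, old_slots=first, new_labels=["ns-v2"])
after_commit = live_labels(dimm)    # completed update replaces the old label
```

In this model an update interrupted before step 4 leaves the previous label set intact, because the old index still marks the old slots as in use.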
    
    Cc: Greg KH <gregkh@...uxfoundation.org>
    Cc: Neil Brown <neilb@...e.de>
    Acked-by: Christoph Hellwig <hch@....de>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit 0ba1c634892b3590779803a701bcb82e8c32cc7a
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Sat May 30 12:35:36 2015 -0400

    libnvdimm: write blk label set
    
    After 'uuid', 'size', 'sector_size', and optionally 'alt_name' have been
    set to valid values the labels on the dimm can be updated.  The
    difference with the pmem case is that blk namespaces are limited to one
    dimm and can cover discontiguous ranges in dpa space.
    
    Also, after allocating label slots, it is useful for userspace to know
    how many slots are left.  Export this information in sysfs.
    
    Cc: Greg KH <gregkh@...uxfoundation.org>
    Cc: Neil Brown <neilb@...e.de>
    Acked-by: Christoph Hellwig <hch@....de>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit 8c2f7e8658df1d3b7cbfa62706941d14c715823a
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Thu Jun 25 04:20:04 2015 -0400

    libnvdimm: infrastructure for btt devices
    
    NVDIMM namespaces, in addition to accepting "struct bio" based requests,
    also have the capability to perform byte-aligned accesses.  By default
    only the bio/block interface is used.  However, if another driver can
    make effective use of the byte-aligned capability it can claim the
    namespace interface and use the byte-aligned ->rw_bytes() interface.
    
    The BTT driver is the first consumer of this mechanism to allow
    adding atomic sector update semantics to a pmem or blk namespace.  This
    patch is the sysfs infrastructure to allow configuring a BTT instance
    for a namespace.  Enabling that BTT and performing i/o is in a
    subsequent patch.
    
    Cc: Greg KH <gregkh@...uxfoundation.org>
    Cc: Neil Brown <neilb@...e.de>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit 5212e11fde4d40fa627668b4f2222d20db488f71
Author: Vishal Verma <vishal.l.verma@...el.com>
Date:   Thu Jun 25 04:20:32 2015 -0400

    nd_btt: atomic sector updates
    
    BTT stands for Block Translation Table, and is a way to provide power
    fail sector atomicity semantics for block devices that have the ability
    to perform byte granularity IO. It relies on the capability of libnvdimm
    namespace devices to do byte aligned IO.
    
    The BTT works as a stacked block device, and reserves a chunk of space
    from the backing device for its accounting metadata. It is a bio-based
    driver because all IO is done synchronously, and there is no queuing or
    asynchronous completions at either the device or the driver level.
    
    The BTT uses 'lanes' to index into various 'on-disk' data structures,
    and lanes also act as a synchronization mechanism in case there are more
    CPUs than available lanes. We did a comparison between two lane lock
    strategies - first where we kept an atomic counter around that tracked
    which was the last lane that was used, and 'our' lane was determined by
    atomically incrementing that. That way, for the nr_cpus > nr_lanes case,
    theoretically, no CPU would be blocked waiting for a lane. The other
    strategy was to use the cpu number we're scheduled on and hash it to
    a lane number. Theoretically, this could block an IO that could've
    otherwise run using a different, free lane. But some fio workloads
    showed that the direct cpu -> lane hash performed faster than tracking
    'last lane' - my reasoning is the cache thrash caused by moving the
    atomic variable made that approach slower than simply waiting out the
    in-progress IO. This supports the conclusion that the driver can be a
    very simple bio-based one that does synchronous IOs instead of queuing.
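
The second strategy, mapping the cpu number directly to a lane, can be sketched as follows. This is a userspace toy, not the nd_btt code: the `Lanes` class is hypothetical, and plain modulo stands in for whatever hash the driver uses.

```python
import threading


class Lanes:
    """Toy lane table: each lane is guarded by its own lock."""

    def __init__(self, nr_lanes):
        self.nr_lanes = nr_lanes
        self.locks = [threading.Lock() for _ in range(nr_lanes)]

    def lane_for_cpu(self, cpu):
        # Direct cpu -> lane mapping (modulo is an assumed hash);
        # no shared counter is read or written.
        return cpu % self.nr_lanes

    def acquire(self, cpu):
        lane = self.lane_for_cpu(cpu)
        self.locks[lane].acquire()   # may wait if another cpu maps to this lane
        return lane

    def release(self, lane):
        self.locks[lane].release()


lanes = Lanes(nr_lanes=4)
mapping = [lanes.lane_for_cpu(cpu) for cpu in range(8)]
lane = lanes.acquire(cpu=6)
lanes.release(lane)
```

With 8 cpus and 4 lanes, cpus 2 and 6 share lane 2 and can block each other even while other lanes are free. That is the theoretical cost the message weighs against the cache traffic of a shared atomic counter.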
    
    Cc: Andy Lutomirski <luto@...capital.net>
    Cc: Boaz Harrosh <boaz@...xistor.com>
    Cc: H. Peter Anvin <hpa@...or.com>
    Cc: Jens Axboe <axboe@...com>
    Cc: Ingo Molnar <mingo@...nel.org>
    Cc: Christoph Hellwig <hch@....de>
    Cc: Neil Brown <neilb@...e.de>
    Cc: Jeff Moyer <jmoyer@...hat.com>
    Cc: Dave Chinner <david@...morbit.com>
    Cc: Greg KH <gregkh@...uxfoundation.org>
    [jmoyer: fix nmi watchdog timeout in btt_map_init]
    [jmoyer: move btt initialization to module load path]
    [jmoyer: fix memory leak in the btt initialization path]
    [jmoyer: Don't overwrite corrupted arenas]
    Signed-off-by: Vishal Verma <vishal.l.verma@...ux.intel.com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit 047fc8a1f9a6330eacc80374dff087e20dc2304b
Author: Ross Zwisler <ross.zwisler@...ux.intel.com>
Date:   Thu Jun 25 04:21:02 2015 -0400

    libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory
    
    The libnvdimm implementation handles allocating dimm address space (DPA)
    between PMEM and BLK mode interfaces.  After DPA has been allocated from
    a BLK-region to a BLK-namespace the nd_blk driver attaches to handle I/O
    as a struct bio based block device. Unlike PMEM, BLK is required to
    handle platform specific details like mmio register formats and memory
    controller interleave.  For this reason the libnvdimm generic nd_blk
    driver calls back into the bus provider to carry out the I/O.
    
    This initial implementation handles the BLK interface defined by the
    ACPI 6 NFIT [1] and the NVDIMM DSM Interface Example [2] composed from
    DCR (dimm control region), BDW (block data window), IDT (interleave
    descriptor) NFIT structures and the hardware register format.
    [1]: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
    [2]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
    
    Cc: Andy Lutomirski <luto@...capital.net>
    Cc: Boaz Harrosh <boaz@...xistor.com>
    Cc: H. Peter Anvin <hpa@...or.com>
    Cc: Jens Axboe <axboe@...com>
    Cc: Ingo Molnar <mingo@...nel.org>
    Cc: Christoph Hellwig <hch@....de>
    Signed-off-by: Ross Zwisler <ross.zwisler@...ux.intel.com>
    Acked-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit 6bc756193ff61bf5e7b3cfedfbb0873bf40f8055
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Wed Jun 17 17:23:32 2015 -0400

    tools/testing/nvdimm: libnvdimm unit test infrastructure
    
    'libnvdimm' is the first driver sub-system in the kernel to implement
    mocking for unit test coverage.  The nfit_test module gets built as an
    external module and arranges for external module replacements of nfit,
    libnvdimm, nd_pmem, and nd_blk.  These replacements use the linker
    --wrap option to redirect calls to ioremap() + request_mem_region() to
    custom defined unit test resources.  The end result is a fully
    functional nvdimm_bus, as far as userspace is concerned, but with the
    capability to perform otherwise destructive tests on emulated resources.
    
    Q: Why not use QEMU for this emulation?
    QEMU is not suitable for unit testing.  QEMU's role is to faithfully
    emulate the platform.  A unit test's role is to unfaithfully implement
    the platform with the goal of triggering bugs in the corners of the
    sub-system implementation.  As bugs are discovered in platforms, or the
    sub-system itself, the unit tests are extended to backstop a fix with a
    reproducer unit test.
    
    Another problem with QEMU is that it would require coordination of 3
    software projects instead of 2 (kernel + libndctl [1]) to maintain and
    execute the tests.  The chances for bit rot and the difficulty of
    getting the tests running go up non-linearly the more components are
    involved.
    
    
    Q: Why submit this to the kernel tree instead of external modules in
       libndctl?
    Simple, to alleviate the same risk that out-of-tree external modules
    face.  Updates to drivers/nvdimm/ can be immediately evaluated to see if
    they have any impact on tools/testing/nvdimm/.
    
    
    Q: What are the negative implications of merging this?
    It is a unique maintenance burden because the purpose of mocking an
    interface to enable a unit test is to purposefully short circuit the
    semantics of a routine to enable testing.  For example
    __wrap_ioremap_cache() fakes the pmem driver into "ioremap()'ing" a test
    resource buffer allocated by dma_alloc_coherent().  The future
    maintenance burden hits when someone changes the semantics of
    ioremap_cache() and wonders what the implications are for the unit test.
    
    [1]: https://github.com/pmem/ndctl
    
    Cc: <linux-acpi@...r.kernel.org>
    Cc: Lv Zheng <lv.zheng@...el.com>
    Cc: Robert Moore <robert.moore@...el.com>
    Cc: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
    Cc: Christoph Hellwig <hch@....de>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit bc30196f715ed3a94d050ef8bc465e567a6050be
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Thu Jun 25 04:48:19 2015 -0400

    libnvdimm: Non-Volatile Devices
    
    Maintainer information and documentation for drivers/nvdimm
    
    Cc: Andy Lutomirski <luto@...capital.net>
    Cc: Boaz Harrosh <boaz@...xistor.com>
    Cc: H. Peter Anvin <hpa@...or.com>
    Cc: Jens Axboe <axboe@...com>
    Cc: Ingo Molnar <mingo@...nel.org>
    Cc: Christoph Hellwig <hch@....de>
    Cc: Neil Brown <neilb@...e.de>
    Cc: Greg KH <gregkh@...uxfoundation.org>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit f68eb1e71a92765ffd8eb68466a41b48f2fbba04
Author: Vishal Verma <vishal.l.verma@...el.com>
Date:   Tue May 12 13:48:53 2015 -0400

    fs/block_dev.c: skip rw_page if bdev has integrity
    
    If a block device has bio integrity enabled, rw_page will bypass the
    integrity payload, which is undesirable. Skip rw_page if this is the
    case.
    
    Currently brd and zram provide rw_page, and the proposed 'nd' drivers
    will too.
    
    Cc: Jens Axboe <axboe@...com>
    Cc: Martin K. Petersen <martin.petersen@...cle.com>
    Suggested-by: Matthew Wilcox <matthew.r.wilcox@...el.com>
    Signed-off-by: Vishal Verma <vishal.l.verma@...ux.intel.com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit 41cd8b70c37ace40077c8d6ec0b74b983178c192
Author: Vishal Verma <vishal.l.verma@...el.com>
Date:   Thu Jun 25 04:21:52 2015 -0400

    libnvdimm, btt: add support for blk integrity
    
    Support multiple block sizes (sector + metadata) using the blk integrity
    framework. This registers a new integrity template that defines the
    protection information tuple size based on the configured metadata size,
    and simply acts as a passthrough for protection information generated by
    another layer. The metadata is written to the storage as-is, and read back
    with each sector.
    
    Signed-off-by: Vishal Verma <vishal.l.verma@...ux.intel.com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit fcae695737fca0849c18db814d9d8de05c0fd2a2
Author: Vishal Verma <vishal.l.verma@...el.com>
Date:   Thu Jun 25 04:22:39 2015 -0400

    libnvdimm, blk: add support for blk integrity
    
    Support multiple block sizes (sector + metadata) for nd_blk in the
    same way as done for the BTT. Add the idea of an 'internal' lbasize,
    which is properly aligned and padded, and store metadata in this space.
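
As a rough illustration of an aligned and padded 'internal' lbasize, the sketch below rounds sector plus metadata up to an alignment boundary. The function names and the 64-byte default alignment are assumptions made for the example, not values taken from this patch.

```python
def internal_lbasize(sector_size, meta_size, alignment=64):
    """Round sector + metadata up to the next multiple of `alignment`."""
    external = sector_size + meta_size
    return -(-external // alignment) * alignment   # ceiling division


def sector_offset(lba, sector_size, meta_size, alignment=64):
    """Byte offset of a logical block when blocks are stored padded."""
    return lba * internal_lbasize(sector_size, meta_size, alignment)
```

Under these assumptions a 512-byte sector with 8 bytes of metadata occupies 576 bytes on media, and the metadata sits within that padded block.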
    
    Signed-off-by: Vishal Verma <vishal.l.verma@...ux.intel.com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit 43d3fa3a0491168ad769d20d5cbae45492509d43
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Sat May 16 12:28:50 2015 -0400

    libnvdimm, pmem: fix up max_hw_sectors
    
    There is no hardware limit to enforce on the size of the i/o that can
    be passed to an nvdimm block device, so set it to UINT_MAX.
    
    Reviewed-by: Vishal Verma <vishal.l.verma@...ux.intel.com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit edc870e54696beb9f3835ecb41a4e1c84ee4584d
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Sat May 16 12:28:51 2015 -0400

    pmem: make_request cleanups
    
    Various cleanups:
    
    1/ Kill the BUG_ON since we've already told the block layer we don't
       support DISCARD on all these drivers.
    
    2/ Kill the 'rw' variable, no need to cache it.
    
    3/ Kill the local 'sector' variable.  bio_for_each_segment() is already
       advancing the iterator's sector number by the bio_vec length.
    
    4/ Kill the check for accessing past the end of device;
       generic_make_request_checks() already does that.
    
    Suggested-by: Christoph Hellwig <hch@....de>
    [hch: kill access past end of the device check]
    Reviewed-by: Vishal Verma <vishal.l.verma@...ux.intel.com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit f0dc089ce217e7b98e0d2077c548ff08129e7911
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Sat May 16 12:28:53 2015 -0400

    libnvdimm: enable iostat
    
    This is disabled by default as the overhead is prohibitive, but if the
    user takes the action to turn it on we'll oblige.
    
    Reviewed-by: Vishal Verma <vishal.l.verma@...ux.intel.com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit 0f51c4fa7f60838a87cd45e8ba144dddcd4c066c
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Sat May 16 12:28:54 2015 -0400

    pmem: flag pmem block devices as non-rotational
    
    ...since they are effectively SSDs as far as userspace is concerned.
    
    Reviewed-by: Vishal Verma <vishal.l.verma@...ux.intel.com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit 581388209405902b56d055f644b4dd124a206112
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Tue Jun 23 20:08:34 2015 -0400

    libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only
    
    Upon detection of an unarmed dimm in a region, arrange for descendant
    BTT, PMEM, or BLK instances to be read-only.  A dimm is primarily marked
    "unarmed" via flags passed by platform firmware (NFIT).
    
    The flags in the NFIT memory device sub-structure indicate the state of
    the data on the nvdimm relative to its energy source or last "flush to
    persistence".  For the most part there is nothing the driver can do but
    advertise the state of these flags in sysfs and emit a message if
    firmware indicates that the contents of the device may be corrupted.
    However, for the case of ACPI_NFIT_MEM_ARMED, the driver can arrange for
    the block devices incorporating that nvdimm to be marked read-only.
    This is a safe default as the data is still available and new writes are
    held off until the administrator either forces read-write mode, or the
    energy source becomes armed.
    
    A 'read_only' attribute is added to REGION devices to allow for
    overriding the default read-only policy of all descendant block devices.
    
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit 99759869faf15471cfce251bc138848d8af7d162
Author: Toshi Kani <toshi.kani@...com>
Date:   Fri Jun 19 17:14:15 2015 -0600

    acpi: Add acpi_map_pxm_to_online_node()
    
    The kernel initializes CPU & memory's NUMA topology from the ACPI
    SRAT table.  Some other ACPI tables, such as NFIT and DMAR, also
    contain proximity IDs for their device's NUMA topology.  This
    information can be used to improve performance of these devices.
    
    This patch introduces acpi_map_pxm_to_online_node(), which is
    similar to acpi_map_pxm_to_node(), but always returns an online
    node.  When the mapped node from a given proximity ID is offline,
    it looks up the node distance table and returns the nearest
    online node.
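
The fallback described above can be sketched in a few lines. This is a toy model with hypothetical names and data, not the kernel routine: the proximity-to-node map, the online list, and the distance matrix are all made up for the example, and the case of no online nodes is not handled.

```python
def map_pxm_to_online_node(pxm, pxm_to_node, online, distance):
    """Return the mapped node if online, else the nearest online node."""
    node = pxm_to_node[pxm]
    if node in online:
        return node
    # Fall back to the online node with the smallest distance from `node`.
    return min(online, key=lambda n: distance[node][n])


# Hypothetical 3-node system in which node 2 is offline.
pxm_to_node = {0: 0, 1: 1, 2: 2}
online = [0, 1]
distance = [
    [10, 20, 30],
    [20, 10, 15],
    [30, 15, 10],
]
```

Here proximity ID 2 maps to offline node 2, so the lookup falls back to node 1, which is closer to it (15) than node 0 is (30).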
    
    ACPI device drivers, which are called after the NUMA initialization
    has completed in the kernel, can call this interface to obtain their
    device NUMA topology from ACPI tables.  Such drivers do not have to
    deal with offline nodes.  A node may be offline when a device
    proximity ID is unique, SRAT memory entry does not exist, or NUMA is
    disabled, e.g. "numa=off" on x86.
    
    This patch also moves the pxm range check from acpi_get_node() to
    acpi_map_pxm_to_node().
    
    Signed-off-by: Toshi Kani <toshi.kani@...com>
    Acked-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit 41d7a6d637e1440f5410cb43c25a3c41255540c5
Author: Toshi Kani <toshi.kani@...com>
Date:   Fri Jun 19 12:18:33 2015 -0600

    libnvdimm: Set numa_node to NVDIMM devices
    
    The ACPI NFIT table has System Physical Address Range Structure
    entries that describe a proximity ID of each range when
    ACPI_NFIT_PROXIMITY_VALID is set in the flags.
    
    Change acpi_nfit_register_region() to map a proximity ID to its node ID,
    and set it to a new numa_node field of nd_region_desc, which is then
    conveyed to the nd_region device.
    
    The device core arranges for btt and namespace devices to inherit their
    node from their parent region.
    
    Signed-off-by: Toshi Kani <toshi.kani@...com>
    [djbw: move set_dev_node() from region.c to bus.c]
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit 74ae66c3b14ffa94c8d2dea201cdf8e6203d13d5
Author: Toshi Kani <toshi.kani@...com>
Date:   Fri Jun 19 12:18:34 2015 -0600

    libnvdimm: Add sysfs numa_node to NVDIMM devices
    
    Add support for sysfs 'numa_node' to I/O-related NVDIMM devices
    under /sys/bus/nd/devices, regionN, namespaceN.0, and bttN.x.
    
    An example of numa_node values on a 2-socket system with a single
    NVDIMM range on each socket is shown below.
      /sys/bus/nd/devices
      |-- btt0.0/numa_node:0
      |-- btt1.0/numa_node:1
      |-- btt1.1/numa_node:1
      |-- namespace0.0/numa_node:0
      |-- namespace1.0/numa_node:1
      |-- region0/numa_node:0
      |-- region1/numa_node:1
    
    These numa_node files are then linked under the block class of
    their device names.
      /sys/class/block/pmem0/device/numa_node:0
      /sys/class/block/pmem1s/device/numa_node:1
    
    This enables numactl(8) to accept 'block:' and 'file:' paths of
    pmem and btt devices as shown in the examples below.
      numactl --preferred block:pmem0 --show
      numactl --preferred file:/dev/pmem1s --show
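
Userspace can also read the same attribute directly from sysfs. The helper below is a minimal sketch assuming the /sys/class/block/<dev>/device/numa_node layout shown above. The function name and the `sysfs_root` parameter are hypothetical, the latter added only so the path can be redirected for testing.

```python
import os


def block_numa_node(devname, sysfs_root="/sys"):
    """Return the NUMA node of a block device, or None if not reported."""
    path = os.path.join(sysfs_root, "class", "block", devname,
                        "device", "numa_node")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # Attribute missing, unreadable, or not an integer.
        return None
```

For example, `block_numa_node("pmem0")` would be expected to return 0 on the 2-socket system illustrated above.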
    
    Signed-off-by: Toshi Kani <toshi.kani@...com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

commit 61031952f4c89dba1065f7a5b9419badb112554c
Author: Ross Zwisler <ross.zwisler@...ux.intel.com>
Date:   Thu Jun 25 03:08:39 2015 -0400

    arch, x86: pmem api for ensuring durability of persistent memory updates
    
    Based on an original patch by Ross Zwisler [1].
    
    Writes to persistent memory have the potential to be posted to cpu
    cache, cpu write buffers, and platform write buffers (memory controller)
    before being committed to persistent media.  Provide apis,
    memcpy_to_pmem(), wmb_pmem(), and memremap_pmem(), to write data to
    pmem and assert that it is durable in PMEM (a persistent linear address
    range).  A '__pmem' attribute is added so sparse can track proper usage
    of pointers to pmem.
    
    This continues the status quo of pmem being x86 only for 4.2, but
    reworks to ioremap, and wider implementation of memremap() will enable
    other archs in 4.3.
    
    [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-May/000932.html
    
    Cc: Thomas Gleixner <tglx@...utronix.de>
    Cc: Ingo Molnar <mingo@...hat.com>
    Cc: "H. Peter Anvin" <hpa@...or.com>
    Signed-off-by: Ross Zwisler <ross.zwisler@...ux.intel.com>
    [djbw: various reworks]
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>
