lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <x49y57ba2ne.fsf@segfault.boston.devel.redhat.com>
Date:	Thu, 05 Sep 2013 08:12:05 -0400
From:	Jeff Moyer <jmoyer@...hat.com>
To:	rob.gittins@...ux.intel.com
Cc:	linux-kernel@...r.kernel.org, linux-fsdevel@...er.org,
	linux-pmfs@...ts.infradead.org
Subject: Re: RFC Block Layer Extensions to Support NV-DIMMs

Rob Gittins <rob.gittins@...ux.intel.com> writes:

> Direct Memory Mappable DIMMs (DMMD) appear in the system address space
> and are accessed via load and store instructions.  These NVDIMMs
> are part of the system physical address space (SPA) as memory with
> the attribute that data survives a power interruption.  As such this
> memory is managed by the kernel which can  assign virtual addresses and
> mapped into application’s address space as well as being accessible
> by the kernel.  The area mapped into the system address space is
> being referred to as persistent memory (PMEM).
>
> PMEM introduces the need for new operations in the
> block_device_operations to support the specific characteristics of
> the media.
>
> First data may not propagate all the way through the memory pipeline
> when store instructions are executed.  Data may stay in the CPU cache
> or in other buffers in the processor and memory complex.  In order to
> ensure the durability of data there needs to be a driver entry point
> to force a byte range out to media.  The methods of doing this are
> specific to the PMEM technology and need to be handled by the driver
> that is supporting the DMMDs.  To provide a way to ensure that data is
> durable adding a commit function to the block_device_operations vector.

If the memory is available to be mapped into the address space of the
kernel or a user process, then I don't see why we should have a block
device at all.  I think it would make more sense to have a different
driver class for these persistent memory devices.

>    void (*commitpmem)(struct block_device *bdev, void *addr);

Seems like this would benefit from a length argument as well, no?

> Another area requiring extension is the need to be able to clear PMEM
> errors.  When data is fetched from errored PMEM it is marked with the
> poison attribute.  If the CPU attempts to access the data it causes a
> machine check.  How errors are cleared is hardware dependent and needs
> to be handled by the specific device driver.  The following function
> in the block_device_operations vector would clear the correct range
> of PMEM and put new data there.  If the argument data is null or the
> size is zero the driver is free to put any data in PMEM it wishes.
>
> 	void (*clearerrorpmem)(struct block_device *bdev, void *addr,
> 			size_t len, void *data);

What is the size of data?

> Different applications, filesystem and drivers may wish to share
> ranges of PMEM.  This is analogous to partitioning a disk that is
> using multiple and different filesystems.   Since PMEM is addressed
> on a byte basis rather than a block basis the existing partitioning
> model does not fit well.  As a result there needs to be a way to
> describe PMEM ranges.
>
> 	struct pmem_layout *(*getpmem)(struct block_device *bdev);

If existing partitioning doesn't work well, then it sounds like a block
device isn't the right fit (again).  Ignoring that detail, what about
requesting and releasing ranges of persistent memory, as in your
"partitioning" example?  Would that not also be a function of the
driver?

Cheers,
Jeff

>
>
> Proposed patch.
>
> ---
> Documentation/filesystems/Locking |    6 ++++
> fs/block_dev.c                    |   42
> +++++++++++++++++++++++++++++++++
> include/linux/blkdev.h            |    4 +++
> include/linux/pmem.h              |   47
> +++++++++++++++++++++++++++++++++++++
> 4 files changed, 99 insertions(+), 0 deletions(-)
> create mode 100644 include/linux/pmem.h
>
> diff --git a/Documentation/filesystems/Locking
> b/Documentation/filesystems/Locking
> index 0706d32..78910f4 100644
> --- a/Documentation/filesystems/Locking
> +++ b/Documentation/filesystems/Locking
> @@ -386,6 +386,9 @@ prototypes:
> 	int (*revalidate_disk) (struct gendisk *);
> 	int (*getgeo)(struct block_device *, struct hd_geometry *);
> 	void (*swap_slot_free_notify) (struct block_device *, unsigned long);
> +	struct pmem_layout *(*getpmem)(struct block_device *);
> +	void (*commitpmem)(struct block_device *, void *);
> +	void (*clearerrorpmem)(struct block_device *, void *, size_t, void *);
> locking rules:
> 			bd_mutex
> @@ -399,6 +402,9 @@ unlock_native_capacity:	no
> revalidate_disk:	no
> getgeo:			no
> swap_slot_free_notify:	no	(see below)
> +getpmem:		no
> +commitpmem:		no
> +clearerrorpmem:	no
> media_changed, unlock_native_capacity and revalidate_disk are called
> only
> from
> check_disk_change().
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index aae187a..a57863c 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -27,6 +27,7 @@
> #include <linux/namei.h>
> #include <linux/log2.h>
> #include <linux/cleancache.h>
> +#include <linux/pmem.h>
> #include <asm/uaccess.h>
> #include "internal.h"
> @@ -1716,3 +1717,44 @@ void iterate_bdevs(void (*func)(struct
> block_device
> *, void *), void *arg)
> 	spin_unlock(&inode_sb_list_lock);
> 	iput(old_inode);
> }
> +
> +/**
> + * get_pmemgeo() - Return persistent memory geometry information
> + * @bdev:  device to interrogate
> + *
> + * Provides the memory layout for a persistent memory volume which
> + * is made up of CPU-addressable persistent memory.  If the
> interrogated
> + * device does not support CPU-addressable persistent memory then
> -ENOTTY
> + * is returned.
> + *
> + * Return: a pointer to a pmem_layout structure or ERR_PTR
> + */
> +struct pmem_layout *get_pmemgeo(struct block_device *bdev)
> +{
> +	struct gendisk *bd_disk = bdev->bd_disk;
> +
> +	if (!bd_disk || !bd_disk->fops->getpmem)
> +		return ERR_PTR(-ENOTTY);
> +	return bd_disk->fops->getpmem(bdev);
> +}
> +EXPORT_SYMBOL(get_pmemgeo);
> +
> +void commit_pmem(struct block_device *bdev, void *addr)
> +{
> +	struct gendisk *bd_disk = bdev->bd_disk;
> +
> +	if (bd_disk && bd_disk->fops->commitpmem)
> +		bd_disk->fops->commitpmem(bdev, addr);
> +}
> +EXPORT_SYMBOL(commit_pmem);
> +
> +void clear_pmem_error(struct block_device *bdev, void *addr, size_t
> len,
> +				void *data)
> +{
> +	struct gendisk *bd_disk = bdev->bd_disk;
> +
> +	if (bd_disk && bd_disk->fops->clearerrorpmem)
> +		bd_disk->fops->clearerrorpmem(bdev, addr, len, data);
> +}
> +EXPORT_SYMBOL(clear_pmem_error);
> +
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 78feda9..ba2c1f5 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1498,6 +1498,10 @@ struct block_device_operations {
> 	int (*getgeo)(struct block_device *, struct hd_geometry *);
> 	/* this callback is with swap_lock and sometimes page table lock held
> */
> 	void (*swap_slot_free_notify) (struct block_device *, unsigned long);
> +	/* persistent memory operations */
> +	struct pmem_layout * (*getpmem)(struct block_device *);
> +	void (*commitpmem)(struct block_device *, void *);
> +	void (*clearerrorpmem)(struct block_device *, void *, size_t, void *);
> 	struct module *owner;
> };
> diff --git a/include/linux/pmem.h b/include/linux/pmem.h
> new file mode 100644
> index 0000000..f907307
> --- /dev/null
> +++ b/include/linux/pmem.h
> @@ -0,0 +1,47 @@
> +/*
> + * Definitions for the Persistent Memory interface
> + * Copyright (c) 2013, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but
> WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> License
> for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License
> along with
> + * this program; if not, write to the Free Software Foundation, Inc.,
> + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
> + */
> +
> +#ifndef _LINUX_PMEM_H
> +#define _LINUX_PMEM_H
> +
> +#include <linux/types.h>
> +
> +struct persistent_memory_extent {
> +	phys_addr_t pme_spa;
> +	u64         pme_len;
> +	int         pme_numa_node;
> +};
> +
> +struct pmem_layout {
> +	u64         pml_flags;
> +	u64         pml_total_size;
> +	u32         pml_extent_count;
> +	u32			pml_interleave;	/* interleave bytes */
> +	struct persistent_memory_extent pml_extents[];
> +};
> +
> +/*
> + * Flags values
> + */
> +#define PMEM_ENABLED 0x0000000000000001 /* can be used for Persistent
> Mem
> */
> +#define PMEM_ERRORED 0x0000000000000002 /* in an error state */
> +#define PMEM_COMMIT  0x0000000000000004 /* commit function available */
> +#define PMEM_CLEAR_ERROR  0x0000000000000008 /* clear error function
> provided */
> +
> +#endif
> +
> --
> 1.7.1
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ