lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <2023082506-enchanted-tripping-d1d5@gregkh>
Date:   Fri, 25 Aug 2023 10:05:08 +0200
From:   Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:     Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
Cc:     Stanislav Kinsburskii <stanislav.kinsburskii@...il.com>,
        Derek Kiernan <derek.kiernan@....com>,
        Dragan Cvetic <dragan.cvetic@....com>,
        Arnd Bergmann <arnd@...db.de>, Wei Liu <wei.liu@...nel.org>,
        "K. Y. Srinivasan" <kys@...rosoft.com>,
        Madhavan Venkataraman <madvenka@...ux.microsoft.com>,
        Anthony Yznaga <anthony.yznaga@...cle.com>,
        "Mike Rapoport (IBM)" <rppt@...nel.org>,
        James Gowans <jgowans@...zon.com>,
        Anirudh Rayabharam <anrayabh@...ux.microsoft.com>,
        Jinank Jain <jinankjain@...ux.microsoft.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] Introduce persistent memory pool

On Tue, Aug 22, 2023 at 11:34:34AM -0700, Stanislav Kinsburskii wrote:
> This patch addresses the need for a memory allocator dedicated to
> persistent memory within the kernel. This allocator will preserve
> kernel-specific states like DMA passthrough device states, IOMMU state, and
> more across kexec.

And CMA doesn't do this for you today?

> The proposed solution offers a foundational implementation for potential
> custom solutions that might follow. Though the implementation is
> intentionally kept concise and straightforward to foster discussion and
> feedback, it's fully functional in its current state.
> 
> The persistent memory pool consists of a simple page allocator that
> operates on reserved physical memory regions. It employs bitmaps to
> allocate and free pages, with the following characteristics:
> 
>   1. Memory pool regions are specified using the command line option:
> 
>        pmpool=<base>,<size>
> 
>      Where <base> represents the physical memory base address and <size>
>      indicates the memory size.
> 
>   2. While the pages allocation emulates the buddy allocator interface, it
>      isn't confined to the MAX_ORDER.
> 
>   3. The memory pool initializes during a cold boot and is retained across
>      kexec.
> 
> Potential applications include:
> 
>   1. Allowing various in-kernel entities to allocate persistent pages from
>      a singular memory pool, eliminating the need for multiple region
>      reservations.
> 
>   2. For in-kernel components that require the allocation address to be
>      available on kernel kexec, this address can be exposed to user space and
>      then passed via the command line.
> 
>   3. Separate subsystems or drivers can reserve their region, sharing a
>      portion of it for their persistent memory pool. This might be a file
>      system, a key-value store, or other applications.
> 
> Potential Enhancements for the Proposed Memory Pool:
> 
>   1. Multiple Memory Regions Support: enhance the pool to accommodate and
>      manage multiple memory regions simultaneously.
> 
>   2. In-Kernel Memory Allocations for Storage Memory Regions:
>    * Allow in-kernel memory allocations to serve as storage memory regions.
>    * Implement explicit reservation of these designated regions during kexec.

As you have no in-kernel users of this, it's not something we can even
consider at the moment for obvious reasons (neither would you want us
to.)

Can you make this part of a patch series that actually adds a user,
probably more than one, so that we can see if any of this even makes
sense?


>  drivers/misc/Kconfig   |    7 +
>  drivers/misc/Makefile  |    1 
>  drivers/misc/pmpool.c  |  270 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/pmpool.h |   20 ++++
>  4 files changed, 298 insertions(+)
>  create mode 100644 drivers/misc/pmpool.c
>  create mode 100644 include/linux/pmpool.h

misc is not for memory pools, as this is not a driver.  please put this
in the properly location instead of trying to hide it from the mm
maintainers and subsystem :)

> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> index cadd4a820c03..c8ef5b37ee98 100644
> --- a/drivers/misc/Kconfig
> +++ b/drivers/misc/Kconfig
> @@ -562,6 +562,13 @@ config TPS6594_PFSM
>  	  This driver can also be built as a module.  If so, the module
>  	  will be called tps6594-pfsm.
>  
> +config PMPOOL
> +	bool "Persistent memory pool support"
> +	help
> +	  This option adds support for a persistent memory pool feature, which
> +	  provides pages allocation and freeing from a set of persistent memory ranges,
> +	  deposited to the memory pool.

Why would this even be a user selectable option?  Either the kernel
needs this or it doesn't.


> +
>  source "drivers/misc/c2port/Kconfig"
>  source "drivers/misc/eeprom/Kconfig"
>  source "drivers/misc/cb710/Kconfig"
> diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> index f2a4d1ff65d4..31dd6553057d 100644
> --- a/drivers/misc/Makefile
> +++ b/drivers/misc/Makefile
> @@ -67,3 +67,4 @@ obj-$(CONFIG_TMR_MANAGER)      += xilinx_tmr_manager.o
>  obj-$(CONFIG_TMR_INJECT)	+= xilinx_tmr_inject.o
>  obj-$(CONFIG_TPS6594_ESM)	+= tps6594-esm.o
>  obj-$(CONFIG_TPS6594_PFSM)	+= tps6594-pfsm.o
> +obj-$(CONFIG_PMPOOL)		+= pmpool.o
> diff --git a/drivers/misc/pmpool.c b/drivers/misc/pmpool.c
> new file mode 100644
> index 000000000000..e2c923b31b36
> --- /dev/null
> +++ b/drivers/misc/pmpool.c
> @@ -0,0 +1,270 @@
> +#include <linux/io.h>

You forgot basic copyright/license stuff, did you use checkpatch.pl on
your file?

> +#include <linux/bitmap.h>
> +#include <linux/memblock.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +
> +#include <linux/pmpool.h>
> +
> +#define VERSION			1

In kernel code does not need versions.

> +#define MAX_REGIONS		14

Why 14?  Why not 24?  Or something else?

> +
> +#define for_each_region(pmpool, r)					\
> +	for (r = pmpool->hdr->regions;					\
> +	     r - pmpool->hdr->regions < pmpool->hdr->nr_regions;	\
> +	     r++)
> +
> +#define for_each_used_region(pmpool, r)					\
> +	for_each_region((pmpool), (r))					\
> +		if (!(r)->size_in_pfns) { continue; } else
> +
> +struct pmpool_region {
> +	uint32_t		base_pfn;		/* 32 bits * 4k = up it 15TB of physical address space */

Please use proper kernel types when writing kernel code (i.e. u32, u8,
and so on.)  uint*_t is for userspace code only.

> --- /dev/null
> +++ b/include/linux/pmpool.h
> @@ -0,0 +1,20 @@
> +#ifndef _PMPOOL_H
> +#define _PMPOOL_H
> +
> +#include <linux/types.h>
> +#include <linux/spinlock.h>
> +
> +void *alloc_pm_pages(unsigned int order);
> +void free_pm_pages(void *addr, unsigned int order);
> +
> +struct pmpool {
> +	spinlock_t		lock;
> +	struct pmpool_header	*hdr;
> +};
> +
> +int pmpool_init(struct pmpool *pmpool, phys_addr_t base, phys_addr_t size);
> +
> +void *alloc_pmpool_pages(struct pmpool *pmpool, unsigned int order);
> +void free_pmpool_pages(struct pmpool *pmpool, void *addr, unsigned int order);

Please use "noun_verb_*" for new apis so that we have a chance at
understanding where the calls are living.

good luck!

greg k-h

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ