[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <2023082506-enchanted-tripping-d1d5@gregkh>
Date: Fri, 25 Aug 2023 10:05:08 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
Cc: Stanislav Kinsburskii <stanislav.kinsburskii@...il.com>,
Derek Kiernan <derek.kiernan@....com>,
Dragan Cvetic <dragan.cvetic@....com>,
Arnd Bergmann <arnd@...db.de>, Wei Liu <wei.liu@...nel.org>,
"K. Y. Srinivasan" <kys@...rosoft.com>,
Madhavan Venkataraman <madvenka@...ux.microsoft.com>,
Anthony Yznaga <anthony.yznaga@...cle.com>,
"Mike Rapoport (IBM)" <rppt@...nel.org>,
James Gowans <jgowans@...zon.com>,
Anirudh Rayabharam <anrayabh@...ux.microsoft.com>,
Jinank Jain <jinankjain@...ux.microsoft.com>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] Introduce persistent memory pool
On Tue, Aug 22, 2023 at 11:34:34AM -0700, Stanislav Kinsburskii wrote:
> This patch addresses the need for a memory allocator dedicated to
> persistent memory within the kernel. This allocator will preserve
> kernel-specific states like DMA passthrough device states, IOMMU state, and
> more across kexec.
And CMA doesn't do this for you today?
> The proposed solution offers a foundational implementation for potential
> custom solutions that might follow. Though the implementation is
> intentionally kept concise and straightforward to foster discussion and
> feedback, it's fully functional in its current state.
>
> The persistent memory pool consists of a simple page allocator that
> operates on reserved physical memory regions. It employs bitmaps to
> allocate and free pages, with the following characteristics:
>
> 1. Memory pool regions are specified using the command line option:
>
> pmpool=<base>,<size>
>
> Where <base> represents the physical memory base address and <size>
> indicates the memory size.
>
> 2. While the pages allocation emulates the buddy allocator interface, it
> isn't confined to the MAX_ORDER.
>
> 3. The memory pool initializes during a cold boot and is retained across
> kexec.
>
> Potential applications include:
>
> 1. Allowing various in-kernel entities to allocate persistent pages from
> a singular memory pool, eliminating the need for multiple region
> reservations.
>
> 2. For in-kernel components that require the allocation address to be
> available on kernel kexec, this address can be exposed to user space and
> then passed via the command line.
>
> 3. Separate subsystems or drivers can reserve their region, sharing a
> portion of it for their persistent memory pool. This might be a file
> system, a key-value store, or other applications.
>
> Potential Enhancements for the Proposed Memory Pool:
>
> 1. Multiple Memory Regions Support: enhance the pool to accommodate and
> manage multiple memory regions simultaneously.
>
> 2. In-Kernel Memory Allocations for Storage Memory Regions:
> * Allow in-kernel memory allocations to serve as storage memory regions.
> * Implement explicit reservation of these designated regions during kexec.
As you have no in-kernel users of this, it's not something we can even
consider at the moment for obvious reasons (neither would you want us
to.)
Can you make this part of a patch series that actually adds a user,
probably more than one, so that we can see if any of this even makes
sense?
> drivers/misc/Kconfig | 7 +
> drivers/misc/Makefile | 1
> drivers/misc/pmpool.c | 270 ++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/pmpool.h | 20 ++++
> 4 files changed, 298 insertions(+)
> create mode 100644 drivers/misc/pmpool.c
> create mode 100644 include/linux/pmpool.h
misc is not for memory pools, as this is not a driver. please put this
in the properly location instead of trying to hide it from the mm
maintainers and subsystem :)
> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> index cadd4a820c03..c8ef5b37ee98 100644
> --- a/drivers/misc/Kconfig
> +++ b/drivers/misc/Kconfig
> @@ -562,6 +562,13 @@ config TPS6594_PFSM
> This driver can also be built as a module. If so, the module
> will be called tps6594-pfsm.
>
> +config PMPOOL
> + bool "Persistent memory pool support"
> + help
> + This option adds support for a persistent memory pool feature, which
> + provides pages allocation and freeing from a set of persistent memory ranges,
> + deposited to the memory pool.
Why would this even be a user selectable option? Either the kernel
needs this or it doesn't.
> +
> source "drivers/misc/c2port/Kconfig"
> source "drivers/misc/eeprom/Kconfig"
> source "drivers/misc/cb710/Kconfig"
> diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> index f2a4d1ff65d4..31dd6553057d 100644
> --- a/drivers/misc/Makefile
> +++ b/drivers/misc/Makefile
> @@ -67,3 +67,4 @@ obj-$(CONFIG_TMR_MANAGER) += xilinx_tmr_manager.o
> obj-$(CONFIG_TMR_INJECT) += xilinx_tmr_inject.o
> obj-$(CONFIG_TPS6594_ESM) += tps6594-esm.o
> obj-$(CONFIG_TPS6594_PFSM) += tps6594-pfsm.o
> +obj-$(CONFIG_PMPOOL) += pmpool.o
> diff --git a/drivers/misc/pmpool.c b/drivers/misc/pmpool.c
> new file mode 100644
> index 000000000000..e2c923b31b36
> --- /dev/null
> +++ b/drivers/misc/pmpool.c
> @@ -0,0 +1,270 @@
> +#include <linux/io.h>
You forgot basic copyright/license stuff, did you use checkpatch.pl on
your file?
> +#include <linux/bitmap.h>
> +#include <linux/memblock.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +
> +#include <linux/pmpool.h>
> +
> +#define VERSION 1
In kernel code does not need versions.
> +#define MAX_REGIONS 14
Why 14? Why not 24? Or something else?
> +
> +#define for_each_region(pmpool, r) \
> + for (r = pmpool->hdr->regions; \
> + r - pmpool->hdr->regions < pmpool->hdr->nr_regions; \
> + r++)
> +
> +#define for_each_used_region(pmpool, r) \
> + for_each_region((pmpool), (r)) \
> + if (!(r)->size_in_pfns) { continue; } else
> +
> +struct pmpool_region {
> + uint32_t base_pfn; /* 32 bits * 4k = up it 15TB of physical address space */
Please use proper kernel types when writing kernel code (i.e. u32, u8,
and so on.) uint*_t is for userspace code only.
> --- /dev/null
> +++ b/include/linux/pmpool.h
> @@ -0,0 +1,20 @@
> +#ifndef _PMPOOL_H
> +#define _PMPOOL_H
> +
> +#include <linux/types.h>
> +#include <linux/spinlock.h>
> +
> +void *alloc_pm_pages(unsigned int order);
> +void free_pm_pages(void *addr, unsigned int order);
> +
> +struct pmpool {
> + spinlock_t lock;
> + struct pmpool_header *hdr;
> +};
> +
> +int pmpool_init(struct pmpool *pmpool, phys_addr_t base, phys_addr_t size);
> +
> +void *alloc_pmpool_pages(struct pmpool *pmpool, unsigned int order);
> +void free_pmpool_pages(struct pmpool *pmpool, void *addr, unsigned int order);
Please use "noun_verb_*" for new apis so that we have a chance at
understanding where the calls are living.
good luck!
greg k-h
Powered by blists - more mailing lists