Message-Id: <20230512145737.985671-1-bjorn@kernel.org>
Date: Fri, 12 May 2023 16:57:30 +0200
From: Björn Töpel <bjorn@...nel.org>
To: Paul Walmsley <paul.walmsley@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>,
Albert Ou <aou@...s.berkeley.edu>,
linux-riscv@...ts.infradead.org
Cc: Björn Töpel <bjorn@...osinc.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
David Hildenbrand <david@...hat.com>,
Oscar Salvador <osalvador@...e.de>,
virtualization@...ts.linux-foundation.org, linux@...osinc.com,
Alexandre Ghiti <alexghiti@...osinc.com>
Subject: [PATCH 0/7] riscv: Memory Hot(Un)Plug support
From: Björn Töpel <bjorn@...osinc.com>
Memory Hot(Un)Plug support for the RISC-V port
==============================================
Introduction
------------
To quote "Documentation/admin-guide/mm/memory-hotplug.rst": "Memory
hot(un)plug allows for increasing and decreasing the size of physical
memory available to a machine at runtime."
This series attempts to add memory hot(un)plug support for the RISC-V
Linux port.
I'm sending the series as a v1, but it's borderline RFC. It definitely
needs more testing time, but it would be nice with some early input.
Implementation
--------------
From an arch perspective, a couple of callbacks need to be
implemented to support hot plugging:
arch_add_memory()
This callback is responsible for updating the linear/direct map, and
for calling into the generic memory hot plugging code via __add_pages().
arch_remove_memory()
In this callback the linear/direct map is torn down.
vmemmap_free()
The function tears down the vmemmap mappings (if
CONFIG_SPARSEMEM_VMEMMAP is in use), and also deallocates the backing
vmemmap pages. Note that for persistent memory, an alternative
allocator for the backing pages can be used -- the vmem_altmap. This
means that when the backing pages are cleared, extra care is needed so
that the correct deallocation method is used. Note that RISC-V
populates the vmemmap using vmemmap_populate_basepages(), so currently
no hugepages are used for the backing store.
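As a rough illustration of the ordering described for the callbacks
above, here is a small userspace model. It is not kernel code: every
toy_* name is hypothetical, and the rollback on failure is an
assumption made for the sketch, not something stated in this cover
letter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state standing in for the linear map and the generic hotplug core. */
struct toy_state {
	bool linear_mapped;     /* is the range present in the toy linear map? */
	bool pages_added;       /* did the toy generic layer accept the range? */
	bool fail_add_pages;    /* knob: make the generic step fail */
};

/* Stand-in for the arch-specific linear/direct map update. */
static void toy_map_linear(struct toy_state *s)
{
	s->linear_mapped = true;
}

/* Stand-in for tearing the linear/direct map down again. */
static void toy_unmap_linear(struct toy_state *s)
{
	s->linear_mapped = false;
}

/* Stand-in for the generic step; returns 0 or a negative error. */
static int toy_add_pages(struct toy_state *s)
{
	if (s->fail_add_pages)
		return -1;
	s->pages_added = true;
	return 0;
}

/* Hot add: map first, then hand over to the generic layer. */
static int toy_arch_add_memory(struct toy_state *s)
{
	int ret;

	toy_map_linear(s);
	ret = toy_add_pages(s);
	if (ret)
		toy_unmap_linear(s);    /* assumed rollback on failure */
	return ret;
}

/* Hot remove: generic removal (modelled as a flag), then unmap. */
static void toy_arch_remove_memory(struct toy_state *s)
{
	assert(s->linear_mapped);
	s->pages_added = false;
	toy_unmap_linear(s);
}
```

In this model a failing generic step leaves no stale linear mapping
behind, which is the property the real callbacks would need to
preserve as well.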
The page table unmap/teardown functions are heavily based on (copied
from!) the x86 tree. The same remove_pgd_mapping() is used in both
vmemmap_free() and arch_remove_memory(), but in the latter function
the backing pages are not removed.
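The idea of one teardown path shared by both callers can be sketched
in userspace as follows. This is only a model under stated
assumptions: the toy_* names, the free_backing flag and the counters
are hypothetical and do not mirror the signatures in the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy alternative allocator, loosely modelled on the vmem_altmap idea. */
struct toy_altmap {
	size_t freed;           /* pages handed back to the alternative allocator */
};

/* Counters that record which path each page took. */
struct toy_stats {
	size_t unmapped;        /* entries cleared from the toy page table */
	size_t freed_normal;    /* backing pages returned to the normal allocator */
};

/*
 * Hypothetical shared teardown: always clears the mappings, and only
 * releases the backing pages when free_backing is set. If an altmap is
 * given, pages go back to it instead of the normal allocator.
 */
static void toy_remove_mapping(size_t nr_pages, bool free_backing,
			       struct toy_altmap *altmap,
			       struct toy_stats *stats)
{
	assert(stats != NULL);
	for (size_t i = 0; i < nr_pages; i++) {
		stats->unmapped++;
		if (!free_backing)
			continue;       /* direct map: memory is not freed here */
		if (altmap)
			altmap->freed++;
		else
			stats->freed_normal++;
	}
}

/* vmemmap teardown: unmap and release the backing pages. */
static void toy_vmemmap_free(size_t nr_pages, struct toy_altmap *altmap,
			     struct toy_stats *stats)
{
	toy_remove_mapping(nr_pages, true, altmap, stats);
}

/* Direct map teardown: unmap only. */
static void toy_remove_linear_mapping(size_t nr_pages, struct toy_stats *stats)
{
	toy_remove_mapping(nr_pages, false, NULL, stats);
}
```

The single flag keeps the page-table walk identical for both callers,
and the altmap check is the one place where the deallocation method
is chosen.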
On RISC-V, the PGD level kernel mappings need to be synchronized with
all page-tables (e.g. via sync_kernel_mappings()). Synchronization
involves special care, like locking. Instead, this patch series takes
a different approach (introduced by Jörg Rödel in the x86-tree):
pre-allocate the PGD-leaves (P4D, PUD, or PMD depending on the paging
setup) at mem_init(), for vmemmap and the direct map.
Pre-allocating the PGD-leaves wastes some memory, but is only enabled
for CONFIG_MEMORY_HOTPLUG. The number of potentially unused pages is
~128, i.e. ~128 * 4K = ~512K.
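To make the trade-off concrete, here is a small userspace model of
why pre-allocated PGD-leaves remove the need for syncing. It assumes
each process holds a private copy of the kernel's top-level entries,
taken once at creation. All toy_* names and the table sizes are
hypothetical and are not RISC-V values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PGD_ENTRIES  8      /* toy size, not a real RISC-V value */
#define TOY_LEAF_ENTRIES 8

/* A next-level table ("PGD leaf"), shared by every top-level table. */
struct toy_leaf {
	bool mapped[TOY_LEAF_ENTRIES];
};

/* A top-level table; each process gets its own copy of the kernel part. */
struct toy_pgd {
	struct toy_leaf *entry[TOY_PGD_ENTRIES];
};

/* Allocate every leaf up front, as mem_init() would in this model. */
static int toy_preallocate_leaves(struct toy_pgd *kernel)
{
	for (int i = 0; i < TOY_PGD_ENTRIES; i++) {
		if (kernel->entry[i])
			continue;
		kernel->entry[i] = calloc(1, sizeof(struct toy_leaf));
		if (!kernel->entry[i])
			return -1;
	}
	return 0;
}

/* A new process copies the kernel's top-level entries once. */
static void toy_fork_pgd(struct toy_pgd *proc, const struct toy_pgd *kernel)
{
	memcpy(proc, kernel, sizeof(*proc));
}

/* Hot add in the kernel table; allocates the leaf lazily if missing. */
static int toy_map(struct toy_pgd *kernel, int pgd_idx, int leaf_idx)
{
	assert(pgd_idx >= 0 && pgd_idx < TOY_PGD_ENTRIES);
	assert(leaf_idx >= 0 && leaf_idx < TOY_LEAF_ENTRIES);
	if (!kernel->entry[pgd_idx]) {
		kernel->entry[pgd_idx] = calloc(1, sizeof(struct toy_leaf));
		if (!kernel->entry[pgd_idx])
			return -1;
	}
	kernel->entry[pgd_idx]->mapped[leaf_idx] = true;
	return 0;
}

/* Does this top-level table see the mapping without an explicit sync? */
static bool toy_visible(const struct toy_pgd *pgd, int pgd_idx, int leaf_idx)
{
	return pgd->entry[pgd_idx] && pgd->entry[pgd_idx]->mapped[leaf_idx];
}

/* Memory set aside by pre-allocation, in bytes. */
static size_t toy_prealloc_cost(size_t nr_pages, size_t page_size)
{
	return nr_pages * page_size;
}
```

Without pre-allocation, a leaf that is added after toy_fork_pgd() is
missing from the process copy until someone syncs it. With
toy_preallocate_leaves() called first, every copy already points at
the shared leaf, so a later toy_map() is visible everywhere. The
price in this model is toy_prealloc_cost(128, 4096), which is 524288
bytes (512K).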
Patch 1: Preparation for hotplugging support, by pre-allocating the
PGD leaves.
Patch 2: Change the __init attribute to __meminit, so that the
functions are not discarded after init. __meminit keeps the
functions after init if memory hotplugging is enabled for
the build.
Patch 3: Refactor the direct map setup, so it can be used for hot add.
Patch 4: The actual add/remove code. Mostly a page-table-walk
exercise.
Patch 5: Turn on the arch support in Kconfig
Patch 6: Now that memory hotplugging is enabled, make virtio-mem
usable for RISC-V
Patch 7: Pre-allocate vmalloc PGD-leaves as well, which removes the
need for vmalloc faulting.
RFC
---
* TLB flushes. The current series uses Big Hammer flush-it-all.
* Pre-allocation vs explicit syncs
Testing
-------
ACPI support is still in the making for RISC-V, so tests that involve
CXL and similar fanciness are currently not possible. Virtio-mem,
however, works without proper ACPI support. In order to try this out
in Qemu, some additional patches for Qemu are needed:
* Enable virtio-mem for RISC-V
* Add proper hotplug support for virtio-mem
The patch for Qemu is commit 5d90a7ef1bc0
("hw/riscv/virt: Support for virtio-mem-pci"), and can be found here:
https://github.com/bjoto/qemu/tree/riscv-virtio-mem
I will try to upstream that work in parallel with this.
Thanks to David Hildenbrand for valuable input for the Qemu side of
things.
The series is based on the RISC-V fixes tree:
https://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git/log/?h=fixes
Thanks,
Björn
Björn Töpel (7):
riscv: mm: Pre-allocate PGD leaves to avoid synchronization
riscv: mm: Change attribute from __init to __meminit for page
functions
riscv: mm: Refactor create_linear_mapping_range() for hot add
riscv: mm: Add memory hot add/remove support
riscv: Enable memory hot add/remove arch kbuild support
virtio-mem: Enable virtio-mem for RISC-V
riscv: mm: Pre-allocate vmalloc PGD leaves
arch/riscv/Kconfig | 2 +
arch/riscv/include/asm/kasan.h | 4 +-
arch/riscv/include/asm/mmu.h | 2 +-
arch/riscv/include/asm/pgtable.h | 2 +-
arch/riscv/mm/fault.c | 7 +-
arch/riscv/mm/init.c | 387 ++++++++++++++++++++++++++++---
drivers/virtio/Kconfig | 2 +-
7 files changed, 364 insertions(+), 42 deletions(-)
base-commit: 3b90b09af5be42491a8a74a549318cfa265b3029
--
2.39.2