Message-Id: <20230907075453.350554-1-gregory.price@memverge.com>
Date: Thu, 7 Sep 2023 03:54:50 -0400
From: Gregory Price <gourry.memverge@...il.com>
To: linux-mm@...r.kernel.org
Cc: linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
linux-api@...r.kernel.org, linux-cxl@...r.kernel.org,
luto@...nel.org, tglx@...utronix.de, mingo@...hat.com,
bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
arnd@...db.de, akpm@...ux-foundation.org, x86@...nel.org,
Gregory Price <gregory.price@...verge.com>
Subject: [RFC 0/3] sys_move_phy_pages system call
This patch set proposes a syscall, analogous to move_pages, that migrates
pages between NUMA nodes using physical addressing.
The intent is to better enable user-land, system-wide memory tiering
as CXL devices begin to provide memory resources on the PCIe bus.
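For illustration only, a userspace caller might look roughly like the sketch below. It assumes the new call takes the same arguments as move_pages(2) minus the pid, with physical addresses passed in the pages array, and it assumes a 64-bit build so that a physical address fits in a pointer. __NR_move_phys_pages is a hypothetical syscall number that no released kernel header defines, so the wrapper reports ENOSYS unless one is supplied.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: assumes move_phys_pages takes move_pages(2)'s
 * arguments without the pid. __NR_move_phys_pages is an assumed name that
 * released kernel headers do not define, so this falls back to ENOSYS.
 */
static long move_phys_pages(unsigned long nr_pages, const void **pages,
                            const int *nodes, int *status, int flags)
{
#ifdef __NR_move_phys_pages
    return syscall(__NR_move_phys_pages, nr_pages, pages, nodes, status, flags);
#else
    (void)nr_pages; (void)pages; (void)nodes; (void)status; (void)flags;
    errno = ENOSYS;
    return -1;
#endif
}

/* Pack physical addresses (e.g. from PEBS/IBS samples) into a pages array. */
static void fill_phys_pages(const uint64_t *paddrs, const void **pages, size_t n)
{
    assert(n == 0 || (paddrs != NULL && pages != NULL));
    for (size_t i = 0; i < n; i++)
        pages[i] = (const void *)(uintptr_t)paddrs[i];  /* assumes 64-bit */
}
```

A caller would fill nodes[] with one target node per page and, by analogy with move_pages, read per-page results back from status[].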
This patch set is broken into 3 patches:
1 & 2) A small refactor of existing migration code so that it can
be reused
3) The sys_move_phys_pages system call.
The sys_move_phys_pages system call validates that each page may be
migrated by checking the migratable status of every vma mapping the page,
and the intersection of the cpuset policies of each vma's task.
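The validation rule can be modeled in a few lines of userspace C. This is not the patch's kernel code: the mapping_view struct is a hypothetical, simplified view of one vma mapping the page, and node masks are shrunk to 64-bit bitmasks (bit n = node n) instead of the kernel's nodemask_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one vma that maps the physical page. */
struct mapping_view {
    bool vma_migratable;   /* models the per-vma migratable check */
    uint64_t task_mems;    /* cpuset mems_allowed of the vma's task */
};

/*
 * True if the page may move to target_node: every vma mapping it must be
 * migratable, and target_node must lie in the intersection of the cpuset
 * node masks of all owning tasks.
 */
static bool may_move_to_node(const struct mapping_view *maps, size_t nmaps,
                             int target_node)
{
    assert(maps != NULL || nmaps == 0);
    if (target_node < 0 || target_node >= 64)
        return false;

    uint64_t allowed = ~UINT64_C(0);
    for (size_t i = 0; i < nmaps; i++) {
        if (!maps[i].vma_migratable)
            return false;
        allowed &= maps[i].task_mems;   /* intersect cpuset policies */
    }
    return (allowed >> target_node) & 1;
}
```

This model treats a page with no mappings as unconstrained; the cover letter does not say how the actual patch handles that case.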
Background:
Userspace job schedulers, memory managers, and tiering software
solutions depend on page migration syscalls to reallocate resources
across NUMA nodes. Currently, these calls enable movement of memory
associated with a specific PID. Moves can be requested in coarse,
process-sized strokes (as with migrate_pages) or on specific virtual
pages (via move_pages).
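For comparison, the existing per-process interface can already report where a virtual page lives. The sketch below calls move_pages(2) through syscall(2) in query mode: with pid 0 (the calling process) and nodes set to NULL, nothing is moved and status[0] receives the page's node. Going through SYS_move_pages avoids a libnuma dependency; the call may fail with ENOSYS on kernels built without NUMA support, or with EPERM under a restrictive seccomp profile.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Returns the node backing a freshly touched anonymous page of this process,
 * a negative per-page errno from status[0], or -errno if move_pages failed.
 */
static int query_node_of_new_page(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);

    char *buf = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -errno;
    buf[0] = 1;  /* fault the page in so it is backed by physical memory */

    void *pages[1] = { buf };
    int status[1] = { 0 };
    /* pid 0 = calling process; nodes == NULL = report location, move nothing */
    long rc = syscall(SYS_move_pages, 0L, 1UL, pages, (const int *)NULL,
                      status, 0L);
    int result = (rc < 0) ? -errno : status[0];

    munmap(buf, (size_t)page_size);
    return result;
}
```

Every page passed this way is named by a virtual address in one process, which is exactly the constraint the physical-addressing call is meant to remove.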
However, a number of profiling mechanisms provide system-wide information
that would benefit from a physical-addressing version of move_pages.
- The IDLE bit is cleared on reads/writes of physical pages
- /proc/zoneinfo breaks PFN-space into NUMA nodes
- PEBS/IBS can track the frequency of accesses to physical addresses
- Devices themselves may provide hotness information about their memory
  resources, using either physical or device addressing
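The physical-address sources above all reduce to page frame numbers. A minimal sketch, assuming 4 KiB base pages: a sampled physical address becomes a PFN by shifting, the PFN locates its bit in /sys/kernel/mm/page_idle/bitmap (bit pfn % 64 of the 64-bit word at byte offset (pfn / 64) * 8, per the idle page tracking documentation), and the node is found from per-zone start_pfn and spanned values as printed in /proc/zoneinfo. The zone_span struct is hypothetical, and the lookup ignores memory holes (spanned includes them, present does not).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12  /* assumes 4 KiB base pages (x86-64) */

/* Hypothetical per-zone PFN range, as read from /proc/zoneinfo. */
struct zone_span {
    int node;
    uint64_t start_pfn;  /* "start_pfn:" field */
    uint64_t spanned;    /* "spanned" field, in pages */
};

/* Physical address (e.g. from a PEBS/IBS sample) to page frame number. */
static uint64_t paddr_to_pfn(uint64_t paddr)
{
    return paddr >> SKETCH_PAGE_SHIFT;
}

/* Byte offset and bit index of a PFN in the page_idle bitmap file. */
static uint64_t idle_bitmap_offset(uint64_t pfn) { return (pfn / 64) * 8; }
static unsigned int idle_bitmap_bit(uint64_t pfn) { return (unsigned int)(pfn % 64); }

/* Node whose zone span contains pfn, or -1 if none does. */
static int pfn_to_node(const struct zone_span *zones, size_t n, uint64_t pfn)
{
    assert(zones != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (pfn >= zones[i].start_pfn &&
            pfn - zones[i].start_pfn < zones[i].spanned)
            return zones[i].node;
    return -1;
}
```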
Information from these sources facilitates system-wide resource management,
but because migrate_pages and move_pages operate on individual tasks, this
information must be converted back to virtual addresses and re-associated
with specific PIDs.
Doing this reverse translation outside of the kernel requires considerable
space and compute, and the existing system calls must then translate the
virtual addresses back to pages again. Much of this work can be avoided if
pages can be migrated directly by physical address.
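To make the cost concrete: a tool that starts from a PFN today must scan /proc/<pid>/pagemap for every candidate process. Each virtual page has one 64-bit entry at byte offset (vaddr / page size) * 8; bit 63 means present, bit 62 means swapped, and bits 0-54 hold the PFN of a present page. The sketch below performs that reverse lookup over hypothetical in-memory snapshots, assuming 4 KiB pages; its cost grows with the total number of virtual pages scanned, not with the number of hot PFNs. Since Linux 4.2 the PFN field reads as zero without CAP_SYS_ADMIN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PM_PFN_MASK      ((UINT64_C(1) << 55) - 1)  /* bits 0-54 */
#define PM_SWAPPED       (UINT64_C(1) << 62)
#define PM_PRESENT       (UINT64_C(1) << 63)
#define SKETCH_PAGE_SIZE 4096u                       /* assumes 4 KiB pages */

/* Byte offset of the entry describing vaddr in /proc/<pid>/pagemap. */
static uint64_t pagemap_offset(uint64_t vaddr)
{
    return (vaddr / SKETCH_PAGE_SIZE) * 8;
}

/* Extract the PFN of a present, non-swapped entry; false otherwise. */
static bool pagemap_entry_pfn(uint64_t entry, uint64_t *pfn)
{
    if (!(entry & PM_PRESENT) || (entry & PM_SWAPPED))
        return false;
    *pfn = entry & PM_PFN_MASK;
    return true;
}

/* Hypothetical snapshot of one process's pagemap for a contiguous range. */
struct pagemap_snapshot {
    int pid;
    uint64_t start_vaddr;
    const uint64_t *entries;  /* one entry per virtual page */
    size_t nr_entries;
};

/*
 * Reverse translation: record up to max_hits (pid, vaddr) pairs mapping
 * target_pfn and return the total number found. Every entry is visited.
 */
static size_t find_mappers(const struct pagemap_snapshot *snaps, size_t nsnaps,
                           uint64_t target_pfn, int *pids, uint64_t *vaddrs,
                           size_t max_hits)
{
    size_t hits = 0;

    assert(snaps != NULL || nsnaps == 0);
    for (size_t s = 0; s < nsnaps; s++) {
        for (size_t i = 0; i < snaps[s].nr_entries; i++) {
            uint64_t pfn;
            if (!pagemap_entry_pfn(snaps[s].entries[i], &pfn) || pfn != target_pfn)
                continue;
            if (hits < max_hits) {
                pids[hits] = snaps[s].pid;
                vaddrs[hits] = snaps[s].start_vaddr + i * (uint64_t)SKETCH_PAGE_SIZE;
            }
            hits++;
        }
    }
    return hits;
}
```

Each (pid, vaddr) pair found this way would then be handed to move_pages, which walks the page tables again to recover the same physical page.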
Gregory Price (3):
mm/migrate: remove unused mm argument from do_move_pages_to_node
mm/migrate: refactor add_page_for_migration for code re-use
mm/migrate: Create move_phys_pages syscall
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
include/linux/syscalls.h | 5 +
include/uapi/asm-generic/unistd.h | 8 +-
kernel/sys_ni.c | 1 +
mm/migrate.c | 268 ++++++++++++++++++++----
tools/include/uapi/asm-generic/unistd.h | 8 +-
7 files changed, 248 insertions(+), 44 deletions(-)
--
2.39.1