Message-ID: <20250618112935.7629-1-shivankg@amd.com>
Date: Wed, 18 Jun 2025 11:29:28 +0000
From: Shivank Garg <shivankg@....com>
To: <seanjc@...gle.com>, <david@...hat.com>, <vbabka@...e.cz>,
	<willy@...radead.org>, <akpm@...ux-foundation.org>, <shuah@...nel.org>,
	<pbonzini@...hat.com>, <brauner@...nel.org>, <viro@...iv.linux.org.uk>
CC: <ackerleytng@...gle.com>, <paul@...l-moore.com>, <jmorris@...ei.org>,
	<serge@...lyn.com>, <pvorel@...e.cz>, <bfoster@...hat.com>,
	<tabba@...gle.com>, <vannapurve@...gle.com>, <chao.gao@...el.com>,
	<bharata@....com>, <nikunj@....com>, <michael.day@....com>,
	<yan.y.zhao@...el.com>, <Neeraj.Upadhyay@....com>, <thomas.lendacky@....com>,
	<michael.roth@....com>, <aik@....com>, <jgg@...dia.com>,
	<kalyazin@...zon.com>, <peterx@...hat.com>, <shivankg@....com>,
	<jack@...e.cz>, <rppt@...nel.org>, <hch@...radead.org>,
	<cgzones@...glemail.com>, <ira.weiny@...el.com>, <rientjes@...gle.com>,
	<roypat@...zon.co.uk>, <ziy@...dia.com>, <matthew.brost@...el.com>,
	<joshua.hahnjy@...il.com>, <rakie.kim@...com>, <byungchul@...com>,
	<gourry@...rry.net>, <kent.overstreet@...ux.dev>,
	<ying.huang@...ux.alibaba.com>, <apopple@...dia.com>,
	<chao.p.peng@...el.com>, <amit@...radead.org>, <ddutile@...hat.com>,
	<dan.j.williams@...el.com>, <ashish.kalra@....com>, <gshan@...hat.com>,
	<jgowans@...zon.com>, <pankaj.gupta@....com>, <papaluri@....com>,
	<yuzhao@...gle.com>, <suzuki.poulose@....com>, <quic_eberman@...cinc.com>,
	<aneeshkumar.kizhakeveetil@....com>, <linux-fsdevel@...r.kernel.org>,
	<linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
	<linux-security-module@...r.kernel.org>, <kvm@...r.kernel.org>,
	<linux-kselftest@...r.kernel.org>, <linux-coco@...ts.linux.dev>
Subject: [RFC PATCH v8 0/7] Add NUMA mempolicy support for KVM guest-memfd

This series introduces NUMA-aware memory placement support for KVM guests
with guest_memfd memory backends. It builds upon Fuad Tabba's work that
enabled host-mapping for guest_memfd memory [1] and can be applied directly
on the KVM tree (branch: queue, base commit: 7915077245) [2].

== Background ==
KVM's guest-memfd memory backend currently lacks support for NUMA policy
enforcement, causing guest memory allocations to be distributed across host
nodes according to the kernel's default behavior, irrespective of any policy
specified by the VMM. This limitation arises because conventional userspace
NUMA control mechanisms like mbind(2) don't work since the memory isn't
directly mapped to userspace when allocations occur.
Fuad's work [1] provides the necessary mmap capability, and this series
leverages it to enable mbind(2).

== Implementation ==

This series implements proper NUMA policy support for guest-memfd by:

1. Adding mempolicy-aware allocation APIs to the filemap layer.
2. Introducing custom inodes (via a dedicated slab-allocated inode cache,
   kvm_gmem_inode_info) to store NUMA policy and metadata for guest memory.
3. Implementing get/set_policy vm_ops in guest_memfd to support NUMA
   policy.

With these changes, VMMs can now control guest memory placement by mapping
the guest_memfd file descriptor and using mbind(2) to specify:
- Policy modes: default, bind, interleave, or preferred
- Host NUMA nodes: List of target nodes for memory allocation

These policies affect only future allocations and do not migrate existing
memory. This matches mbind(2)'s default behavior, which affects only new
allocations unless overridden with the MPOL_MF_MOVE/MPOL_MF_MOVE_ALL flags
(not supported for guest_memfd, as it is unmovable by design).
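
As a rough illustration of the VMM-side flow described above, the sketch
below maps a file descriptor and applies a policy with mbind(2). It is not
taken from the series: build_nodemask() and map_with_policy() are
hypothetical helper names, the fd is assumed to be a guest_memfd that
already supports mmap() (creating it is out of scope), and the raw syscall
is used only to avoid depending on libnuma.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/mempolicy.h>	/* MPOL_BIND, MPOL_INTERLEAVE, ... */

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: set one bit per requested host node.
 * Returns 0 on success, -1 if a node id does not fit in the mask.
 */
int build_nodemask(const int *nodes, size_t count,
		   unsigned long *mask, size_t words)
{
	assert(mask != NULL);

	for (size_t i = 0; i < words; i++)
		mask[i] = 0;

	for (size_t i = 0; i < count; i++) {
		if (nodes[i] < 0 ||
		    (size_t)nodes[i] >= words * BITS_PER_ULONG)
			return -1;
		mask[nodes[i] / BITS_PER_ULONG] |=
			1UL << (nodes[i] % BITS_PER_ULONG);
	}
	return 0;
}

/*
 * Hypothetical helper: map `len` bytes of `fd` shared and apply `mode`
 * (e.g. MPOL_BIND) to the mapping before any page is faulted in.
 * Assumption: `fd` is a guest_memfd with host-mapping support.
 * Returns the mapping, or MAP_FAILED with errno set.
 */
void *map_with_policy(int fd, size_t len, int mode,
		      const unsigned long *mask, size_t words)
{
	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED)
		return MAP_FAILED;

	/* One more than the bit count, as libnuma's mbind callers pass. */
	unsigned long maxnode = words * BITS_PER_ULONG + 1;

	/* flags == 0: no MPOL_MF_MOVE*, only future allocations change. */
	if (syscall(SYS_mbind, addr, len, mode, mask, maxnode, 0U) != 0) {
		int saved = errno;

		munmap(addr, len);
		errno = saved;
		return MAP_FAILED;
	}
	return addr;
}
```

A caller would fill a mask with build_nodemask() for the desired host
nodes and pass it to map_with_policy() before the guest touches the
memory, since the policy only applies to pages allocated afterwards.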

== Upstream Plan ==
Phased approach as per David's guest_memfd extension overview [3] and
community calls [4]:

Phase 1 (this series):
1. Focuses on shared guest_memfd support (non-CoCo VMs).
2. Builds on Fuad's host-mapping work.

Phase 2 (future work):
1. NUMA support for private guest_memfd (CoCo VMs).
2. Depends on SNP in-place conversion support [5].

This series provides a clean integration path for NUMA-aware memory
management for guest_memfd and lays the groundwork for future confidential
computing NUMA capabilities.

Please review and provide feedback!

Thanks,
Shivank

== Changelog ==

- v1,v2: Extended the KVM_CREATE_GUEST_MEMFD IOCTL to pass mempolicy.
- v3: Introduced fbind() syscall for VMM memory-placement configuration.
- v4-v6: Current approach using shared_policy support and vm_ops (based on
         suggestions from David [6] and guest_memfd bi-weekly upstream
         call discussion [7]).
- v7: Use inodes to store NUMA policy instead of file [8].
- v8: Rebase on top of Fuad's v12: Host mmapping for guest_memfd memory.

[1] https://lore.kernel.org/all/20250611133330.1514028-1-tabba@google.com
[2] https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=queue
[3] https://lore.kernel.org/all/c1c9591d-218a-495c-957b-ba356c8f8e09@redhat.com
[4] https://docs.google.com/document/d/1M6766BzdY1Lhk7LiR5IqVR8B8mG3cr-cxTxOrAosPOk/edit?tab=t.0#heading=h.svcbod20b5ur
[5] https://lore.kernel.org/all/20250613005400.3694904-1-michael.roth@amd.com
[6] https://lore.kernel.org/all/6fbef654-36e2-4be5-906e-2a648a845278@redhat.com
[7] https://lore.kernel.org/all/2b77e055-98ac-43a1-a7ad-9f9065d7f38f@amd.com
[8] https://lore.kernel.org/all/diqzbjumm167.fsf@ackerleytng-ctop.c.googlers.com

Ackerley Tng (1):
  KVM: guest_memfd: Use guest mem inodes instead of anonymous inodes

Shivank Garg (5):
  security: Export anon_inode_make_secure_inode for KVM guest_memfd
  mm/mempolicy: Export memory policy symbols
  KVM: guest_memfd: Add slab-allocated inode cache
  KVM: guest_memfd: Enforce NUMA mempolicy using shared policy
  KVM: guest_memfd: selftests: Add tests for mmap and NUMA policy
    support

Shivansh Dhiman (1):
  mm/filemap: Add mempolicy support to the filemap layer

 fs/anon_inodes.c                              |  20 +-
 include/linux/fs.h                            |   2 +
 include/linux/pagemap.h                       |  41 +++
 include/uapi/linux/magic.h                    |   1 +
 mm/filemap.c                                  |  27 +-
 mm/mempolicy.c                                |   6 +
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../testing/selftests/kvm/guest_memfd_test.c  | 123 ++++++++-
 virt/kvm/guest_memfd.c                        | 254 ++++++++++++++++--
 virt/kvm/kvm_main.c                           |   7 +-
 virt/kvm/kvm_mm.h                             |  10 +-
 11 files changed, 456 insertions(+), 36 deletions(-)

-- 
2.43.0
---
== Earlier Postings ==
v7: https://lore.kernel.org/all/20250408112402.181574-1-shivankg@amd.com
v6: https://lore.kernel.org/all/20250226082549.6034-1-shivankg@amd.com
v5: https://lore.kernel.org/all/20250219101559.414878-1-shivankg@amd.com
v4: https://lore.kernel.org/all/20250210063227.41125-1-shivankg@amd.com
v3: https://lore.kernel.org/all/20241105164549.154700-1-shivankg@amd.com
v2: https://lore.kernel.org/all/20240919094438.10987-1-shivankg@amd.com
v1: https://lore.kernel.org/all/20240916165743.201087-1-shivankg@amd.com
