Message-ID: <20250828093902.2719-1-roypat@amazon.co.uk>
Date: Thu, 28 Aug 2025 09:39:14 +0000
From: "Roy, Patrick" <roypat@...zon.co.uk>
To: "david@...hat.com" <david@...hat.com>, "seanjc@...gle.com"
	<seanjc@...gle.com>
CC: "Roy, Patrick" <roypat@...zon.co.uk>, "tabba@...gle.com"
	<tabba@...gle.com>, "ackerleytng@...gle.com" <ackerleytng@...gle.com>,
	"pbonzini@...hat.com" <pbonzini@...hat.com>, "kvm@...r.kernel.org"
	<kvm@...r.kernel.org>, "linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>, "kvmarm@...ts.linux.dev"
	<kvmarm@...ts.linux.dev>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
	"rppt@...nel.org" <rppt@...nel.org>, "will@...nel.org" <will@...nel.org>,
	"vbabka@...e.cz" <vbabka@...e.cz>, "Cali, Marco" <xmarcalx@...zon.co.uk>,
	"Kalyazin, Nikita" <kalyazin@...zon.co.uk>, "Thomson, Jack"
	<jackabt@...zon.co.uk>, "Manwaring, Derek" <derekmn@...zon.com>
Subject: [PATCH v5 00/12] Direct Map Removal Support for guest_memfd

[ based on kvm/next ]

Unmapping virtual machine guest memory from the host kernel's direct map is an
effective mitigation against Spectre-style transient execution issues: if the
kernel page tables do not contain entries pointing to guest memory, then any
attempted speculative read through the direct map will necessarily be blocked
by the MMU before any observable microarchitectural side-effects happen. This
means that Spectre-gadgets and similar cannot be used to target virtual machine
memory. Roughly 60% of speculative execution issues fall into this category [1,
Table 1].

This patch series extends guest_memfd with the ability to remove its memory
from the host kernel's direct map, so that KVM guests backed by guest_memfd
can attain the above protection.

=== Design ===

We build on top of guest_memfd's recent support for "non-confidential VMs", in
which all of guest_memfd is mappable to userspace (i.e. considered "shared").
For such VMs, all guest page faults are routed through guest_memfd's special
page fault handler, which consumes fd+offset directly and can therefore map
direct map removed memory into the guest. KVM's internal accesses to guest
memory are handled by providing each memslot with a userspace mapping of that
memslot's guest_memfd via userspace_addr. Since KVM's internal accesses are
almost exclusively handled via copy_from_user() and friends, this allows KVM to
access direct map removed guest memory for features such as MMIO instruction
emulation on x86 or pvtime support on ARM64.
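
As an illustration of the memslot setup described above, here is a minimal
userspace sketch. The helper name and its error handling are hypothetical and
not taken from this series; it assumes the guest_memfd was created so that it
can be mmap()ed, and it only uses the existing KVM_SET_USER_MEMORY_REGION2
uAPI. On headers that predate guest_memfd it simply returns -1.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper (not part of this series): back memslot `slot` with
 * guest_memfd `gmem_fd` and hand KVM a userspace mapping of the same file,
 * so that KVM's copy_from_user()-style accesses go through userspace page
 * tables instead of the direct map. Returns 0 on success, -1 on failure.
 */
int gmem_memslot_with_user_mapping(int vm_fd, int gmem_fd, uint32_t slot,
                                   uint64_t gpa, uint64_t size)
{
#if defined(KVM_SET_USER_MEMORY_REGION2) && defined(KVM_MEM_GUEST_MEMFD)
    /* Userspace view of the guest_memfd, starting at file offset 0. */
    void *hva = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                     gmem_fd, 0);
    if (hva == MAP_FAILED)
        return -1;

    struct kvm_userspace_memory_region2 region = {
        .slot = slot,
        .flags = KVM_MEM_GUEST_MEMFD,
        .guest_phys_addr = gpa,
        .memory_size = size,
        /* Used by KVM for its own accesses to guest memory. */
        .userspace_addr = (uint64_t)(uintptr_t)hva,
        /* Used by the guest page fault path (fd+offset). */
        .guest_memfd = (uint32_t)gmem_fd,
        .guest_memfd_offset = 0,
    };

    if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region) < 0) {
        munmap(hva, size);
        return -1;
    }
    return 0;
#else
    /* Headers too old to know about guest_memfd memslots. */
    (void)vm_fd; (void)gmem_fd; (void)slot; (void)gpa; (void)size;
    return -1;
#endif
}
```

Both userspace_addr and guest_memfd/guest_memfd_offset refer to the same
memory here; only the path used to reach it differs.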

=== Implementation ===

The KVM_CREATE_GUEST_MEMFD ioctl gains a new flag
GUEST_MEMFD_FLAG_NO_DIRECT_MAP.  If this flag is passed, then guest_memfd
removes direct map entries for its folios after preparation. Upon freeing of
the memory, direct map entries are restored prior to gmem's arch-specific
invalidation callback.
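
From userspace, passing the flag could look roughly like the sketch below. The
helper name is hypothetical. To avoid guessing the numeric value of
GUEST_MEMFD_FLAG_NO_DIRECT_MAP, the flags are taken as a parameter; the caller
is assumed to build against uAPI headers that include this series.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper (not part of this series): create a guest_memfd of
 * `size` bytes on the VM file descriptor `vm_fd`. Callers wanting direct map
 * removal would pass GUEST_MEMFD_FLAG_NO_DIRECT_MAP in `flags`, taken from
 * headers that contain this series. Returns the new fd, or -1 on failure.
 */
int create_guest_memfd(int vm_fd, uint64_t size, uint64_t flags)
{
#ifdef KVM_CREATE_GUEST_MEMFD
    struct kvm_create_guest_memfd args = {
        .size = size,
        .flags = flags,
    };

    /* On success the ioctl returns a new file descriptor. */
    return ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &args);
#else
    /* Headers too old to know about guest_memfd at all. */
    (void)vm_fd; (void)size; (void)flags;
    return -1;
#endif
}
```

A VMM would typically only set the flag after confirming support via the
capability check described next.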

Support for the flag can be discovered via the KVM_CAP_GMEM_NO_DIRECT_MAP
capability, which is only available if direct map modifications at 4k
granularity are architecturally possible and KVM can successfully map direct
map removed memory into the guest.
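
A discovery sketch using the existing KVM_CHECK_EXTENSION ioctl follows. The
helper name is hypothetical, and the capability number is passed in by the
caller rather than hardcoded, since its value comes from the uAPI header of
this series. Whether the check is best issued on the /dev/kvm fd or on a VM fd
is left open here.

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper (not part of this series): returns 1 if the KVM file
 * descriptor `kvm_fd` reports capability `cap` as supported, 0 otherwise
 * (including when the ioctl itself fails). Callers would pass
 * KVM_CAP_GMEM_NO_DIRECT_MAP from headers that contain this series.
 */
int kvm_cap_supported(int kvm_fd, long cap)
{
    /* KVM_CHECK_EXTENSION returns 0 for "unsupported", > 0 otherwise. */
    int ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap);

    return ret > 0;
}
```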

=== Testing ===

KVM selftests are extended to cover the above-described non-CoCo workflows,
where guest_memfd with direct map entries removed is used to back all of guest
memory, and to exercise some simple MMIO paths.

Additionally, a Firecracker branch with support for these VMs can be found on
GitHub [2].

=== Changes since v4 ===

- Rebase on top of kvm/next
- Stop using PG_private to track direct map removal state
- Fix build of KVM-as-a-module by using new EXPORT_SYMBOL_FOR_MODULES

=== FAQ ===

--- why not reuse memfd_secret() / a bespoke guest memory solution? ---

Having guest memory be direct map removed means guest page faults cannot be
resolved by GUP-ing userspace mappings of guest memory, as GUP is disabled for
direct map removed memory (as currently GUP has no way to understand that a
specific GUP request will not subsequently dereference page_address()).
guest_memfd already has a special path inside KVM that instead consumes
fd+offset, so it makes sense to reuse this. Additionally, it means that
direct-map-removed VMs can benefit from active development on guest_memfd, such
as huge pages support.

--- why do KVM internal accesses through userspace page tables? ---

For traditional VMs, all KVM internal accesses are done through the
userspace_addr stored in a memslot, meaning no changes to most KVM code are
needed just to allow access to guest_memfd backed / direct map removed guest
memory of non-confidential VMs. Previous iterations of this series tried to
avoid userspace mappings, instead attempting to dynamically restore direct map
entries for internal accesses [RFCv2], but this turned out to have a
significant performance impact, as well as additional complexity due to needing
to refcount direct map reinsertion operations and making them play nicely with
gmem truncations.

--- what doesn't work with direct map removed VMs? ---

The only thing I'm aware of is kvm-clock, since it tries to GUP guest memory
via gfn_to_pfn_cache. Realistically, this is only a problem on AMD, as Intel
guests can use the TSC as a clocksource (Intel allows discovery of the TSC
frequency via CPUID, while AMD doesn't).  AMD guests fall back onto a
calibration routine, which fails most of the time.
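
For reference, the CPUID-based discovery mentioned above presumably refers to
leaf 0x15, which reports the TSC/crystal clock ratio (EAX = denominator, EBX =
numerator) and the crystal frequency in Hz (ECX). That this is the leaf meant
here is an assumption; the text only says "via CPUID". The arithmetic is
sketched below with a hypothetical helper name.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of this series): derive the TSC frequency in
 * Hz from the three CPUID.15H outputs. Returns 0 if any field is zero, which
 * the leaf uses to signal "not enumerated".
 */
uint64_t tsc_hz_from_cpuid15(uint32_t denominator, uint32_t numerator,
                             uint32_t crystal_hz)
{
    if (denominator == 0 || numerator == 0 || crystal_hz == 0)
        return 0;

    /* TSC Hz = crystal Hz * numerator / denominator, in 64-bit math. */
    return (uint64_t)crystal_hz * numerator / denominator;
}
```

For example, a 24 MHz crystal with a ratio of 200/2 yields a 2.4 GHz TSC. A
guest that gets 0 back has to rely on another source or on calibration.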

[1]: https://download.vusec.net/papers/quarantine_raid23.pdf
[2]: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
[RFCv1]: https://lore.kernel.org/kvm/20240709132041.3625501-1-roypat@amazon.co.uk/
[RFCv2]: https://lore.kernel.org/kvm/20240910163038.1298452-1-roypat@amazon.co.uk/
[RFCv3]: https://lore.kernel.org/kvm/20241030134912.515725-1-roypat@amazon.co.uk/
[v4]: https://lore.kernel.org/kvm/20250221160728.1584559-1-roypat@amazon.co.uk/


Elliot Berman (1):
  filemap: Pass address_space mapping to ->free_folio()

Patrick Roy (11):
  arch: export set_direct_map_valid_noflush to KVM module
  mm: introduce AS_NO_DIRECT_MAP
  KVM: guest_memfd: Add flag to remove from direct map
  KVM: Documentation: describe GUEST_MEMFD_FLAG_NO_DIRECT_MAP
  KVM: selftests: load elf via bounce buffer
  KVM: selftests: set KVM_MEM_GUEST_MEMFD in vm_mem_add() if guest_memfd
    != -1
  KVM: selftests: Add guest_memfd based vm_mem_backing_src_types
  KVM: selftests: stuff vm_mem_backing_src_type into vm_shape
  KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in mem conversion
    tests
  KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in
    guest_memfd_test.c
  KVM: selftests: Test guest execution from direct map removed gmem

 Documentation/filesystems/locking.rst         |  2 +-
 Documentation/virt/kvm/api.rst                |  5 ++
 arch/arm64/include/asm/kvm_host.h             | 12 ++++
 arch/arm64/mm/pageattr.c                      |  1 +
 arch/loongarch/mm/pageattr.c                  |  1 +
 arch/riscv/mm/pageattr.c                      |  1 +
 arch/s390/mm/pageattr.c                       |  1 +
 arch/x86/mm/pat/set_memory.c                  |  1 +
 fs/nfs/dir.c                                  | 11 ++--
 fs/orangefs/inode.c                           |  3 +-
 include/linux/fs.h                            |  2 +-
 include/linux/kvm_host.h                      |  7 +++
 include/linux/pagemap.h                       | 16 +++++
 include/linux/secretmem.h                     | 18 ------
 include/uapi/linux/kvm.h                      |  2 +
 lib/buildid.c                                 |  4 +-
 mm/filemap.c                                  |  9 +--
 mm/gup.c                                      | 14 +----
 mm/mlock.c                                    |  2 +-
 mm/secretmem.c                                |  9 +--
 mm/vmscan.c                                   |  4 +-
 .../testing/selftests/kvm/guest_memfd_test.c  |  2 +
 .../testing/selftests/kvm/include/kvm_util.h  | 37 ++++++++---
 .../testing/selftests/kvm/include/test_util.h |  8 +++
 tools/testing/selftests/kvm/lib/elf.c         |  8 +--
 tools/testing/selftests/kvm/lib/io.c          | 23 +++++++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 61 +++++++++++--------
 tools/testing/selftests/kvm/lib/test_util.c   |  8 +++
 tools/testing/selftests/kvm/lib/x86/sev.c     |  1 +
 .../selftests/kvm/pre_fault_memory_test.c     |  1 +
 .../selftests/kvm/set_memory_region_test.c    | 50 +++++++++++++--
 .../kvm/x86/private_mem_conversions_test.c    |  7 ++-
 virt/kvm/guest_memfd.c                        | 32 ++++++++--
 virt/kvm/kvm_main.c                           |  5 ++
 34 files changed, 264 insertions(+), 104 deletions(-)


base-commit: a6ad54137af92535cfe32e19e5f3bc1bb7dbd383
-- 
2.50.1

