linux-kernel - [PATCH v3 0/3] x86/hyperv: Mark CoCo VM pages not present when changing encrypted state

Message-Id: <20240105183025.225972-1-mhklinux@outlook.com>
Date: Fri,  5 Jan 2024 10:30:22 -0800
From: mhkelley58@...il.com
To: tglx@...utronix.de,
	mingo@...hat.com,
	bp@...en8.de,
	dave.hansen@...ux.intel.com,
	x86@...nel.org,
	hpa@...or.com,
	kirill.shutemov@...ux.intel.com,
	haiyangz@...rosoft.com,
	wei.liu@...nel.org,
	decui@...rosoft.com,
	luto@...nel.org,
	peterz@...radead.org,
	akpm@...ux-foundation.org,
	urezki@...il.com,
	hch@...radead.org,
	lstoakes@...il.com,
	thomas.lendacky@....com,
	ardb@...nel.org,
	jroedel@...e.de,
	seanjc@...gle.com,
	rick.p.edgecombe@...el.com,
	sathyanarayanan.kuppuswamy@...ux.intel.com,
	linux-kernel@...r.kernel.org,
	linux-coco@...ts.linux.dev,
	linux-hyperv@...r.kernel.org,
	linux-mm@...ck.org
Subject: [PATCH v3 0/3] x86/hyperv: Mark CoCo VM pages not present when changing encrypted state

From: Michael Kelley <mhklinux@...look.com>

In a CoCo VM, when transitioning memory from encrypted to decrypted, or
vice versa, the caller of set_memory_encrypted() or set_memory_decrypted()
is responsible for ensuring the memory isn't in use and isn't referenced
while the transition is in progress.  The transition has multiple steps,
and the memory is in an inconsistent state until all steps are complete.
A reference while the state is inconsistent could result in an exception
that can't be cleanly fixed up.

However, the kernel load_unaligned_zeropad() mechanism could cause a stray
reference that can't be prevented by the caller of set_memory_encrypted()
or set_memory_decrypted(), so the #VC and #VE exception handling code has
specific logic to handle this case. But a CoCo VM running on Hyper-V may
be configured to run with a paravisor, with the #VC or #VE exception
routed to the paravisor. There's no architectural way to forward the
exceptions back to the guest kernel, so in that configuration the
load_unaligned_zeropad()-specific code doesn't work.

To avoid this problem, mark pages as "not present" while a transition
is in progress. If load_unaligned_zeropad() causes a stray reference, a
normal page fault is generated instead of #VC or #VE, and the
page-fault-based fixup handlers for load_unaligned_zeropad() resolve the
reference. When the encrypted/decrypted transition is complete, mark the
pages as "present" again.

This version of the patch series marks transitioning pages "not present"
only when running as a Hyper-V guest with a paravisor. Previous
versions[1] marked transitioning pages "not present" regardless of the
hypervisor and regardless of whether a paravisor is in use.  That more
general use had the benefit of decoupling the load_unaligned_zeropad()
fixup from CoCo VM #VE and #VC exception handling.  But the implementation
was problematic for SEV-SNP because the SEV-SNP hypervisor callbacks
require a valid virtual address, whereas the TDX and Hyper-V paravisor
callbacks work with a physical address.  Marking the transitioning pages
"not present" makes the virtual address invalid, so the PVALIDATE
instruction in the SEV-SNP callback fails. Constructing a temporary
virtual address for this purpose is slower and adds complexity that
negates the benefits of the more general use. So this version applies
the approach only where it is required: when the #VC and #VE exceptions
are routed to a paravisor.

The previous version minimized the TLB flushing done during page
transitions between encrypted and decrypted. Because this version
marks the pages "not present" in hypervisor-specific callbacks and
not in __set_memory_enc_pgtable(), such optimization is harder to
coordinate. But the page transitions are not a hot path, so this
version eschews TLB-flushing optimization in favor of simplicity.

Since this version no longer touches __set_memory_enc_pgtable(),
I've also removed patches that add comments about error handling
in that function.  Rick Edgecombe has proposed patches to improve
that error handling, and I'll leave those comments to Rick's
patches.

Patch 1 handles implications of the hypervisor callbacks needing
to do virt-to-phys translations on pages that are temporarily
marked not present.

Patch 2 makes the existing set_memory_p() function available for
use in the hypervisor callbacks.

Patch 3 is the core change that marks the transitioning pages
as not present.

This patch set is based on the linux-next20240103 code tree.

Changes in v3:
* Major rework and simplification per discussion above.

Changes in v2:
* Added Patches 3 and 4 to deal with the failure on SEV-SNP
  [Tom Lendacky]
* Split the main change into two separate patches (Patch 5 and
  Patch 6) to improve reviewability and to offer the option of
  retaining both hypervisor callbacks.
* Patch 5 moves set_memory_p() out of an #ifdef CONFIG_X86_64
  so that the code builds correctly for 32-bit, even though it
  is never executed for 32-bit [reported by kernel test robot]

[1] https://lore.kernel.org/lkml/20231121212016.1154303-1-mhklinux@outlook.com/

Michael Kelley (3):
  x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor
    callback
  x86/mm: Regularize set_memory_p() parameters and make non-static
  x86/hyperv: Make encrypted/decrypted changes safe for
    load_unaligned_zeropad()

 arch/x86/hyperv/ivm.c             | 58 ++++++++++++++++++++++++++++---
 arch/x86/include/asm/set_memory.h |  1 +
 arch/x86/mm/pat/set_memory.c      | 25 +++++++------
 3 files changed, 70 insertions(+), 14 deletions(-)

-- 
2.25.1