Date: Fri, 24 May 2024 22:44:09 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Dexuan Cui <decui@...rosoft.com>, Dave Hansen <dave.hansen@...el.com>,
	"x86@...nel.org" <x86@...nel.org>, "linux-coco@...ts.linux.dev"
	<linux-coco@...ts.linux.dev>, "bp@...en8.de" <bp@...en8.de>,
	"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, Haiyang Zhang
	<haiyangz@...rosoft.com>, "hpa@...or.com" <hpa@...or.com>,
	"kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>, KY
 Srinivasan <kys@...rosoft.com>, "luto@...nel.org" <luto@...nel.org>,
	"mingo@...hat.com" <mingo@...hat.com>, "peterz@...radead.org"
	<peterz@...radead.org>, "sathyanarayanan.kuppuswamy@...ux.intel.com"
	<sathyanarayanan.kuppuswamy@...ux.intel.com>, "tglx@...utronix.de"
	<tglx@...utronix.de>, "wei.liu@...nel.org" <wei.liu@...nel.org>, jason
	<jason@...c4.com>, "thomas.lendacky@....com" <thomas.lendacky@....com>,
	"tytso@....edu" <tytso@....edu>, "ardb@...nel.org" <ardb@...nel.org>
CC: "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Tianyu Lan
	<Tianyu.Lan@...rosoft.com>
Subject: RE: [RFC PATCH] clocksource: hyper-v: Enable the tsc_page for a TDX
 VM in TD mode

From: Dexuan Cui <decui@...rosoft.com> Sent: Friday, May 24, 2024 1:46 AM
> 
> > From: Dave Hansen <dave.hansen@...el.com>
> > Sent: Thursday, May 23, 2024 7:26 AM
> > [...]
> > On 5/22/24 19:24, Dexuan Cui wrote:
> > ...
> > > +static bool noinstr intel_cc_platform_td_l2(enum cc_attr attr)
> > > +{
> > > +	switch (attr) {
> > > +	case CC_ATTR_GUEST_MEM_ENCRYPT:
> > > +	case CC_ATTR_MEM_ENCRYPT:
> > > +		return true;
> > > +	default:
> > > +		return false;
> > > +	}
> > > +}
> > > +
> > >  static bool noinstr intel_cc_platform_has(enum cc_attr attr)
> > >  {
> > > +	if (tdx_partitioned_td_l2)
> > > +		return intel_cc_platform_td_l2(attr);
> > > +
> > >  	switch (attr) {
> > >  	case CC_ATTR_GUEST_UNROLL_STRING_IO:
> > >  	case CC_ATTR_HOTPLUG_DISABLED:
> >
> > On its face, this _looks_ rather troubling.  It just hijacks all of the
> > attributes.  It totally bifurcates the code.  Anything that gets added
> > to intel_cc_platform_has() now needs to be considered for addition to
> > intel_cc_platform_td_l2().
> 
> Maybe the bifurcation is necessary? TD mode is different from
> Partitioned TD mode (L2), after all. Another reason for the bifurcation
> is:  currently online/offline'ing is disallowed for a TD VM, but actually
> Hyper-V is able to support CPU online/offline'ing for a TD VM in
> Partitioned TD mode (L2) -- how can we allow online/offline'ing for such
> a VM?
> 
> BTW, the bifurcation code is copied from amd_cc_platform_has(), where
> an AMD SNP VM may run in the vTOM mode.
> 
> > > --- a/arch/x86/mm/mem_encrypt_amd.c
> > > +++ b/arch/x86/mm/mem_encrypt_amd.c
> > ...
> > > @@ -529,7 +530,7 @@ void __init mem_encrypt_free_decrypted_mem(void)
> > >  	 * CC_ATTR_MEM_ENCRYPT, aren't necessarily equivalent in a Hyper-V VM
> > >  	 * using vTOM, where sme_me_mask is always zero.
> > >  	 */
> > > -	if (sme_me_mask) {
> > > +	if (sme_me_mask || (cc_vendor == CC_VENDOR_INTEL && !tdx_partitioned_td_l2)) {

FWIW, the above won't work in a kernel built with CONFIG_TDX_GUEST=y
but CONFIG_AMD_MEM_ENCRYPT=n. mem_encrypt_free_decrypted_mem()
in arch/x86/mm/mem_encrypt_amd.c won't get built, and an empty stub is used.
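
As a rough illustration of why the check never runs in such a build,
here is a minimal user-space sketch of the "real body vs. empty stub"
pattern. It is not kernel code: DEMO_AMD_MEM_ENCRYPT is a hypothetical
stand-in for CONFIG_AMD_MEM_ENCRYPT, and every demo_* name is made up
for this example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for CONFIG_AMD_MEM_ENCRYPT. It is deliberately
 * left undefined to model a CONFIG_TDX_GUEST=y build without AMD support.
 */
/* #define DEMO_AMD_MEM_ENCRYPT 1 */

/* Counts how often the (modeled) re-encryption path actually runs. */
static int demo_reencrypt_calls;

#ifdef DEMO_AMD_MEM_ENCRYPT
/* Real body: only present when the AMD option is enabled. */
static void demo_free_decrypted_mem(bool intel_td_guest)
{
	if (intel_td_guest)
		demo_reencrypt_calls++;
}
#else
/* Empty stub: the vendor check above is never compiled in. */
static inline void demo_free_decrypted_mem(bool intel_td_guest)
{
	(void)intel_td_guest;
}
#endif

/* Returns true if calling the function had any effect in this build. */
static bool demo_call_has_effect(void)
{
	int before = demo_reencrypt_calls;

	demo_free_decrypted_mem(true);
	assert(demo_reencrypt_calls >= before);
	return demo_reencrypt_calls != before;
}
```

With the macro undefined, demo_call_has_effect() reports that the call
did nothing, even for an Intel TD guest. Any Intel-specific condition
placed inside the AMD-only body is unreachable in that configuration.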

> > >  		r = set_memory_encrypted(vaddr, npages);
> > >  		if (r) {
> > >  			pr_warn("failed to free unused decrypted pages\n");
> >
> > If _ever_ there were a place for a new CC_ attribute, this would be it.
> Not sure how to add a new CC attribute for the __bss_decrypted support.
> 
> For the cpu online/offline'ing support, I'm not sure how to add a new
> CC attribute and not introduce the bifurcation.
> 
> > It's also a bit concerning that now we've got a (cc_vendor ==
> > CC_VENDOR_INTEL) check in an amd.c file.
> I agree my change here is ugly...
> Currently the __bss_decrypted support is only used for SNP.
> Not sure if we should get it to work for TDX as well.
> 
> > So all of that on top of Kirill's "why do we need this in the first
> > place" questions leave me really scratching my head on this one.
> Probably I'll just use local APIC timer in such a VM or delay enabling
> Hyper-V TSC page to a later place where set_memory_decrypted()
> works for me. However, I still would like to find out how to allow
> CPU online/offline'ing for a TDX VM in Partitioned TD mode (L2).
> 

My thoughts:

__bss_decrypted is named as if it applies to any CoCo VM, but really
it is specific to AMD SEV. It was originally used for a GHCB page, which
is SEV-specific, and then it proved to be convenient for the Hyper-V TSC
page. Ideally, we could fix __bss_decrypted to work generally in a
TDX VM without any dependency on code specific to a hypervisor. But
looking at some of the details, that may be non-trivial.

A narrower solution is to remove the Hyper-V TSC page from
__bss_decrypted, and use Hyper-V specific code on both TDX and
SEV-SNP to decrypt just that page (not the entire __bss_decrypted), 
based on whether the Hyper-V guest is running with a paravisor.
From Dexuan's patch, it looks like set_memory_decrypted()
works on TDX at the time that ms_hyperv_init_platform() runs.
Does it also work on SEV-SNP? The code in kvm_init_platform()
uses early_set_mem_enc_dec_hypercall() with
kvm_sev_hc_page_enc_status(), which is SEV only.  So maybe
the normal set_memory_decrypted() doesn't work on SEV at
that point, though I'm not at all clear on what kvm_init_platform() is
trying to do.  Shouldn't __bss_decrypted already be set up correctly?

The issue of taking CPUs offline is separate. Is the inability to take
a CPU offline with TDX an architectural limitation? Or just a
current Linux implementation limitation? And what about in an
L2 TDX VM?  If the existence of a limitation in an L2 TDX VM is
dependent on the hypervisor/paravisor, then can cc_platform_has()
check some architectural flag (that's independent of the host
hypervisor) to know if it is running in an L2 TDX VM and return false
for CC_ATTR_HOTPLUG_DISABLED? If a host/paravisor combo doesn't
allow taking an L2 TDX VM CPU offline, then it would be up to that
combo to implement the appropriate restriction. It's not hard to add
a CPUHP state that would prevent it.
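
The idea in the paragraph above could be sketched roughly as follows.
This is a self-contained user-space model, not kernel code. All demo_*
names are hypothetical, the is_l2 field stands in for the architectural
flag whose existence is the open question here, and the error codes are
chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model types; none of these names exist in the kernel. */
enum demo_cc_attr {
	DEMO_ATTR_GUEST_MEM_ENCRYPT,
	DEMO_ATTR_HOTPLUG_DISABLED,
};

struct demo_td_guest {
	bool is_l2;			/* assumed architectural "L2 TD" flag */
	bool paravisor_forbids_offline;	/* policy of the host/paravisor combo */
};

/* One switch for every TD guest, with a single L2 exception. */
static bool demo_cc_platform_has(const struct demo_td_guest *g,
				 enum demo_cc_attr attr)
{
	switch (attr) {
	case DEMO_ATTR_GUEST_MEM_ENCRYPT:
		return true;
	case DEMO_ATTR_HOTPLUG_DISABLED:
		return !g->is_l2;
	}
	return false;
}

/* Models a CPUHP teardown callback installed by host/paravisor code. */
static int demo_paravisor_offline_veto(const struct demo_td_guest *g)
{
	return g->paravisor_forbids_offline ? -EBUSY : 0;
}

/* Models the offline path: generic CoCo check first, then the callback. */
static int demo_cpu_offline(const struct demo_td_guest *g)
{
	assert(g);
	if (demo_cc_platform_has(g, DEMO_ATTR_HOTPLUG_DISABLED))
		return -EOPNOTSUPP;
	return demo_paravisor_offline_veto(g);
}
```

In this model the attribute switch stays in one place, so there is no
second function to keep in sync, and a host/paravisor combination that
cannot support offlining expresses that through its own callback
rather than through the CoCo attribute.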

Michael

