linux-kernel - Re: [PATCH 1/1] x86/tdx: Route safe halt execution via tdx_safe

Message-ID: <Z5ozZOplEQZLHb1g@google.com>
Date: Wed, 29 Jan 2025 05:55:48 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: "Kirill A. Shutemov" <kirill@...temov.name>
Cc: Vishal Annapurve <vannapurve@...gle.com>, Dave Hansen <dave.hansen@...el.com>, x86@...nel.org, 
	linux-kernel@...r.kernel.org, pbonzini@...hat.com, erdemaktas@...gle.com, 
	ackerleytng@...gle.com, jxgao@...gle.com, sagis@...gle.com, oupton@...gle.com, 
	pgonda@...gle.com, dave.hansen@...ux.intel.com, linux-coco@...ts.linux.dev, 
	chao.p.peng@...ux.intel.com, isaku.yamahata@...il.com
Subject: Re: [PATCH 1/1] x86/tdx: Route safe halt execution via tdx_safe_halt

On Wed, Jan 29, 2025, Kirill A. Shutemov wrote:
> On Tue, Jan 28, 2025 at 04:45:35PM -0800, Sean Christopherson wrote:
> > This incorrectly assumes the hypervisor is intercepting HLT.  If the VM is given
> > a slice of hardware, HLT-exiting may be disabled, in which case it's desirable
> > for the guest to natively execute HLT, as the latencies to get in and out of "HLT"
> > are lower, especially for TDX guests.  Such a VM would hopefully have MONITOR/MWAIT
> > available as well, but even if that were the case, the admin could select HLT for
> > idling.
> > 
> > Ugh, and I see that bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
> > overrides default_idle().  The kernel really shouldn't do that, because odds are
> > decent that any TDX guest will have direct access to HLT.  The best approach I
> > can think of would be to patch x86_idle() to tdx_safe_halt() if and only if a HLT
> > #VE is taken.  The tricky part would be delaying the update until it's safe to do
> > so.
> 
> I am confused. HLT triggers #VE unconditionally in TDX guests. How would
> TDX guest have direct access to HLT?

Gah, you're not confused, I am.  I was thinking of the SEV-ES model where intercepts
are morphed to #VC.  

> Even if it would in the future, it is going to be an explicit opt-in from the
> guest and we can avoid setting x86_idle() for such cases.

Or explicit enumeration from the TDX module.

> > As for taking a #VE, the exception itself is fine (assuming the kernel isn't off
> > the rails and using a trap gate :-D).  The issue is likely that RFLAGS.IF=1 on
> > the stack, and so the call to cond_local_irq_enable() enables IRQs before making
> > the hypercall.  E.g. no one has complained about #VC, because exc_vmm_communication()
> > doesn't enable IRQs.
> > 
> > Off the top of my head, I can't think of any flows that would do HLT with IRQs
> > fully enabled.  Even PV spinlocks use safe_halt(), e.g. in kvm_wait(), so I don't
> > think there's any value in trying to precisely identify that it's a safe HLT?
> 
> I can only think of "CPU is dead" use-case of HLT where interrupts are
> enabled. But I hate special-casing HLT in exc_virtualization_exception() :/

Ignore me, overriding at boot time is the way to go. 
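
To make "overriding at boot time" concrete, here is a rough userspace model of the idea, not kernel code. Every name in it (idle_fn_t, idle_routine, select_idle_routine_model(), the *_model helpers and the counters) is a hypothetical stand-in for illustration; the real hooks are x86_idle(), default_idle() and tdx_safe_halt() as discussed above, and their actual wiring may differ.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the idle hook type; not the kernel's definition. */
typedef void (*idle_fn_t)(void);

/* Counters so the model can show which path was taken. */
static int native_halts;
static int tdx_halts;

/* Models natively executing "sti; hlt". */
static void native_safe_halt_model(void)
{
	native_halts++;
}

/* Models requesting the halt from the hypervisor via a hypercall. */
static void tdx_safe_halt_model(void)
{
	tdx_halts++;
}

/* Default idle routine, used unless boot code overrides it. */
static idle_fn_t idle_routine = native_safe_halt_model;

/*
 * Assumed to run once during early boot, before any CPU idles, so no
 * runtime patching and no special-casing in the #VE handler is needed.
 */
static void select_idle_routine_model(bool is_tdx_guest)
{
	if (is_tdx_guest)
		idle_routine = tdx_safe_halt_model;
}

/* One pass through the idle loop. */
static void cpu_idle_once(void)
{
	idle_routine();
}
```

Calling cpu_idle_once() before and after select_idle_routine_model(true) takes the native path first and the hypercall path afterwards, which is the whole point: the decision is made once, up front, instead of being inferred from a #VE at runtime.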

> > E.g. this should fix the immediate problem, and then ideally someone would make
> > TDX guests play nice with native HLT.
> 
> I've asked (some time ago) TDX module folks to provide interruptibility
> state as part of the guest so we can handle STI shadow properly, not as a
> hack around HLT.
> 
> The immediate problem can be addressed by fixing the BIOS to not advertise
> C-states (if I read the situation right).

No, something like Vishal proposed is a better fix.  It's still desirable for the
vCPU to call out to the hypervisor when going idle, otherwise a vCPU that is idle
for an extended duration will never let the pCPU go idle.
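
As a toy illustration of that last point (a sketch under stated assumptions, not how KVM or the TDX module track this): the host can only treat a vCPU as idle if the guest told it so. The struct, field and function names below are all hypothetical, and "idling without calling out" is modeled simply as the guest never notifying the host.

```c
#include <stdbool.h>

/* Hypothetical host-side view of one vCPU. */
struct vcpu_model {
	bool host_knows_idle;	/* set only when the guest calls out */
};

/*
 * Guest goes idle. If it calls out to the hypervisor (e.g. via a halt
 * hypercall), the host learns about it; otherwise the host sees nothing
 * and must assume the vCPU is still doing work.
 */
static void guest_go_idle_model(struct vcpu_model *v, bool call_out)
{
	v->host_knows_idle = call_out;
}

/* The pCPU backing this vCPU can go idle only if the host knows. */
static bool pcpu_can_idle_model(const struct vcpu_model *v)
{
	return v->host_knows_idle;
}
```

With call_out == false the pCPU stays busy no matter how long the guest is actually idle, which is why hiding C-states in the BIOS alone is not the preferred fix.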