Message-ID: <Z5ozZOplEQZLHb1g@google.com>
Date: Wed, 29 Jan 2025 05:55:48 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: "Kirill A. Shutemov" <kirill@...temov.name>
Cc: Vishal Annapurve <vannapurve@...gle.com>, Dave Hansen <dave.hansen@...el.com>, x86@...nel.org,
linux-kernel@...r.kernel.org, pbonzini@...hat.com, erdemaktas@...gle.com,
ackerleytng@...gle.com, jxgao@...gle.com, sagis@...gle.com, oupton@...gle.com,
pgonda@...gle.com, dave.hansen@...ux.intel.com, linux-coco@...ts.linux.dev,
chao.p.peng@...ux.intel.com, isaku.yamahata@...il.com
Subject: Re: [PATCH 1/1] x86/tdx: Route safe halt execution via tdx_safe_halt
On Wed, Jan 29, 2025, Kirill A. Shutemov wrote:
> On Tue, Jan 28, 2025 at 04:45:35PM -0800, Sean Christopherson wrote:
> > This incorrectly assumes the hypervisor is intercepting HLT. If the VM is given
> > a slice of hardware, HLT-exiting may be disabled, in which case it's desirable
> > for the guest to natively execute HLT, as the latencies to get in and out of "HLT"
> > are lower, especially for TDX guests. Such a VM would hopefully have MONITOR/MWAIT
> > available as well, but even if that were the case, the admin could select HLT for
> > idling.
> >
> > Ugh, and I see that bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
> > overrides default_idle(). The kernel really shouldn't do that, because odds are
> > decent that any TDX guest will have direct access to HLT. The best approach I
> > can think of would be to patch x86_idle() to tdx_safe_halt() if and only if a HLT
> > #VE is taken. The tricky part would be delaying the update until it's safe to do
> > so.
>
> I am confused. HLT triggers #VE unconditionally in TDX guests. How would
> TDX guest have direct access to HLT?
Gah, you're not confused, I am. I was thinking of the SEV-ES model where intercepts
are morphed to #VC.
> Even if it would in the future, it is going to be an explicit opt-in from the
> guest and we can avoid setting x86_idle() for such cases.
Or explicit enumeration from the TDX module.
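To make the boot-time option concrete, here is a toy Python model of choosing the idle routine once, at boot, rather than patching it after the first HLT #VE. All names are illustrative stand-ins, not kernel APIs. `module_enumerates_native_hlt` is a hypothetical flag for the "explicit enumeration from the TDX module" mentioned above; no such enumeration is assumed to exist today. The plain variable `x86_idle` only stands in for the kernel's idle hook.

```python
from dataclasses import dataclass
from typing import Callable


# Toy stand-ins for the two idle routines (NOT the kernel functions);
# each returns a label so the selection can be checked without hardware.
def model_native_safe_halt() -> str:
    return "native sti;hlt"


def model_tdx_safe_halt() -> str:
    return "TDVMCALL halt"


@dataclass(frozen=True)
class BootCaps:
    tdx_guest: bool
    # HYPOTHETICAL: models the TDX module enumerating that HLT does not
    # raise #VE, i.e. the guest may execute HLT natively.
    module_enumerates_native_hlt: bool = False


def select_idle_routine(caps: BootCaps) -> Callable[[], str]:
    """Pick the idle routine once, from facts known at boot."""
    if caps.tdx_guest and not caps.module_enumerates_native_hlt:
        return model_tdx_safe_halt
    return model_native_safe_halt


x86_idle = select_idle_routine(BootCaps(tdx_guest=True))
print(x86_idle())  # prints: TDVMCALL halt
```

The point of selecting at boot is that the decision depends only on static facts, so there is no need to find a safe moment to swap the routine at runtime.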
> > As for taking a #VE, the exception itself is fine (assuming the kernel isn't off
> > the rails and using a trap gate :-D). The issue is likely that RFLAGS.IF=1 on
> > the stack, and so the call to cond_local_irq_enable() enables IRQs before making
> > the hypercall. E.g. no one has complained about #VC, because exc_vmm_communication()
> > doesn't enable IRQs.
> >
> > Off the top of my head, I can't think of any flows that would do HLT with IRQs
> > fully enabled. Even PV spinlocks use safe_halt(), e.g. in kvm_wait(), so I don't
> > think there's any value in trying to precisely identify that it's a safe HLT?
>
> I can only think of the "CPU is dead" use case of HLT where interrupts are
> enabled. But I hate special-casing HLT in exc_virtualization_exception() :/
Ignore me, overriding at boot time is the way to go.
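A toy model (Python, not kernel code) of the IRQ-enable race described above, under simplifying assumptions: the #VE handler re-enables IRQs because the saved RFLAGS.IF is 1; an interrupt arriving in that window is handled before the halt hypercall and leaves work for the idle loop (modelled as `need_resched`); and the VMM's halt blocks only if an interrupt is still pending. A tdx_safe_halt()-style path instead keeps guest IRQs disabled across the hypercall and enables them afterwards. Every class and function here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VCpu:
    irqs_enabled: bool = False   # guest RFLAGS.IF
    pending_irq: bool = False    # interrupt posted but not yet delivered
    need_resched: bool = False   # set by the handler: idle loop has work

    def raise_irq(self) -> None:
        """An interrupt arrives; it is delivered at once only if IF=1."""
        self.pending_irq = True
        self.deliver()

    def deliver(self) -> None:
        if self.irqs_enabled and self.pending_irq:
            self.pending_irq = False
            self.need_resched = True


def hypercall_halt(v: VCpu) -> bool:
    """Toy VMM halt: blocks only if no interrupt is pending.
    Returns True if the vCPU actually went to sleep."""
    return not v.pending_irq


def halt_via_ve(v: VCpu, irq_in_window: bool) -> bool:
    """sti;hlt -> #VE whose handler re-enables IRQs before the hypercall."""
    v.irqs_enabled = True        # STI: the saved RFLAGS.IF is now 1
    if irq_in_window:
        v.raise_irq()            # handled here, before the hypercall
    return hypercall_halt(v)


def halt_routed(v: VCpu, irq_in_window: bool) -> bool:
    """tdx_safe_halt()-style: IF=0 across the hypercall, enable after."""
    if irq_in_window:
        v.raise_irq()            # IF=0, so it stays pending
    slept = hypercall_halt(v)    # pending IRQ keeps the vCPU awake
    v.irqs_enabled = True
    v.deliver()
    return slept


def lost_wakeup(path, irq_in_window: bool = True) -> bool:
    """True if the vCPU slept even though the idle loop had work."""
    v = VCpu()
    slept = path(v, irq_in_window)
    return slept and v.need_resched


print(lost_wakeup(halt_via_ve), lost_wakeup(halt_routed))  # True False
```

With an interrupt in the window, the #VE path sleeps despite pending work, while the routed path returns from the hypercall immediately and handles the interrupt once IRQs are enabled.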
> > E.g. this should fix the immediate problem, and then ideally someone would make
> > TDX guests play nice with native HLT.
>
> I've asked (some time ago) TDX module folks to provide interruptibility
> state as part of the guest so we can handle STI shadow properly, not as a
> hack around HLT.
>
> The immediate problem can be addressed by fixing the BIOS to not advertise
> C-states (if I read the situation right).
No, something like Vishal proposed is a better fix. It's still desirable for the
vCPU to call out to the hypervisor when going idle, otherwise a vCPU that is idle
for an extended duration will never let the pCPU go idle.