linux-kernel - Re: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250423184326.GCaAk0zinljkNHa_M7@renoirsky.local>
Date: Wed, 23 Apr 2025 20:43:26 +0200
From: Borislav Petkov <bp@...en8.de>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Pavel Machek <pavel@...x.de>, Sasha Levin <sashal@...nel.org>,
	linux-kernel@...r.kernel.org, stable@...r.kernel.org,
	Max Grobecker <max@...becker.info>, Ingo Molnar <mingo@...nel.org>,
	tglx@...utronix.de, mingo@...hat.com, dave.hansen@...ux.intel.com,
	x86@...nel.org, thomas.lendacky@....com, perry.yuan@....com,
	mario.limonciello@....com, riel@...riel.com, mjguzik@...il.com,
	darwi@...utronix.de, Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6]
 x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when
 running in a virtual machine)

On Wed, Apr 23, 2025 at 07:10:17AM -0700, Sean Christopherson wrote:
> On Wed, Apr 23, 2025, Borislav Petkov wrote:
> > > Eww.  Optimization to lessen the pain of DR7 interception.  It'd be nice to clean
> > > this up at some point, especially with things like SEV-ES with DebugSwap, where
> > > DR7 is never intercepted.
> > >   arch/x86/include/asm/debugreg.h:        if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active())
> > >   arch/x86/kernel/hw_breakpoint.c:                 * When in guest (X86_FEATURE_HYPERVISOR), local_db_save()
> > 
> > Patch adding it says "Because DRn access is 'difficult' with virt;..."
> > so yeah. I guess we need to agree how to do debug exceptions in guests.
> > Probably start documenting it and then have guest and host adhere to
> > that. I'm talking completely without having looked at what the code does
> > but the "handshake" agreement should be something like this and then we
> > can start simplifying code...
> 
> I don't know that we'll be able to simplify the code.
> 
> #DBs in the guest are complex because DR[0-3] aren't context switched by hardware,
> and running with active breakpoints is uncommon.  As a result, loading the guest's
> DRs into hardware on every VM-Enter is undesirable, because it would add significant
> latency (load DRs on entry, save DRs on exit) for a relatively rare situation
> (guest has active breakpoints).
> 
> KVM (and presumably other hypervisors) intercepts DR accesses so that it can
> detect when the guest has active breakpoints (DR7 bits enabled), at which point
> KVM does load the guest's DRs into hardware and disables DR interception until
> the next VM-Exit.
> 
> KVM also allows the host user to utilize hardware breakpoints to debug the guest,
> which further adds to the madness, and that's not something the guest can change
> or even influence.
> 
> So removing the "am I guest logic" entirely probably isn't feasible, because in
> the common case where there are no active breakpoints, reading cpu_dr7 instead
> of DR7 is a significant performance boost for "normal" VMs.

So I see three modes:

- default off - the usual case

- host debugs the guest

- guests are allowed to do breakpoints

So depending on what is enabled, the code can behave properly - it just
needs logic which tells the relevant code - guest or host - which of the
debugging mode is enabled. And then everything adheres to that and DTRT.

But before any of that, the even more important question is: do we even
care to beef it up that much?

I get the feeling that we don't so it likely is a "whatever's the
easiest" game.

> I mentioned SEV-ES+ DebugSwap because in that case DR7 is effectively guaranteed
> to not be intercepted, and so the native behavior of reading DR7 instead of the
> per-CPU variable is likely desirable.  I believe TDX has similar functionality
> (I forget if it's always on, or opt-in).

Aha, the choice was made by the CoCo hw designers - guests are allowed
to do breakpoints.

Oh well...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette