linux-kernel - Re: [PATCH 0/2] x86/kvm: Force legacy PCI hole as WB under SNP/TDX

Message-ID: <f1800b4f27554df2b2c538bdbe0a38419a231a09.camel@intel.com>
Date: Wed, 5 Feb 2025 03:51:00 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "seanjc@...gle.com" <seanjc@...gle.com>, "Xu, Min M" <min.m.xu@...el.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "dave.hansen@...ux.intel.com"
	<dave.hansen@...ux.intel.com>, "thomas.lendacky@....com"
	<thomas.lendacky@....com>, "dionnaglaze@...gle.com" <dionnaglaze@...gle.com>,
	"Wu, Binbin" <binbin.wu@...el.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "kirill.shutemov@...ux.intel.com"
	<kirill.shutemov@...ux.intel.com>, "mingo@...hat.com" <mingo@...hat.com>,
	"pbonzini@...hat.com" <pbonzini@...hat.com>, "tglx@...utronix.de"
	<tglx@...utronix.de>, "hpa@...or.com" <hpa@...or.com>, "vkuznets@...hat.com"
	<vkuznets@...hat.com>, "bp@...en8.de" <bp@...en8.de>, "jgross@...e.com"
	<jgross@...e.com>, "x86@...nel.org" <x86@...nel.org>, "pgonda@...gle.com"
	<pgonda@...gle.com>
Subject: Re: [PATCH 0/2] x86/kvm: Force legacy PCI hole as WB under SNP/TDX

+Min, can you comment?

3a3b12cbda ("UefiCpuPkg/MtrrLib: MtrrLibIsMtrrSupported always return FALSE in
TD-Guest") turned out to be problematic in practice.

Full thread:
https://lore.kernel.org/kvm/20250201005048.657470-1-seanjc@google.com/

On Mon, 2025-02-03 at 16:27 -0800, Sean Christopherson wrote:
> On Mon, Feb 03, 2025, Rick P Edgecombe wrote:
> > On Mon, 2025-02-03 at 12:33 -0800, Sean Christopherson wrote:
> > > > Since there is no upstream KVM TDX support yet, why isn't it an option to
> > > > still revert the EDKII commit too? It was a relatively recent change.
> > > 
> > > I'm fine with that route too, but it too is a band-aid.  Relying on the
> > > *untrusted* hypervisor to essentially communicate memory maps is not a
> > > winning strategy.
> > > 
> > > > To me it seems that the normal KVM MTRR support is not ideal, because it is
> > > > still lying about what it is doing. For example, in the past there was an
> > > > attempt to use UC to prevent speculative execution accesses to sensitive
> > > > data. The KVM MTRR support only happens to work with existing guests, but
> > > > not all possible MTRR usages.
> > > > 
> > > > Since diverging from the architecture creates loose ends like that, we could
> > > > instead define some other way for EDKII to communicate the ranges to the
> > > > kernel. Like some simple KVM PV MSRs that are for communication only, and not
> > > 
> > > Hard "no" to any PV solution.  This isn't KVM specific, and as above, bouncing
> > > through the hypervisor to communicate information within the guest is asinine,
> > > especially for CoCo VMs.
> > 
> > Hmm, right.
> > 
> > So the other options could be:
> > 
> > 1. Some TDX module feature to hold the ranges:
> >  - Con: Not shared with AMD
> > 
> > 2. Re-use MTRRs for the communication, revert changes in guest and edk2:
> 
> Thinking more about how EDK2 is consumed downstream, I think reverting the EDK2
> changes is necessary regardless of what happens in the kernel.  Or at the least,
> somehow communicate to EDK2 users that ingesting those changes is a bad idea
> unless the kernel has also been updated.
> 
> AFAIK, Bring Your Own Firmware[*] isn't widely adopted, which means that the CSP
> is shipping the firmware.  And shipping OVMF/EDK2 with the "ignores MTRRs" code
> will cause problems for guests without commit 8e690b817e38 ("x86/kvm: Override
> default caching mode for SEV-SNP and TDX").  Since the host doesn't control the
> guest kernel, there's no way to know if deploying those EDK2 changes is safe.
>  
> [*] https://kvm-forum.qemu.org/2024/BYOF_-_KVM_Forum_2024_iWTioIP.pdf
> 

Hmm. Since there is no upstream TDX KVM support, for its part, I guess KVM
should still get a chance to define a cleaner solution (if there actually was a
cleaner solution). But yea, it would mean only components from after the
solution was settled could be used together for a fully working stack. And
it should probably be called out somehow. Maybe it could be in the KVM TDX docs
or something.

Still seems like a thing to avoid if possible.

> >  - Con: Creating more half support, when it's technically not required
> >  - Con: Still bouncing through the hypervisor
> 
> I assume by "Re-use MTRRs for the communication" you also mean updating the guest
> to address the "everything is UC!" flaw, otherwise another con is:
> 
>    - Con: Doesn't address the performance issue with TDX guests "using" UC
>           memory by default (unless there's yet more enabled).

Hmm. This is quite the tangled corner.

> 
> Presumably that can be accomplished by simply skipping the CR0.CD toggling, and
> doing MTRR stuff as normal?

I'll have to get back to you on this one. Kirill probably could give a better
answer, but likely will not be able to follow up on this thread until next week.

> 
> >  - Pro: Design and code is clear
> > 
> > 3. Create some new architectural definition, like a bit that means "MTRRs don't
> > actually work":
> >  - Con: Takes a long time, need to get agreement
> >  - Con: Still bouncing through the hypervisor
> 
> Not for KVM guests.  As I laid out in my bug report, it's safe to assume MTRRs
> don't actually affect the memory type when running under KVM.
> 
> FWIW, PAT doesn't "work" on most KVM Intel setups either, because of misguided
> KVM code that resulted in "Ignore Guest PAT" being set in all EPTEs for the
> overwhelming majority of guests.  That's not desirable long term because it
> prevents the guest from using WC (via PAT) in situations where doing so is needed
> for performance and/or correctness.
> 
> >  - Pro: More pure solution
> 
> MTRRs "not working" is a red herring.  The problem isn't that MTRRs don't work,
> it's that the kernel is (somewhat unknowingly) using MTRRs as a crutch to get the
> desired memtype for devices.  E.g. for emulated MMIO, MTRRs _can't_ be virtualized,
> because there's never a valid mapping, i.e. there is no physical memory and thus
> no memtype.  In other words, under KVM guests (and possibly other hypervisors),
> MTRRs end up being nothing more than a communication channel between guest firmware
> and the kernel.

Yea.

> 
> The gap for CoCo VMs is that using MTRRs is undesirable because they are controlled
> by the untrusted host.  But that's largely a future problem, unless someone has a
> clever way to fix the kernel mess.
> 
> 

Yea, I wondered about that too. I imagine the thinking was that since it is only
controlling shared memory, it can be untrusted.

And I guess the solution in this patchset is hypothetically a bit more locked
down in that respect.
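
For reference, the "everything is UC!" behavior mentioned above follows from how
the MTRR default type MSR is architecturally defined: when the enable bit is
clear, UC applies to all of physical memory regardless of the default type
field. Below is a minimal userspace sketch of that decode, assuming the
IA32_MTRR_DEF_TYPE layout from the Intel SDM (type in bits 7:0, FE in bit 10,
E in bit 11). The helper names are made up for illustration and are not kernel
or EDK2 functions, and the sketch ignores fixed and variable range MTRRs
entirely.

```c
#include <stdint.h>
#include <stdio.h>

/* Architectural MTRR memory type encodings (Intel SDM). */
enum memtype { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6 };

/* IA32_MTRR_DEF_TYPE (MSR 0x2FF) fields. */
#define MTRR_DEF_TYPE_MASK 0xffULL      /* bits 7:0: default memory type  */
#define MTRR_DEF_FE        (1ULL << 10) /* bit 10: fixed-range enable      */
#define MTRR_DEF_E         (1ULL << 11) /* bit 11: MTRR enable             */

/* Hypothetical helper: printable name for a memory type encoding. */
const char *memtype_name(unsigned int type)
{
	switch (type) {
	case MT_UC: return "UC";
	case MT_WC: return "WC";
	case MT_WT: return "WT";
	case MT_WP: return "WP";
	case MT_WB: return "WB";
	default:    return "reserved";
	}
}

/*
 * Hypothetical helper: memory type for an address that no fixed or variable
 * range MTRR covers, given a raw IA32_MTRR_DEF_TYPE value.  With E clear,
 * all MTRRs are disabled and UC applies to everything.
 */
unsigned int mtrr_default_memtype(uint64_t def_type_msr)
{
	if (!(def_type_msr & MTRR_DEF_E))
		return MT_UC;

	return (unsigned int)(def_type_msr & MTRR_DEF_TYPE_MASK);
}

/* Hypothetical helper: dump the decoded fields of a raw MSR value. */
void mtrr_print_default(uint64_t def_type_msr)
{
	printf("DEF_TYPE=0x%llx E=%d FE=%d -> default %s\n",
	       (unsigned long long)def_type_msr,
	       !!(def_type_msr & MTRR_DEF_E),
	       !!(def_type_msr & MTRR_DEF_FE),
	       memtype_name(mtrr_default_memtype(def_type_msr)));
}
```

With this decode, a value of 0x806 (E set, type 6) gives WB as the default,
while 0x006 (same type field, E clear) gives UC, which is the case a guest ends
up in if it takes "MTRRs disabled" at face value.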