[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87lf126010.ffs@tglx>
Date: Fri, 03 Dec 2021 00:48:43 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Cc: Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"Rafael J . Wysocki" <rjw@...ysocki.net>,
"H . Peter Anvin" <hpa@...or.com>, Tony Luck <tony.luck@...el.com>,
Dan Williams <dan.j.williams@...el.com>,
Andi Kleen <ak@...ux.intel.com>,
Kuppuswamy Sathyanarayanan <knsathya@...nel.org>,
linux-kernel@...r.kernel.org, linux-acpi@...r.kernel.org
Subject: Re: [PATCH v2] x86: Skip WBINVD instruction for VM guest
Kirill,
On Fri, Dec 03 2021 at 01:21, Kirill A. Shutemov wrote:
> On Thu, Nov 25, 2021 at 01:40:24AM +0100, Thomas Gleixner wrote:
>> Kuppuswamy,
>> Either that or you provide patches with arguments which are based on
>> proper analysis and not on 'appears to' observations.
>
> I think the right solution to the WBINVD would be to add a #VE handler
> that does nothing. We don't have a reasonable way to handle it from within
> the guest. We can call the VMM in hope that it would handle it, but VMM is
> untrusted and it can ignore the request.
>
> Dave suggested that we need to do code audit to make sure that there's no
> user inside TDX guest environment that relies on WBINVD to work correctly.
>
> Below is full call tree of WBINVD. It is substantially larger than I
> anticipated from initial grep.
>
> Conclusions:
>
> - Most of callers are in ACPI code on changing S-states. Ignoring cache
> flush for S-state change on virtual machine should be safe.
>
> - The only WBINVD I was able to trigger is on poweroff from ACPI code.
> Reboot also should trigger it, but for some reason I don't see it.
>
> - Few caller in CPU offline code. TDX does not allowed to offline CPU as
> we cannot bring it back -- we don't have SIPI. And even if offline
> works for vCPU it should be safe to ignore WBINVD there.
>
> - NVDIMMs are not supported inside TDX. If it will change we would need
> to deal with cache flushing for this case. Hopefully, we would be able
> to avoid WBINVD.
>
> - Cache QoS and MTRR use WBINVD. They are disabled in TDX, but it is
> controlled by VMM if the feature is advertised. We would need to
> filter CPUID/MSRs to make sure VMM would not mess with them.
>
> Is it good enough justification for do-nothing #VE WBINVD handler?
first of all thank you very much for this very profound analysis.
This is really what I was asking for and you probably went even a step
deeper than that. Very appreciated.
What we should do instead of doing a wholesale let's ignore WBINVD is to
have a separate function/macro:
ACPI_FLUSH_CPU_CACHE_PHYS()
and invoke that from the functions which are considered to be safe.
That would default to ACPI_FLUSH_CPU_CACHE() for other architecures
obviously.
Then you can rightfully do:
#define ACPI_FLUSH_CPU_CACHE_PHYS() \
if (!cpu_feature_enabled(XXX)) \
wbinvd(); \
where $XXX might be FEATURE_TDX_GUEST for paranoia sake and then
extended to X86_FEATURE_HYPERVISOR if everyone agrees.
Then you have the #VE handler which just acts on any other wbinvd
invocation via warn, panic, whatever, no?
Thanks,
tglx
Powered by blists - more mailing lists