linux-kernel - Re: [PATCH v2] x86: Skip WBINVD instruction for VM guest

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87lf126010.ffs@tglx>
Date:   Fri, 03 Dec 2021 00:48:43 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Cc:     Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        "Rafael J . Wysocki" <rjw@...ysocki.net>,
        "H . Peter Anvin" <hpa@...or.com>, Tony Luck <tony.luck@...el.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Kuppuswamy Sathyanarayanan <knsathya@...nel.org>,
        linux-kernel@...r.kernel.org, linux-acpi@...r.kernel.org
Subject: Re: [PATCH v2] x86: Skip WBINVD instruction for VM guest

Kirill,

On Fri, Dec 03 2021 at 01:21, Kirill A. Shutemov wrote:
> On Thu, Nov 25, 2021 at 01:40:24AM +0100, Thomas Gleixner wrote:
>> Kuppuswamy,
>> Either that or you provide patches with arguments which are based on
>> proper analysis and not on 'appears to' observations.
>
> I think the right solution to the WBINVD would be to add a #VE handler
> that does nothing. We don't have a reasonable way to handle it from within
> the guest. We can call the VMM in hope that it would handle it, but VMM is
> untrusted and it can ignore the request.
>
> Dave suggested that we need to do code audit to make sure that there's no
> user inside TDX guest environment that relies on WBINVD to work correctly.
>
> Below is full call tree of WBINVD. It is substantially larger than I
> anticipated from initial grep.
>
> Conclusions:
>
>   - Most of callers are in ACPI code on changing S-states. Ignoring cache
>     flush for S-state change on virtual machine should be safe.
>
>   - The only WBINVD I was able to trigger is on poweroff from ACPI code.
>     Reboot also should trigger it, but for some reason I don't see it.
>
>   - Few caller in CPU offline code. TDX does not allowed to offline CPU as
>     we cannot bring it back -- we don't have SIPI. And even if offline
>     works for vCPU it should be safe to ignore WBINVD there.
>
>   - NVDIMMs are not supported inside TDX. If it will change we would need
>     to deal with cache flushing for this case. Hopefully, we would be able
>     to avoid WBINVD.
>
>   - Cache QoS and MTRR use WBINVD. They are disabled in TDX, but it is
>     controlled by VMM if the feature is advertised. We would need to
>     filter CPUID/MSRs to make sure VMM would not mess with them.
>
> Is it good enough justification for do-nothing #VE WBINVD handler?

first of all thank you very much for this very profound analysis.

This is really what I was asking for and you probably went even a step
deeper than that. Very appreciated.

What we should do instead of doing a wholesale let's ignore WBINVD is to
have a separate function/macro:

 ACPI_FLUSH_CPU_CACHE_PHYS()

and invoke that from the functions which are considered to be safe.

That would default to ACPI_FLUSH_CPU_CACHE() for other architecures
obviously.

Then you can rightfully do:

#define ACPI_FLUSH_CPU_CACHE_PHYS()     \
        if (!cpu_feature_enabled(XXX))	\
        	wbinvd();               \              
                
where $XXX might be FEATURE_TDX_GUEST for paranoia sake and then
extended to X86_FEATURE_HYPERVISOR if everyone agrees.

Then you have the #VE handler which just acts on any other wbinvd
invocation via warn, panic, whatever, no?

Thanks,

        tglx