[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <24844584-8031-4b58-ba5c-f85ef2f4c718@amd.com>
Date: Tue, 20 Feb 2024 16:30:13 -0600
From: Tom Lendacky <thomas.lendacky@....com>
To: "Huang, Kai" <kai.huang@...el.com>, "bp@...en8.de" <bp@...en8.de>
Cc: "Gao, Chao" <chao.gao@...el.com>, "Hansen, Dave" <dave.hansen@...el.com>,
"luto@...nel.org" <luto@...nel.org>, "x86@...nel.org" <x86@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>, "hpa@...or.com"
<hpa@...or.com>, "mingo@...hat.com" <mingo@...hat.com>,
"kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"nik.borisov@...e.com" <nik.borisov@...e.com>,
"bhe@...hat.com" <bhe@...hat.com>
Subject: Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush
during kexec
On 2/20/24 14:07, Huang, Kai wrote:
> On Tue, 2024-02-20 at 08:47 -0600, Tom Lendacky wrote:
>> On 2/20/24 08:28, Borislav Petkov wrote:
>>> On Mon, Feb 19, 2024 at 04:09:47PM -0600, Tom Lendacky wrote:
>>>> That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here.
>>>
>>> I would've never figured that out just from staring at the test. :-\
>>>
>>>> Basically, if you are bare-metal, it will return true. And it will only
>>>> return true for machines that support SME and have the
>>>> MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the
>>>> 'cc_vendor = CC_VENDOR_AMD' assignment is. However, if you move the
>>>> 'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have
>>>> the WBINVD called for any machine that supports SME, even if SME is not
>>>> possible because the proper bit in the SYS_CFG MSR hasn't been set.
>>>>
>>>> I know what I'm trying to say, let me know if it is making sense...
>>>
>>> Yah, thanks for taking the time to explain.
>>>
>>> Here's an even more radical idea:
>>>
>>> Why not do WBINVD *unconditionally* on the CPU down path?
>>>
>>> - it is the opposite of a fast path, i.e., no one cares
>>>
>>> - it'll take care of every possible configuration without ugly logic
>>>
>>> - it wouldn't hurt to have the caches nice and coherent before going
>>> down
>>>
>>> Hmmm.
>>
>> That's what I initially did, but errors were reported, see commit:
>> f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")
>
> This changelog only mentions "Some issues". Do you know exactly what kind
> issues did you see? Are these issues only appeared on SME enabled system or
> other non-SME-capable systems too?
I believe the issues were that different Intel systems would hang or reset
and it was bisected to that commit that added the WBINVD. It was a while
ago, but I remember that they were similar to what the 1f5e7eb7868e commit
ended up fixing, which was debugged because sometimes the WBINVD was still
occasionally issued resulting in the following patch
9b040453d444 ("x86/smp: Dont access non-existing CPUID leaf")
It just means that if we go to an unconditional WBINVD, then we need to be
careful.
Thanks,
Tom
>
> Because ...
>
>>
>> But that may be because of the issue solved by commit:
>> 1f5e7eb7868e ("x86/smp: Make stop_other_cpus() more robust")
>
> ... the issue resolved in this commit seems to be hang.
>
Powered by blists - more mailing lists