Message-ID: <469754ab-d8ec-168a-15c7-61045a880792@amd.com>
Date: Wed, 26 Apr 2023 14:18:34 -0500
From: Tom Lendacky <thomas.lendacky@....com>
To: Dave Hansen <dave.hansen@...el.com>,
Tony Battersby <tonyb@...ernetics.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org
Cc: "H. Peter Anvin" <hpa@...or.com>,
Mario Limonciello <mario.limonciello@....com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Andi Kleen <ak@...ux.intel.com>
Subject: Re: [PATCH RFC] x86/cpu: fix intermittent lockup on poweroff
On 4/26/23 13:15, Dave Hansen wrote:
> On 4/26/23 10:51, Tom Lendacky wrote:
>>>> + /*
>>>> + * native_stop_other_cpus() will write to @stop_cpus_count after
>>>> + * observing that it went down to zero, which will invalidate the
>>>> + * cacheline on this CPU.
>>>> + */
>>>> + atomic_dec(&stop_cpus_count);
>>
>> This is probably going to pull in a cache line and cause the problem the
>> native_wbinvd() is trying to avoid.
>
> Is one _more_ cacheline really the problem?
The answer is: it depends. If the cacheline ends up modified/dirty, then
it can be a problem; a line that is only ever read stays clean and is
harmless.
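To spell that out in MESI terms, here's an illustrative sketch (mine,
not the patch; it assumes stop_cpus_count is the atomic_t from the RFC):

    /* Illustrative only: the point is the coherence state, not the API. */
    static void cacheline_state_sketch(void)
    {
            int v = atomic_read(&stop_cpus_count);  /* line comes in clean
                                                     * (Shared/Exclusive);
                                                     * WBINVD can simply
                                                     * discard it */

            atomic_dec(&stop_cpus_count);           /* line is now Modified;
                                                     * it has to be written
                                                     * back somewhere, which
                                                     * is exactly what the
                                                     * flush was trying to
                                                     * prevent */
            (void)v;
    }

So a decrement after the WBINVD parks a dirty line in the halted CPU's
cache unless, as the comment in the patch argues, the controlling CPU's
later write to that line invalidates it first.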
>
> Or is having _any_ cacheline pulled in a problem? What about the text
> page containing the WBINVD? How about all the page table pages that are
> needed to resolve %RIP to a physical address?
It's been a while since I looked into all this, but text and page table
pages didn't present any problems because they weren't modified; stack
memory, however, was. A plain wbinvd() went through the paravirt support,
and the stack data written by that call ended up in some page structs in
the kexec kernel (applicable to Zen1 and Zen2). Using native_wbinvd()
eliminated the stack writes after the WBINVD and didn't result in any
corruption following a kexec.
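For reference, the difference looks roughly like this (native_wbinvd()
is quoted from memory of arch/x86/include/asm/special_insns.h, and the
paravirt note is my summary, so treat the details as approximate):

    /* The raw instruction: no call, no extra stack traffic. */
    static inline void native_wbinvd(void)
    {
            asm volatile("wbinvd" : : : "memory");
    }

    /*
     * With CONFIG_PARAVIRT, plain wbinvd() instead dispatches through
     * pv_ops, so the WBINVD executes inside a called function and the
     * surrounding call/return sequence keeps touching the stack after
     * the flush itself has run, which is how the dirty stack lines
     * described above came about.
     */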
>
> What about the mds_idle_clear_cpu_buffers() code that snuck into
> native_halt()?
Luckily that is all inline and uses a static branch which isn't enabled
for AMD, so it should just jmp to the hlt, with no modified cache lines.
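(For reference, the inlines in question look roughly like this; quoted
from memory of the kernel headers, so consider it a paraphrase rather
than the exact source:)

    /* Paraphrase of arch/x86/include/asm/nospec-branch.h */
    static __always_inline void mds_idle_clear_cpu_buffers(void)
    {
            if (static_branch_likely(&mds_idle_clear))
                    mds_clear_cpu_buffers();        /* the verw in the
                                                     * disassembly below */
    }

    /* Paraphrase of arch/x86/include/asm/irqflags.h */
    static inline void native_halt(void)
    {
            mds_idle_clear_cpu_buffers();
            asm volatile("hlt" : : : "memory");
    }

With the mds_idle_clear key disabled, as it is on AMD, the only added
memory access is the read of the key itself, and a read caches clean.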
Thanks,
Tom
>
>> ffffffff810ede4c: 0f 09 wbinvd
>> ffffffff810ede4e: 8b 05 e4 3b a7 02 mov 0x2a73be4(%rip),%eax # ffffffff83b61a38 <mds_idle_clear>
>> ffffffff810ede54: 85 c0 test %eax,%eax
>> ffffffff810ede56: 7e 07 jle ffffffff810ede5f <stop_this_cpu+0x9f>
>> ffffffff810ede58: 0f 00 2d b1 75 13 01 verw 0x11375b1(%rip) # ffffffff82225410 <ds.6688>
>> ffffffff810ede5f: f4 hlt
>> ffffffff810ede60: eb ec jmp ffffffff810ede4e <stop_this_cpu+0x8e>
>> ffffffff810ede62: e8 59 40 1a 00 callq ffffffff81291ec0 <trace_hardirqs_off>
>> ffffffff810ede67: eb 85 jmp ffffffff810eddee <stop_this_cpu+0x2e>
>> ffffffff810ede69: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
>