lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87a5yu5iyc.ffs@tglx>
Date:   Thu, 27 Apr 2023 01:20:43 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Andi Kleen <ak@...ux.intel.com>,
        Tom Lendacky <thomas.lendacky@....com>
Cc:     Dave Hansen <dave.hansen@...el.com>,
        Tony Battersby <tonyb@...ernetics.com>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>,
        Mario Limonciello <mario.limonciello@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH RFC] x86/cpu: fix intermittent lockup on poweroff

Andi!

On Wed, Apr 26 2023 at 15:02, Andi Kleen wrote:
>> > > This is probably going to pull in a cache line and cause the problem the
>> > > native_wbinvd() is trying to avoid.
>> > 
>> > Is one _more_ cacheline really the problem?
>> 
>> The answer is it depends. If the cacheline ends up modified/dirty, then it
>> can be a problem.
>
> I haven't followed this all in detail, but if any dirty cache line a
> problem you probably would need to be sure that any possible NMI user
> (like perf or watchdogs) is disabled at this point, otherwise you could
> still get NMIs here. 
>
> I don't think perf currently has a mechanism to do that other
> than to offline the CPU.

stop_this_cpu()
  disable_local_APIC()
    apic_soft_disable()
      clear_local_APIC()
        v = apic_read(APIC_LVTPC);
        apic_write(APIC_LVTPC, v | APIC_LVT_MASKED);

So after that point the PMU can't raise NMIs anymore which includes the
default perf based NMI watchdog, no?

External NMIs are a different problem, but they kinda fall into the same
category as:

> Also there are of course machine checks and SMIs that could still happen, 
> but I guess there's nothing you could do about them.

Though external NMIs could be disabled via outb(0x70,...) if paranoid
enough. Albeit if there is an external NMI based watchdog raising the
NMI during this kexec() scenario then the outcome is probably as
undefined as in the MCE case independent of the SEV dirty cacheline
concern.

Thanks,

        tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ