Date:   Thu, 27 May 2021 13:28:59 -0700
From:   Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Borislav Petkov <bp@...e.de>, James Feeney <james@...ealm.net>,
        linux-smp@...r.kernel.org, Jens Axboe <axboe@...nel.dk>,
        lkml <linux-kernel@...r.kernel.org>,
        Zhang Rui <rui.zhang@...el.com>, x86-ml <x86@...nel.org>
Subject: Re: [PATCH] x86/thermal: Fix LVT thermal setup for SMI delivery mode

On Thu, 2021-05-27 at 21:01 +0200, Borislav Petkov wrote:
> On Thu, May 27, 2021 at 11:09:59AM -0700, Srinivas Pandruvada wrote:
> > My guess is that system is booting hot sometimes. SMM started fan or
> > some cooling and set a temperature threshold. It is waiting for thermal
> > interrupt for temperature threshold, which it never got.
> 
> Are you saying that that replication of lvtthmr_init to the APs in
> intel_init_thermal() is absolutely needed on those SMI machines running
> hot?

We have seen some SMM implementations use thermal interrupts. Several
years back we had an issue on a Yoga system where SMM handling of an
HWP-related thermal interrupt caused a hard hang, because the SMM code
crashed there.
So yes, there may be something special for cooling as well.

> 
> That thing:
> 
>          * If BIOS takes over the thermal interrupt and sets its interrupt
>          * delivery mode to SMI (not fixed), it restores the value that the
>          * BIOS has programmed on AP based on BSP's info we saved since BIOS
>          * is always setting the same value for all threads/cores.
> 
> ?
> 
> Me moving that lvtthmr_init read later would replicate the wrong value
> because we'd soft-disable the APIC and thus the core would lockup
> waiting...
I think so.
I will try to force replication of the wrong value on the Yoga system
which used to crash in the SMM thermal interrupt handling and check what
happens. This shouldn't crash, as it will not get the thermal interrupt.
Since the system is not with me, I can try next week.
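
To make the case under discussion concrete, here is a minimal userspace
sketch of the decision the quoted comment describes: restore the BSP's
saved thermal LVT value on an AP only when the BIOS programmed a
delivery mode other than fixed. It is not the kernel code. All macro and
function names are hypothetical; the bit positions (delivery mode in
bits 10:8, mask in bit 16) follow the local APIC LVT layout. The second
check assumes, for illustration only, that a value saved too late reads
back as masked with no SMI delivery mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local APIC LVT layout: bits 10:8 select the delivery mode, bit 16 is
 * the mask bit. These names are hypothetical, not the kernel's. */
#define LVT_DM_MASK   (0x7u << 8)
#define LVT_DM_FIXED  (0x0u << 8)
#define LVT_DM_SMI    (0x2u << 8)
#define LVT_MASKED    (1u << 16)

/* What an AP's thermal LVT reads after INIT: all zero except the mask. */
#define LVT_AP_RESET  LVT_MASKED

/* True when the saved BSP value shows a non-fixed delivery mode (e.g. SMI),
 * i.e. the BIOS has taken over the thermal interrupt. */
static inline bool bios_owns_thermal_lvt(uint32_t bsp_saved)
{
    return (bsp_saved & LVT_DM_MASK) != LVT_DM_FIXED;
}

/* Value the AP's thermal LVT should hold after the replication step:
 * the BSP's saved value if the BIOS owns it, otherwise left untouched. */
static inline uint32_t ap_thermal_lvt(uint32_t bsp_saved, uint32_t ap_current)
{
    return bios_owns_thermal_lvt(bsp_saved) ? bsp_saved : ap_current;
}

/* Usage example for the two cases discussed in this thread. */
void sketch_selfcheck(void)
{
    /* Saved early: SMI mode is seen and replicated to the AP. */
    assert(ap_thermal_lvt(LVT_DM_SMI, LVT_AP_RESET) == LVT_DM_SMI);

    /* Saved too late (assumed to read back masked, no SMI mode): the AP
     * stays masked, so SMM would never see its thermal interrupt. */
    assert(ap_thermal_lvt(LVT_MASKED, LVT_AP_RESET) == LVT_AP_RESET);
}
```

The sketch only models the register values; on real hardware the
replication is a write to the AP's thermal LVT register.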

> 
> The other interesting thing is that the core would always lockup when
> trying to IPI another core to remote-flush the TLBs.
> 
Here I think the other core didn't exit SMM mode.

Thanks,
Srinivas