Message-ID: <71399a4c1584139763587534957a435bafa47051.camel@linux.intel.com>
Date:   Thu, 02 Jun 2022 14:13:10 -0700
From:   srinivas pandruvada <srinivas.pandruvada@...ux.intel.com>
To:     Arnd Bergmann <arnd@...nel.org>
Cc:     Len Brown <len.brown@...el.com>,
        Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Amit Kucheria <amitk@...nel.org>,
        Zhang Rui <rui.zhang@...el.com>, linux-pm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: x86/mce/therm_throt incorrect THERM_STATUS_CLEAR_CORE_MASK?

On Thu, 2022-06-02 at 22:42 +0200, Arnd Bergmann wrote:
> On Thu, Jun 2, 2022 at 10:10 PM srinivas pandruvada
> <srinivas.pandruvada@...ux.intel.com> wrote:
> > On Thu, 2022-06-02 at 20:53 +0200, Arnd Bergmann wrote:
> > > 
> > > I wonder how common this problem is. Would it help to add a driver
> > > workaround like this?
> > This issue affects only certain SKUs; the others already work as
> > expected. These log bits are important for debugging, so we don't
> > want to clear them in this path. Printing a warning for the CLX
> > stepping is fine, without clearing the unrelated bits 13 and 15.
> > A read-modify-write that updates only the bits of interest should
> > always work. Writing 1s to this register should be a NOP.
> 
> The patch I suggested doesn't change the behavior unless the initial
> write causes an exception. As long as only buggy microcode rejects the
> write, the second write just serves to clear the state that causes the
> repeated stack dumps.
But it will clear bits 13 and 15 in this case. So at least print the
current MSR value in the warning message, so that we don't lose the
bit 13 and bit 15 values in case we need them for debugging.

Thanks,
Srinivas

> 
>        Arnd
> 
> > > @@ -214,7 +214,13 @@ static void clear_therm_status_log(int level)
> > > 
> > >         rdmsrl(msr, msr_val);
> > >         msr_val &= mask;
> > > -       wrmsrl(msr, msr_val & ~THERM_STATUS_PROCHOT_LOG);
> > > +       if (wrmsrl_safe(msr, msr_val & ~THERM_STATUS_PROCHOT_LOG)) {
> > > +               /* work around Cascade Lake SKZ57 erratum */
> > > +               printk_once(KERN_WARNING "Failed to update IA32_THERM_STATUS, "
> > > +                                       "please upgrade microcode\n");
> > > +               wrmsrl(msr, msr_val & ~THERM_STATUS_PROCHOT_LOG &
> > > +                       ~BIT(13) & ~BIT(15));
> > > +       }
> > >  }
> > > 
