Message-ID: <20170930152104.GC238@x4>
Date:   Sat, 30 Sep 2017 17:21:04 +0200
From:   Markus Trippelsdorf <markus@...ppelsdorf.de>
To:     Brian Gerst <brgerst@...il.com>
Cc:     Borislav Petkov <bp@...en8.de>,
        Adam Borowski <kilobyte@...band.pl>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andy Lutomirski <luto@...nel.org>, x86-ml <x86@...nel.org>
Subject: Re: random insta-reboots on AMD Phenom II

On 2017.09.30 at 10:20 -0400, Brian Gerst wrote:
> On Sat, Sep 30, 2017 at 8:47 AM, Markus Trippelsdorf
> <markus@...ppelsdorf.de> wrote:
> > On 2017.09.30 at 13:53 +0200, Borislav Petkov wrote:
> >> On Sat, Sep 30, 2017 at 01:29:03PM +0200, Adam Borowski wrote:
> >> > On Sat, Sep 30, 2017 at 01:11:37PM +0200, Borislav Petkov wrote:
> >> > > On Sat, Sep 30, 2017 at 04:05:16AM +0200, Adam Borowski wrote:
> >> > > > Any hints how to debug this?
> >> > >
> >> > > Do
> >> > > rdmsr -a 0xc0010015
> >> > > as root and paste it here.
> >> >
> >> > 1000010
> >> > 1000010
> >> > 1000010
> >> > 1000010
> >> > 1000010
> >> > 1000010
> >> >
> >> > on both 4.13.4 and 4.14-rc2+.
> >>
> >> Boot into -rc2+ and do as root:
> >>
> >> # wrmsr -a 0xc0010015 0x1000018
> >>
> >> If the issue gets fixed then Mr. Luto better revert the new lazy TLB
> >> flushing fun'n'games for 4.14 before it is too late and that kernel
> >> releases b0rked.
> >
> > The issue does get fixed by setting TlbCacheDis to 1. I have been
> > running it for the last few weeks without any problems.
> > Performance is not affected at all. So it might be easier to just set
> > the bit for older AMD processors as a boot quirk.
> > Changing the TLB code so late might not be a good idea...
> 
> Looking at the AMD K10 revision guide
> (http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf), erratum #298,
> which this fixes, should only apply to revisions DR-BA and DR-B2, which
> include the original Phenom, but not Phenom II.  The Phenom II X6 is
> revision PH-E0, which does not have this erratum.

It has nothing to do with erratum #298. The new lazy TLB code causes
MCEs, because the page tables may now contain garbage.
See the long "Current mainline git (24e700e291d52bd2) hangs when
building e.g. perf" LKML thread.
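
For reference, the value read back with rdmsr (1000010) and the value
written with wrmsr (0x1000018) differ in a single bit. A minimal Python
sketch of that arithmetic, assuming TlbCacheDis is bit 3 of MSR
0xc0010015; the thread does not state the bit position, it is inferred
here from the two values, and the helper names are made up for
illustration:

```python
# Sketch only: the bit position below is an assumption inferred from the
# two MSR values quoted in this thread, not taken from the thread itself.
TLB_CACHE_DIS_BIT = 3  # assumed position of TlbCacheDis in MSR 0xc0010015


def set_bit(value: int, bit: int) -> int:
    """Return value with the given bit set to 1."""
    return value | (1 << bit)


def bit_is_set(value: int, bit: int) -> bool:
    """Return True if the given bit is 1 in value."""
    return bool((value >> bit) & 1)


reported = 0x1000010  # rdmsr -a 0xc0010015 output quoted above
proposed = set_bit(reported, TLB_CACHE_DIS_BIT)

print(hex(proposed))             # 0x1000018, the value passed to wrmsr
print(hex(reported ^ proposed))  # 0x8, the only bit that changes
print(bit_is_set(reported, TLB_CACHE_DIS_BIT))  # False
print(bit_is_set(proposed, TLB_CACHE_DIS_BIT))  # True
```

This only illustrates the bit manipulation; it does not read or write
any MSR.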
-- 
Markus
