Message-ID: <20170601082744.GD23936@nuc-i3427.alporthouse.com>
Date:   Thu, 1 Jun 2017 09:27:44 +0100
From:   Chris Wilson <chris@...is-wilson.co.uk>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Mikulas Patocka <mpatocka@...hat.com>,
        Ingo Molnar <mingo@...nel.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        "H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org
Subject: [v4.12-rc3] Early boot panic on Broadwell

Hi guys,

I hit an early boot panic on a Broadwell laptop (xps13-9343) that I
bisected to:

commit cbed27cdf0e3f7ea3b2259e86b9e34df02be3fe4
Author: Mikulas Patocka <mpatocka@...hat.com>
Date:   Tue Apr 18 15:07:11 2017 -0400

    x86/PAT: Fix Xorg regression on CPUs that don't support PAT
    
    In the file arch/x86/mm/pat.c, there's a '__pat_enabled' variable. The
    variable is set to 1 by default and the function pat_init() sets
    __pat_enabled to 0 if the CPU doesn't support PAT.
    
    However, on AMD K6-3 CPUs, the processor initialization code never calls
    pat_init() and so __pat_enabled stays 1 and the function pat_enabled()
    returns true, even though the K6-3 CPU doesn't support PAT.
    
    The result of this bug is that a kernel warning is produced when attempting to
    start the Xserver and the Xserver doesn't start (fork() returns ENOMEM).
    Another symptom of this bug is that the framebuffer driver doesn't set the
    K6-3 MTRR registers:
    
      x86/PAT: Xorg:3891 map pfn expected mapping type uncached-minus for [mem 0xe4000000-0xe5ffffff], got write-combining
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 3891 at arch/x86/mm/pat.c:1020 untrack_pfn+0x5c/0x9f
      ...
      x86/PAT: Xorg:3891 map pfn expected mapping type uncached-minus for [mem 0xe4000000-0xe5ffffff], got write-combining
    
    To fix the bug change pat_enabled() so that it returns true only if PAT
    initialization was actually done.
    
    Also, I changed boot_cpu_has(X86_FEATURE_PAT) to
    this_cpu_has(X86_FEATURE_PAT) in pat_ap_init(), so that we check the PAT
    feature on the processor that is being initialized.

In my testing, I found that reverting the /boot_cpu_has/this_cpu_has/
change was enough to restore working behaviour:

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 83a59a6..c537bfb 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -234,7 +234,7 @@ static void pat_bsp_init(u64 pat)
 
 static void pat_ap_init(u64 pat)
 {
-       if (!this_cpu_has(X86_FEATURE_PAT)) {
+       if (!boot_cpu_has(X86_FEATURE_PAT)) {
                /*
                 * If this happens we are on a secondary CPU, but switched to
                 * PAT on the boot CPU. We have no way to undo PAT.

It seems scary that different CPUs may report different features, but
that may just be an artifact of the boot phase?
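
To make the question above concrete, here is a small userspace model of
how a boot-CPU check and a per-CPU check could disagree during bring-up.
It is only a sketch under an assumption that has not been verified here:
that the secondary CPU's own capability copy is not yet filled in when
pat_ap_init() runs. Every name in it (model_boot_cpu_has,
model_this_cpu_has, model_identify_cpu, FEATURE_PAT, the arrays) is
hypothetical and is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS     4
#define FEATURE_PAT (1u << 0)   /* hypothetical bit, not the kernel's encoding */

/* Toy stand-ins: one capability word for the boot CPU, one per CPU. */
static unsigned int boot_caps;
static unsigned int percpu_caps[NR_CPUS];

/* Models a check against the boot CPU's capabilities only. */
static bool model_boot_cpu_has(unsigned int feature)
{
	return (boot_caps & feature) != 0;
}

/* Models a check against the capabilities recorded for one given CPU. */
static bool model_this_cpu_has(int cpu, unsigned int feature)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return (percpu_caps[cpu] & feature) != 0;
}

/*
 * Models CPU identification filling in the per-CPU copy. ASSUMPTION:
 * in this model the copy stays zeroed until this function has run.
 */
static void model_identify_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	percpu_caps[cpu] = boot_caps;
}
```

In this model, with boot_caps = FEATURE_PAT, model_this_cpu_has(1,
FEATURE_PAT) is false until model_identify_cpu(1) has run, while
model_boot_cpu_has(FEATURE_PAT) is true throughout. If the real ordering
were similar, the CPUs would not actually differ in features; the per-CPU
check would simply be asked too early.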
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
