Message-ID: <1b4569ee-8c06-4480-447b-2af8f6804053@intel.com>
Date:   Fri, 29 Dec 2017 09:32:13 -0800
From:   Dave Hansen <dave.hansen@...el.com>
To:     Alexander Tsoy <alexander@...y.me>, Greg KH <greg@...ah.com>,
        Andy Lutomirski <luto@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>
Cc:     Borislav Petkov <bp@...e.de>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Borislav Petkov <bp@...en8.de>,
        Borislav Petkov <bpetkov@...e.de>,
        Brian Gerst <brgerst@...il.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        David Laight <David.Laight@...lab.com>,
        Denys Vlasenko <dvlasenk@...hat.com>,
        Eduardo Valentin <eduval@...zon.com>,
        Greg KH <gregkh@...uxfoundation.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Juergen Gross <jgross@...e.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Rik van Riel <riel@...hat.com>,
        Will Deacon <will.deacon@....com>, aliguori@...zon.com,
        daniel.gruss@...k.tugraz.at, hughd@...gle.com, keescook@...gle.com,
        Kernel Mailing List <linux-kernel@...r.kernel.org>,
        stable <stable@...r.kernel.org>
Subject: Re: 4.14.9 with CONFIG_MCORE2 fails to boot

Does anyone have the results of a build that they can share?  (vmlinux,
vmlinuz/bzImage, System.map, .config).  That, plus a corresponding
serial log with an oops would be helpful.

I tried just adding MCORE2=y to my normal config but it didn't reproduce
this.

If you can't send the entire build like that, just running
scripts/faddr2line on __schedule+0x37f/0x7b0 would be very enlightening.
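
For anyone unfamiliar with the notation: the token above is
function+offset/size, and faddr2line takes the object file (vmlinux)
plus such a token and maps it back to a source line.  A minimal Python
sketch of what the token encodes follows.  The helper names and the
System.map address used in the example are hypothetical, chosen only
for illustration; this is not how faddr2line itself is implemented.

```python
import re

def parse_sym_offset(token):
    """Split 'func+0xOFF/0xSIZE' into (name, offset, size)."""
    m = re.fullmatch(
        r"(?P<name>[\w.]+)\+(?P<off>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)",
        token,
    )
    if m is None:
        raise ValueError(f"not a func+offset/size token: {token!r}")
    return m.group("name"), int(m.group("off"), 16), int(m.group("size"), 16)

def resolve(system_map_lines, token):
    """Return the absolute address for token, given System.map-style lines.

    Each line is expected to look like '<hex address> <type> <symbol>'.
    """
    name, offset, _size = parse_sym_offset(token)
    for line in system_map_lines:
        addr, _type, sym = line.split()
        if sym == name:
            return int(addr, 16) + offset
    raise KeyError(name)

# Hypothetical System.map entry; a real build will have a different address.
example_map = ["ffffffff81000000 T __schedule"]
print(hex(resolve(example_map, "__schedule+0x37f/0x7b0")))
```

The offset only makes sense against the exact build that produced the
oops, which is why the matching vmlinux and System.map are needed.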

On 12/29/2017 06:41 AM, Alexander Tsoy wrote:
> [    0.775461] NMI backtrace for cpu 0
> [    0.775461] CPU: 0 PID: 114 Comm: modprobe Not tainted 4.15.0-rc5+
...
> [    0.775461] Call Trace:
> [    0.775461]  <#DF>
> [    0.775461]  ? double_fault+0xc/0x30
> [    0.775461]  ? page_fault+0x36/0x60
> [    0.775461]  do_double_fault+0xb/0x130
> [    0.775461]  </#DF>
> [    0.775461] Code: 78 4c 89 7c 24 08 4c 89 74 24 10 4c 89 6c 24 18 4c
> 89 64 24 20 48 89 6c 24 28 48 89 5c 24 30 bb 01 00 00 00 b9 01 01 00 c0
> 0f 32 <85> d2 78 05 0f 01 f8 31 db c3 0f 1f 40 00 66 2e 0f 1f 84 00 00
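
In a "Code:" line like the one quoted above, the byte in angle brackets
is the one the reported instruction pointer refers to.  The kernel tree
ships scripts/decodecode for disassembling such a dump; the Python
below is only a rough sketch of locating the marked byte, using a short
excerpt of the quoted bytes.  The function name is hypothetical.

```python
def find_marked_byte(code_text):
    """Return (bytes_before, marked_byte, bytes_after) from an oops Code: dump."""
    tokens = code_text.replace("Code:", "").split()
    before, after, marked = [], [], None
    for tok in tokens:
        if tok.startswith("<") and tok.endswith(">"):
            # The <xx> token marks the byte at the reported instruction pointer.
            marked = int(tok[1:-1], 16)
        elif marked is None:
            before.append(int(tok, 16))
        else:
            after.append(int(tok, 16))
    if marked is None:
        raise ValueError("no <xx> marker found in Code: dump")
    return bytes(before), marked, bytes(after)

# Excerpt of the tail of the quoted dump.
excerpt = "Code: bb 01 00 00 00 b9 01 01 00 c0 0f 32 <85> d2 78 05 0f 01 f8 31 db c3"
before, marked, after = find_marked_byte(excerpt)
print(f"{len(before)} bytes before, marked byte 0x{marked:02x}, {len(after)} bytes after")
```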

From the various oopses, it looks like this happens when getting a
double fault while trying to go idle.  The CPU is probably trying
to return from the double fault, but it didn't do anything useful in the
fault handler so it just continues faulting, but the NMI watchdog can
still get an oops out of it.

It doesn't appear to be recursing *too* far because it's not blowing
through the stack and triple faulting.

The several traces all appear to be in paths that might call
safe_halt() (including the kvm async page fault code).  It makes me
wonder if we've been taking double faults there for a long time, but the
new trampoline stack somehow ends up being more fragile and can't
recover from the double-fault.

Couple more things:

MCORE2 seems to get one oddball compiler flag (-march=core2):

>         cflags-$(CONFIG_MCORE2) += \
>                 $(call cc-option,-march=core2,$(call cc-option,-mtune=generic))

It would be interesting to see if replacing the above "$(call ...)" expression with:

	$(call cc-option,-mtune=generic)

makes the problem go away the same way as changing the .config option.

The MCORE2 config option also sets CONFIG_X86_P6_NOP, which overrides
the normal X86_64 NOPs, if I'm reading that code correctly.  But I
think that's much less likely to be the cause since there
