lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140417192330.GE4321@pd.tnic>
Date:	Thu, 17 Apr 2014 21:23:30 +0200
From:	Borislav Petkov <bp@...en8.de>
To:	Josh Boyer <jwboyer@...oraproject.org>,
	Owen Kibel <qmewlo@...il.com>
Cc:	"Luck, Tony" <tony.luck@...el.com>,
	Peter Zijlstra <peterz@...radead.org>,
	"mingo@...nel.org" <mingo@...nel.org>,
	"hpa@...or.com" <hpa@...or.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
	"atodorov@...hat.com" <atodorov@...hat.com>,
	"gong.chen@...ux.intel.com" <gong.chen@...ux.intel.com>,
	"qmewlo@...il.com" <qmewlo@...il.com>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"linux-tip-commits@...r.kernel.org" 
	<linux-tip-commits@...r.kernel.org>
Subject: Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs

On Thu, Apr 17, 2014 at 12:54:21PM -0400, Josh Boyer wrote:
> A user (Owen) reported seeing the following backtrace with 3.15-rc1+:
> 
> kernel: [  120.253539] Hardware name: Hewlett-Packard HP ENVY 15
> Notebook PC/1962, BIOS F.24 08/27/2013
> kernel: [  120.253540]  ffff88025f2146c0 ffffffff81c01e40
> ffffffff8171dc1d ffffffff81d11aa0
> kernel: [  120.253543]  ffffffff81c01e50 ffffffff81719a39
> ffffffff81c01eb0 ffffffff8172151b
> kernel: [  120.253545]  ffffffff81c18480 ffffffff81c01fd8
> 00000000000146c0 00000000000146c0
> kernel: [  120.253548] Call Trace:
> kernel: [  120.253555]  [<ffffffff8171dc1d>] dump_stack+0x4d/0x6f
> kernel: [  120.253558]  [<ffffffff81719a39>] __schedule_bug+0x4c/0x5a
> kernel: [  120.253560]  [<ffffffff8172151b>] __schedule+0x6eb/0x7a0
> kernel: [  120.253563]  [<ffffffff81721b91>] schedule_preempt_disabled+0x31/0x80
> kernel: [  120.253566]  [<ffffffff810af8f3>] cpu_startup_entry+0x173/0x490
> kernel: [  120.253570]  [<ffffffff8170e3e4>] rest_init+0x84/0x90
> kernel: [  120.253574]  [<ffffffff81d34f83>] start_kernel+0x450/0x45b
> kernel: [  120.253576]  [<ffffffff81d3493c>] ? repair_env_string+0x5c/0x5c
> kernel: [  120.253578]  [<ffffffff81d34120>] ? early_idt_handlers+0x120/0x120
> kernel: [  120.253581]  [<ffffffff81d345ee>] x86_64_start_reservations+0x2a/0x2c
> kernel: [  120.253583]  [<ffffffff81d3472e>] x86_64_start_kernel+0x13e/0x14d
> 
> For whatever reason, his report didn't hit lkml, but it did hit Linus'
> inbox.  Linus took a look around, googled some, and came across a
> report that Alexander filed 2 days ago against Fedora rawhide with a
> similar backtrace:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1087810
> 
> Linus CC'd me and Alexander on his reply to Owen, Ingo, Peter, etc.
> Ingo was aware of Chen Gong's patch, but when Owen tested it it
> produced the BUG line above.  So Ingo came up with a slightly
> different fix to hopefully resolve that as well.  We haven't heard
> from Owen whether Ingo's patch resolves everything yet.
> 
> I think (hope) that is the full backstory.  Ingo or Peter could correct me.

Thanks Josh, that explains it. Owen bounced me his mail too, in the
meantime.

So, the only thing I'm seeing so far is that I cannot reproduce this
with Owen's config on my machines with Linus' tree from right now.

And the only sane idea I have currently is that without the get_cpu_var
fix, preemption count gets fumbled just right down the

identify_boot_cpu ->
 mcheck_cpu_init ->
  machine_check_poll ->
   get_cpu_var ->
    preempt_disable

path so that the initial schedule we do, catches it:

start_kernel -> rest_init -> schedule_preempt_disabled

Which would mean that the only fix for that should be Gong's
initial fix which Tony just sent (see attached), i.e. without the
s/raw_spin_lock/spin_lock/ changes from Ingo.

But you're saying here, Owen tested it already. Which kinda hints at with that

kernel: [    7.341085] BUG: using __this_cpu_write() in preemptible [00000000] code: modprobe/546

line which was in one of the previous mails.

Anyway, I did merge Owen's config wrt PREEMPT into mine:

$ grep PREEMPT .config
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_PREEMPT_TRACER is not set

and I can't repro it. With, or without the attached patch :-\.
Which basically could mean that there's either some specific timing
Owen's or Alex' boxes are hitting which mine aren't or there's some
miscommunication in the whole thing.

@Owen: can you please test the attached patch if you haven't done so
already and report back?

Thanks.

/me is still confused.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

View attachment "diff" of type "text/plain" (1266 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ