Message-ID: <20150410040726.GB3623@hori1.linux.bs1.fc.nec.co.jp>
Date:	Fri, 10 Apr 2015 04:07:26 +0000
From:	Naoya Horiguchi <n-horiguchi@...jp.nec.com>
To:	Borislav Petkov <bp@...en8.de>
CC:	"Luck, Tony" <tony.luck@...el.com>, Ingo Molnar <mingo@...nel.org>,
	"Prarit Bhargava" <prarit@...hat.com>,
	Vivek Goyal <vgoyal@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Junichi Nomura <j-nomura@...jp.nec.com>,
	Kiyoshi Ueda <k-ueda@...jp.nec.com>
Subject: Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump

On Fri, Apr 10, 2015 at 12:49:33AM +0000, Horiguchi Naoya(堀口 直也) wrote:
> On Thu, Apr 09, 2015 at 09:05:51PM +0200, Borislav Petkov wrote:
> > On Thu, Apr 09, 2015 at 06:22:02PM +0000, Luck, Tony wrote:
> > > > Why? Those CPUs are offlined and num_online_cpus() in mce_start() should
> > > > account for that, no?
> > > >
> > > > And if those are offlined, they're very very unlikely to trigger an MCE
> > > > as they're idle and not executing code.
> > > 
> > > Let's step back a few feet and look at the big picture.  There are three main classes of machine check
> > > that we might see while trying to run kdump - and remember that all machine checks are currently
> > > broadcast, so all cpus whether online or offline will see them
> > > 
> > > 1) Fatal
> > > We have to crash - lose the dump.  Having a new machine check handler will make things a bit easier
> > > to see what happened because we won't have any synchronization failed messages from the offline
> > > cpus.
> > 
> > But this should not be a problem if kdump path keeps cpu_online_mask
> > up to date. I'm looking at kdump_nmi_callback() or crash_nmi_callback() or
> > so. Those should clear cpu_online_mask and then mce_start() will work
> > fine on the crashing CPU.
> > 
> > IMHO, of course.
> 
> Sorry, I misread you. With clearing cpu_online_mask in shootdown (not done
> yet,) raising tolerance should work without timeout message.
> So I think you are right.

... wait, wouldn't changing cpu_online_mask confuse admins who try to
analyze the kdump, especially when the panic was caused by a
CPU-related problem?

Similarly, changing the tolerant value loses the original value,
although this is unlikely to be a problem. But if we do change it, would
it be better to set an upper bit as a flag and leave the lowest 2 bits
alone, so that they still hold the original value?
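
To make the question concrete, here is a minimal sketch of that encoding.
It assumes tolerant only takes the values 0..3, so it fits in the lowest
2 bits, and it picks bit 2 as the flag arbitrarily. All macro and function
names below are hypothetical; none of them come from the patch or from the
existing mce code.

```c
/* Hypothetical encoding: low 2 bits keep the admin's original value,
 * one upper bit records that kdump raised the tolerance. */
#define TOLERANT_ORIG_MASK   0x3u        /* assumed: tolerant is 0..3 */
#define TOLERANT_KDUMP_FLAG  (1u << 2)   /* assumed flag bit position */
#define TOLERANT_MAX         3u          /* most tolerant setting */

/* Mark the value as overridden for kdump without touching the low bits. */
static inline unsigned int tolerant_mark_kdump(unsigned int tolerant)
{
	return (tolerant & TOLERANT_ORIG_MASK) | TOLERANT_KDUMP_FLAG;
}

/* Value the admin configured, recoverable later from the dump. */
static inline unsigned int tolerant_original(unsigned int v)
{
	return v & TOLERANT_ORIG_MASK;
}

/* Non-zero if the kdump override is in effect. */
static inline int tolerant_is_kdump(unsigned int v)
{
	return (v & TOLERANT_KDUMP_FLAG) != 0;
}

/* Value the handler would act on: maximum while the flag is set. */
static inline unsigned int tolerant_effective(unsigned int v)
{
	return tolerant_is_kdump(v) ? TOLERANT_MAX : tolerant_original(v);
}
```

With this, someone reading the dump could still see what the admin had
configured, and the handler would only need to look at the effective value.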
