Message-ID: <20150714161612.GH10792@redhat.com>
Date:	Tue, 14 Jul 2015 12:16:12 -0400
From:	Vivek Goyal <vgoyal@...hat.com>
To:	dwalker@...o99.com
Cc:	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Hidehiro Kawai <hidehiro.kawai.ez@...achi.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-mips@...ux-mips.org, Baoquan He <bhe@...hat.com>,
	linux-sh@...r.kernel.org, linux-s390@...r.kernel.org,
	kexec@...ts.infradead.org, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...nel.org>,
	HATAYAMA Daisuke <d.hatayama@...fujitsu.com>,
	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	linuxppc-dev@...ts.ozlabs.org, linux-metag@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump
 is not available

On Tue, Jul 14, 2015 at 03:48:33PM +0000, dwalker@...o99.com wrote:
> On Tue, Jul 14, 2015 at 11:40:40AM -0400, Vivek Goyal wrote:
> > On Tue, Jul 14, 2015 at 03:34:30PM +0000, dwalker@...o99.com wrote:
> > > On Tue, Jul 14, 2015 at 11:02:08AM -0400, Vivek Goyal wrote:
> > > > On Tue, Jul 14, 2015 at 01:59:19PM +0000, dwalker@...o99.com wrote:
> > > > > On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
> > > > > > dwalker@...o99.com writes:
> > > > > > 
> > > > > > > On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
> > > > > > >> Hidehiro Kawai <hidehiro.kawai.ez@...achi.com> writes:
> > > > > > >> 
> > > > > > >> > You can call panic notifiers and kmsg dumpers before kdump by
> > > > > > >> > specifying "crash_kexec_post_notifiers" as a boot parameter.
> > > > > > >> > However, it doesn't make sense if kdump is not available.  In that
> > > > > > >> > case, disable "crash_kexec_post_notifiers" boot parameter so that
> > > > > > >> > you can't change the value of the parameter.
> > > > > > >> 
> > > > > > >> Nacked-by: "Eric W. Biederman" <ebiederm@...ssion.com>
> > > > > > >
> > > > > > > I think it would make sense if he just replaced "kdump" with "kexec".
> > > > > > 
> > > > > > It would be less insane, however it still makes no sense as without
> > > > > > kexec on panic support crash_kexec is a noop.  So the value of the
> > > > > > > setting makes no difference.
> > > > > 
> > > > > Can you explain more, I don't really understand what you mean. Are you suggesting
> > > > > the whole "crash_kexec_post_notifiers" feature has no value ?
> > > > 
> > > > Daniel,
> > > > 
> > > > BTW, why are you using crash_kexec_post_notifiers commandline? Why not
> > > > without it?
> > > 
> > > It was explained in the prior thread but to rehash, the notifiers are used to do a switch
> > > over from the crashed machine to another redundant machine.
> > 
> > So why not detect failure using polling or issue notifications from second
> > kernel.
> > 
> > IOW, expecting that a crashed machine will be able to deliver notification
> > reliably is flawed to begin with, IMHO.
> 
> It's flawed to think you can kexec, but you still do it right ? I've not gotten into
> the deep details of this switching process, but that's how this interface is used.

Sure. But the deal here is that users of the interface know that it can
sometimes be unreliable. And in the absence of a more reliable mechanism,
a somewhat less reliable mechanism is fine.

>  
> > If a machine is failing, there is a high chance it can't deliver you the
> > notification. Detecting that failure using some kind of polling mechanism
> > might be more reliable. And it will make even kdump mechanism more
> > reliable so that it does not have to run panic notifiers after the crash.
> 
> I think what you're suggesting is that my company should change how its hardware works
> and that's not really an option for me. This isn't a simple thing like checking over the
> network if the machine is down or not, this is way more complex hardware design.

That means you are ready to live with an unreliable design. There might be
cases where the notifier does not get run properly and you will not do the
switch despite the fact that the OS has failed. I was just trying to nudge
you in a direction which could be a more reliable mechanism.

Thanks
Vivek