linux-kernel - Re: [PATCH 1/3] panic: Disable crash_kexec_post

Message-ID: <87si8qmxef.fsf@x220.int.ebiederm.org>
Date:	Tue, 14 Jul 2015 13:01:12 -0500
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Vivek Goyal <vgoyal@...hat.com>
Cc:	dwalker@...o99.com, Hidehiro Kawai <hidehiro.kawai.ez@...achi.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-mips@...ux-mips.org, Baoquan He <bhe@...hat.com>,
	linux-sh@...r.kernel.org, linux-s390@...r.kernel.org,
	kexec@...ts.infradead.org, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...nel.org>,
	HATAYAMA Daisuke <d.hatayama@...fujitsu.com>,
	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	linuxppc-dev@...ts.ozlabs.org, linux-metag@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

Vivek Goyal <vgoyal@...hat.com> writes:

> On Tue, Jul 14, 2015 at 05:29:53PM +0000, dwalker@...o99.com wrote:
>
> [..]
>> > >> > If a machine is failing, there is a high chance it can't deliver you the
>> > >> > notification. Detecting that failure using some kind of polling mechanism
>> > >> > might be more reliable. And it will make even kdump mechanism more
>> > >> > reliable so that it does not have to run panic notifiers after the crash.
>> > >> 
>> > >> I think what you're suggesting is that my company should change how its hardware works
>> > >> and that's not really an option for me. This isn't a simple thing like checking over the
>> > >> network if the machine is down or not, this is way more complex hardware design.
>> > >
>> > > That means you are ready to live with an unreliable design. There might be
>> > > cases where notifier does not get run properly and you will not do switch
>> > > despite the fact that OS has failed. I was just trying to nudge you in
>> > > a direction which could be more reliable mechanism.
>> > 
>> > Sigh I see some deep confusion going on here.
>> > 
>> > The panic notifiers are just that panic notifiers.  They have not been
>> > nor should they be tied to kexec.   If those notifiers force a switch
>> > over between machines I fail to see why you would care if it was
>> > kexec or another panic situation that is forcing that switchover.
>> 
>> Hidehiro isn't fixing the failover situation on my side, he's fixing register
>> information collection when crash_kexec_post_notifiers is used.
>
> Sure. Given that we have created this new parameter, let us fix it so that
> we can capture the other cpu register state in crash dump.
>
> I am a little disappointed that it was not tested well when this parameter was
> introduced. We should have at least tested it to the extent to see if there
> is proper cpu state present for all cpus in the crash dump.
>
> At that point of time it looked like a simple modification
> to allow panic notifiers before crash_kexec().

Either that or we say no one cares enough, and it is known broken so let's
just revert the fool thing.

I honestly can't see how to support panic notifiers, before kexec.
There is no way to tell what is being done and all of the pieces
including smp_send_stop are known to be buggy.

It isn't like this latest set of patches was reviewed/tested much
better, as the first patch was wrong.

Eric