Message-ID: <20110208164656.GA29081@redhat.com>
Date:	Tue, 8 Feb 2011 11:46:56 -0500
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Seiji Aguchi <seiji.aguchi@....com>
Cc:	"Eric W. Biederman" <ebiederm@...ssion.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	linux kernel mailing list <linux-kernel@...r.kernel.org>,
	Jarod Wilson <jwilson@...hat.com>
Subject: Re: Query about kdump_msg hook into crash_kexec()

On Thu, Feb 03, 2011 at 05:08:01PM -0500, Seiji Aguchi wrote:
> Hi Eric,
> 
> Thank you for your prompt reply.
> 
> I would like to consider "Needs in enterprise area" and "Implementation of kmsg_dump()" separately.
> 
> (1) Needs in enterprise area
>   In case of kdump failure, we would like to store the kernel buffer to NVRAM/flash memory
>   for detecting the root cause of the kernel crash.
> 
> (2) Implementation of kmsg_dump()
>   You suggest reviewing/testing the coding of kmsg_dump() more.
> 
> What do you think about (1)?
> Is it acceptable for you?

Ok, I am just trying to think out loud about this problem and see if
something fruitful comes out that paves the way forward.

- So ideally we would like kmsg_dump() to be called after crash_kexec() so
  that any unaudited (third-party modules), unreliable calls do not
  compromise the reliability of the kdump operation.

  But the Hitachi folks seem to want to save at least the kernel buffers
  somewhere in NVRAM etc. because they think that kdump can be
  unreliable and we might not capture any information after the crash. So
  they kind of want two mechanisms in place: a lightweight one which
  tries to save the kernel buffers in NVRAM, and a heavyweight one
  which tries to save the entire/filtered kernel core.

  Personally I am not too excited about the idea but I guess I can live
  with it. We can try to audit at least the in-kernel modules; for external
  modules we don't have much control and have to live with the fact that if
  modules screw up, we don't capture the dump.

 Those who don't want this behavior can do one of three things.

	- Disable kmsg_dump() at compile time.
	- Do not load any module which registers for kmsg_dump().
	- Implement a /proc tunable which allows controlling this
	  behavior.

- Ok, having said why we want it, the question becomes how to
  do it so that it works reasonably well.

  - There seems to be one common requirement of kmsg_dump() and kdump(),
    and that is to stop other cpus reliably (use NMI if possible). Can
    we try to share this code between kmsg_dump() and crash_kexec()? So
    something like the following.

	- panic happens
	- Do all the activities related to printing panic string and
	  stack dump.
	- Stop other cpus.
		- This can probably be done with the equivalent of the
		  machine_crash_shutdown() function. In fact this function
		  can probably be broken down into two parts. The first part
		  does shutdown_prepare(), where all other cpus are shot
		  down, and the second part can do the actual disabling of
		  LAPIC/IOAPIC, saving of cpu registers etc.

		if (mutex_trylock(&some_shutdown_mutex)) {
			/* setup regs, fix vmcoreinfo etc */
			crash_kexec_prepare();
			machine_shutdown_prepare();
			kmsg_dump();
			crash_kexec_execute();
			/* Also call panic_notifier_list here ? */
		}

crash_kexec_prepare() {
	crash_setup_regs(&fixed_regs, regs);
	crash_save_vmcoreinfo();
}

crash_kexec_execute() {
	/* Shutdown lapic/ioapic, save this cpu's registers etc */
	machine_shutdown();
	machine_kexec();
}

So basically we break the machine_shutdown() function down into two parts
and start sharing the common part between kmsg_dump(), crash_kexec() and
possibly the panic notifiers.
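
To make the proposed ordering concrete, here is a rough userspace sketch of
it. It is not kernel code: the step names, trace[] and panic_crash_path()
are made up for illustration, and a C11 atomic_flag stands in for
mutex_trylock() on some_shutdown_mutex. It only shows that the first
caller runs prepare -> stop cpus -> kmsg_dump -> execute in that order and
that any later caller backs off without blocking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Steps of the proposed panic path, recorded in the order they run. */
enum step {
	STEP_KEXEC_PREPARE,	/* crash_kexec_prepare(): regs, vmcoreinfo */
	STEP_SHUTDOWN_PREPARE,	/* machine_shutdown_prepare(): stop other cpus */
	STEP_KMSG_DUMP,		/* kmsg_dump(): save the kernel log buffer */
	STEP_KEXEC_EXECUTE,	/* crash_kexec_execute(): APICs off, jump */
};

#define MAX_STEPS 8

static enum step trace[MAX_STEPS];	/* hypothetical trace buffer */
static size_t trace_len;

/* Stand-in for some_shutdown_mutex: set once by the first caller. */
static atomic_flag shutdown_claimed = ATOMIC_FLAG_INIT;

static void record(enum step s)
{
	assert(trace_len < MAX_STEPS);
	trace[trace_len++] = s;
}

/* Like mutex_trylock(): true for the first caller, never blocks. */
static bool shutdown_trylock(void)
{
	return !atomic_flag_test_and_set(&shutdown_claimed);
}

/* Returns true if this caller ran the sequence, false if it lost the race. */
static bool panic_crash_path(void)
{
	if (!shutdown_trylock())
		return false;

	record(STEP_KEXEC_PREPARE);
	record(STEP_SHUTDOWN_PREPARE);
	record(STEP_KMSG_DUMP);
	record(STEP_KEXEC_EXECUTE);
	return true;
}
```

Calling panic_crash_path() twice records the four steps once; the second
call returns false and leaves the trace untouched, which is the behaviour
the trylock guard is meant to give on a second panic.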

If kdump is not configured, then after executing kmsg_dump() and the panic
notifiers, we should either sit in a tight loop with interrupts
enabled, waiting for somebody to press Ctrl-boot, or reboot the system upon
lapse of panic_timeout.
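
The final decision could be sketched roughly as below. This is a userspace
illustration only; enum panic_action and panic_final_action() are
hypothetical names, and it assumes a positive panic_timeout means "reboot
after that many seconds" while zero means "wait for the operator".

```c
#include <assert.h>
#include <stdbool.h>

/* What the panicking cpu does once kmsg_dump() and the notifiers ran. */
enum panic_action {
	PANIC_JUMP_TO_KDUMP,		/* crash kernel loaded: kexec into it */
	PANIC_REBOOT_AFTER_TIMEOUT,	/* reboot once panic_timeout elapses */
	PANIC_WAIT_FOR_OPERATOR,	/* tight loop, interrupts enabled */
};

static enum panic_action panic_final_action(bool kdump_loaded,
					    int panic_timeout)
{
	/* This sketch only models non-negative timeouts (assumption). */
	assert(panic_timeout >= 0);

	if (kdump_loaded)
		return PANIC_JUMP_TO_KDUMP;
	if (panic_timeout > 0)
		return PANIC_REBOOT_AFTER_TIMEOUT;
	return PANIC_WAIT_FOR_OPERATOR;
}
```

With no crash kernel loaded, panic_final_action(false, 0) picks the wait
loop and panic_final_action(false, 30) picks the timed reboot.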

Eric, does it make sense to you?

Thanks
Vivek

    

> 
> Seiji
> 
> >-----Original Message-----
> >From: Eric W. Biederman [mailto:ebiederm@...ssion.com]
> >Sent: Thursday, February 03, 2011 4:13 PM
> >To: Seiji Aguchi
> >Cc: Vivek Goyal; KOSAKI Motohiro; linux kernel mailing list; Jarod Wilson
> >Subject: Re: Query about kdump_msg hook into crash_kexec()
> >
> >Seiji Aguchi <seiji.aguchi@....com> writes:
> >
> >> Hi,
> >>
> >>>PS: FWIW, Hitachi folks have a usage idea for their enterprise purpose, but
> >>>    unfortunately I don't know its details. I hope someone can tell us.
> >>
> >> I explain the usage of kmsg_dump(KMSG_DUMP_KEXEC) in enterprise area.
> >>
> >> [Background]
> >> In our support service experience, we always need to detect root cause
> >> of OS panic.
> >> So, customers in enterprise area never forgive us if kdump fails and
> >> we can't detect the root cause of panic due to lack of materials for
> >> investigation.
> >>
> >>>- Why do you need a notification from inside crash_kexec(). IOW, what
> >>>  is the usage of KMSG_DUMP_KEXEC.
> >>
> >>
> >> The usage of kmsg_dump(KMSG_DUMP_KEXEC) in enterprise area is getting
> >> useful information for investigating a kernel crash in case the kdump
> >> kernel doesn't boot.
> >>
> >> Kdump kernel may not start booting because there is a sha256 checksum
> >> verified over the kdump kernel before it starts booting.
> >> This means kdump kernel may fail even if there is no bug in kdump and
> >> we can't get any information for detecting root cause of kernel crash
> >
> >Sure it is theoretically possible that the sha256 checksum gets
> >corrupted (I have never seen it happen or heard reports of it
> >happening).  It is a feature that if someone has corrupted your code the
> >code doesn't try and run anyway and corrupt anything else.
> >
> >That you are arguing against having such a feature in the code you use to
> >write to persistent storage is scary.
> >
> >> As I mentioned in [Background], We must avoid lack of materials for
> >> investigation.
> >> So, kmsg_dump(KMSG_DUMP_KEXEC) is a very important feature in enterprise
> >> area.
> >
> >That sounds wonderful, but it doesn't jibe with the
> >code. kmsg_dump(KMSG_DUMP_KEXEC) when I read through it was simply not
> >written to be robust when most of the kernel is suspect, making it
> >inappropriate for use on the crash_kexec path.  I do not believe kmsg_dump
> >has seen any testing in kernel failure scenarios.
> >
> >There is this huge assumption that kmsg_dump is more reliable than
> >crash_kexec; from my review of the code, kmsg_dump is simply not safe in
> >the context of a broken kernel.  The kmsg_dump code, last I looked,
> >won't work if called with interrupts disabled.
> >
> >Furthermore kmsg_dump(KMSG_DUMP_KEXEC) is only useful for debugging
> >crash_kexec.  Which has it backwards as it is kmsg_dump that needs the
> >debugging.
> >
> >You just argued that it is better to corrupt the target of your
> >kmsg_dump in the event of a kernel failure instead of to fail silently.
> >
> >I don't want that unreliable code that wants to corrupt my jffs
> >partition anywhere near my machines.
> >
> >Eric