[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5C4C569E8A4B9B42A84A977CF070A35B2C147F43B7@USINDEVS01.corp.hds.com>
Date: Thu, 3 Feb 2011 17:08:01 -0500
From: Seiji Aguchi <seiji.aguchi@....com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
CC: Vivek Goyal <vgoyal@...hat.com>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
linux kernel mailing list <linux-kernel@...r.kernel.org>,
Jarod Wilson <jwilson@...hat.com>
Subject: RE: Query about kdump_msg hook into crash_kexec()
Hi Eric,
Thank you for your prompt reply.
I would like to consider "Needs in enterprise area" and "Implementation of kmsg_dump()" separately.
(1) Needs in enterprise area
In case of kdump failure, we would like to store kernel buffer to NVRAM/flush memory
for detecting root cause of kernel crash.
(2) Implementation of kmsg_dump
You suggest to review/test cording of kmsg_dump() more.
What do you think about (1)?
Is it acceptable for you?
Seiji
>-----Original Message-----
>From: Eric W. Biederman [mailto:ebiederm@...ssion.com]
>Sent: Thursday, February 03, 2011 4:13 PM
>To: Seiji Aguchi
>Cc: Vivek Goyal; KOSAKI Motohiro; linux kernel mailing list; Jarod Wilson
>Subject: Re: Query about kdump_msg hook into crash_kexec()
>
>Seiji Aguchi <seiji.aguchi@....com> writes:
>
>> Hi,
>>
>>>PS: FWIW, Hitach folks have usage idea for their enterprise purpose, but
>>> unfortunately I don't know its detail. I hope anyone tell us it.
>>
>> I explain the usage of kmsg_dump(KMSG_DUMP_KEXEC) in enterprise area.
>>
>> [Background]
>> In our support service experience, we always need to detect root cause
>> of OS panic.
>> So, customers in enterprise area never forgive us if kdump fails and
>> we can't detect the root cause of panic due to lack of materials for
>> investigation.
>>
>>>- Why do you need a notification from inside crash_kexec(). IOW, what
>>> is the usage of KMSG_DUMP_KEXEC.
>>
>>
>> The usage of kdump(KMSG_DUMP_KEXEC) in enterprise area is getting
>> useful information for investigating kernel crash in case kdump
>> kernel doesn't boot.
>>
>> Kdump kernel may not start booting because there is a sha256 checksum
>> verified over the kdump kernel before it starts booting.
>> This means kdump kernel may fail even if there is no bug in kdump and
>> we can't get any information for detecting root cause of kernel crash
>
>Sure it is theoretically possible that the sha256 checksum gets
>corrupted (I have never seen it happen or heard reports of it
>happening). It is a feature that if someone has corrupted your code the
>code doesn't try and run anyway and corrupt anything else.
>
>That you are arguing against have such a feature in the code you use to
>write to persistent storage is scary.
>
>> As I mentioned in [Background], We must avoid lack of materials for
>> investigation.
>> So, kdump(KMSG_DUMP_KEXEC) is very important feature in enterprise
>> area.
>
>That sounds wonderful, but it doesn't jive with the
>code. kmsg_dump(KMSG_DUMP_KEXEC) when I read through it was simply not
>written to be robust when most of the kernel is suspect. Making it in
>appropriate for use on the crash_kexec path. I do not believe kmsg_dump
>has seen any testing in kernel failure scenarios.
>
>There is this huge assumption that kmsg_dump is more reliable than
>crash_kexec, from my review of the code kmsg_dump is simply not safe in
>the context of a broken kernel. The kmsg_dump code last I looked code
>won't work if called with interrupts disabled.
>
>Furthermore kmsg_dump(KMSG_DUMP_KEXEC) is only useful for debugging
>crash_kexec. Which has it backwards as it is kmsg_dump that needs the
>debugging.
>
>You just argued that it is better to corrupt the target of your
>kmsg_dump in the event of a kernel failure instead of to fail silently.
>
>I don't want that unreliable code that wants to corrupt my jffs
>partition anywhere near my machines.
>
>Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists