Message-ID: <4CD1D919.5000209@google.com>
Date:	Wed, 03 Nov 2010 14:50:17 -0700
From:	Aaron Durbin <adurbin@...gle.com>
To:	Seiji Aguchi <seiji.aguchi@....com>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	"simon.kagstrom@...insight.net" <simon.kagstrom@...insight.net>,
	"David.Woodhouse@...el.com" <David.Woodhouse@...el.com>,
	"anders.grafstrom@...insight.net" <anders.grafstrom@...insight.net>,
	"Artem.Bityutskiy@...ia.com" <Artem.Bityutskiy@...ia.com>,
	"kosaki.motohiro@...fujitsu.com" <kosaki.motohiro@...fujitsu.com>,
	"jason.wessel@...driver.com" <jason.wessel@...driver.com>,
	"jslaby@...e.cz" <jslaby@...e.cz>,
	"jmorris@...ei.org" <jmorris@...ei.org>,
	"eparis@...hat.com" <eparis@...hat.com>, "hch@....de" <hch@....de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"dle-develop@...ts.sourceforge.net" 
	<dle-develop@...ts.sourceforge.net>, "Satoru Moriya"@google.com
Subject: Re: [RFC][Patch] Adding kmsg_dump() to reboot/halt/poweroff/emergency_restart
 path

On 10/27/10 12:44, Seiji Aguchi wrote:
> Hi,
>
>> What actual problem are we solving here?  Why is the current code
>> inadequate?  It would help to demonstrate some use-case and to explain
>> how the situation improved with this patch.
>
> [Purpose]
>   My purpose is to develop a highly reliable logging facility for
>   enterprise use.
>
>   I'm planning to add the following triggers of kmsg_dumper().
>      - reboot/poweroff/halt/emergency_restart (this patch)
>      - Machine check
>
>   I'm also planning to add a feature that outputs kernel messages to
>   NVRAM, because enterprise servers are equipped with NVRAM.
>   We can realize a highly reliable logging facility by outputting
>   kernel messages to NVRAM.
>   (NVRAM is commonly used on Mainframe and Commercial Unix as well.)
>
> [Use case of reboot/poweroff/halt/emergency_restart]
>
>   My company has often experienced the following in our support service.
>   - Customer's system suddenly reboots.
>   - Customers ask us to investigate the reason for the reboot.
>
>   We recognize the fact itself because boot messages remain in
>   /var/log/messages.
>   However, we can't investigate the reason why the system rebooted,
>   because the last messages don't remain.
>   And of course we can't explain the reason.
>
>
>   We can solve the above problem with this patch as follows.
>   Case1: reboot with command
>     - We can see "Restarting system with command:" or
>       "Restarting system.".
>
>   Case2: halt with command
>     - We can see "System halted.".
>
>   Case3: poweroff with command
>     - We can see " Power down.".
>
>   Case4: emergency_restart with sysrq.
>     - We can see "Sysrq:" outputted in __handle_sysrq().
>
>   Case5: emergency_restart with softdog.
>     - We can see "Initiating system reboot" in watchdog_fire().
>
>   So, we can distinguish the reason of reboot, poweroff, halt and
>   emergency_restart.
>
>   If a customer executed the reboot command, you may think the customer
>   should know the fact.
>   However, they often claim they didn't execute the command when they
>   rebooted the system by mistake.
>
>   No evidential message remains on the current Linux kernel, so we can't
>   show proof to the customer.
>   This patch improves this situation.
>
> Seiji

We carry patches in our kernels that do very similar things. The reason 
is essentially the same as what you have cited. On our platforms we have 
two different ways of storing events to an event log. One communicates 
with the BIOS itself; the other writes bit flags to a known area of 
non-volatile storage. That way when the machine comes back up we have a 
clear eventlog (with times) as to what happened when. Piecing these 
events together has proven to be invaluable for finding issues.

Both of the drivers that log these events use a shared interface that
collects various events in the kernel and presents them through a
single notifier chain for the drivers' consumption.

The things we currently track and log are the following:
- clean reboot/shutdown
- panic
- oops
- die
- NMI watchdog
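
To make the shape of that shared interface a bit more concrete, here is
a rough user-space sketch in C. It is not the actual patches and does
not use the kernel's notifier API; every name in it (shutdown_event,
event_notifier, flag_logger, count_logger and so on) is hypothetical.
It only shows the idea: one chain, one event type per item in the list
above, and two independent backends hanging off the chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event types, one per item in the list above. */
enum shutdown_event {
    EV_CLEAN, EV_PANIC, EV_OOPS, EV_DIE, EV_NMI_WATCHDOG, EV_COUNT
};

/* One entry on the chain; a backend embeds this as its first member. */
struct event_notifier {
    void (*call)(struct event_notifier *self, enum shutdown_event ev);
    struct event_notifier *next;
};

static struct event_notifier *chain_head;

/* Add a backend to the front of the chain. */
void event_notifier_register(struct event_notifier *nb)
{
    assert(nb != NULL && nb->call != NULL);
    nb->next = chain_head;
    chain_head = nb;
}

/* Deliver one event to every backend; returns how many were called. */
int event_notify(enum shutdown_event ev)
{
    int called = 0;
    for (struct event_notifier *nb = chain_head; nb != NULL; nb = nb->next) {
        nb->call(nb, ev);
        called++;
    }
    return called;
}

/* Backend sketch 1: one bit flag per event, as if in non-volatile storage. */
struct flag_logger {
    struct event_notifier nb;   /* must stay the first member */
    unsigned int flags;
};

void flag_logger_call(struct event_notifier *self, enum shutdown_event ev)
{
    struct flag_logger *fl = (struct flag_logger *)self;
    fl->flags |= 1u << ev;
}

/* Backend sketch 2: per-event counters, standing in for a richer log. */
struct count_logger {
    struct event_notifier nb;   /* must stay the first member */
    int counts[EV_COUNT];
};

void count_logger_call(struct event_notifier *self, enum shutdown_event ev)
{
    struct count_logger *cl = (struct count_logger *)self;
    cl->counts[ev]++;
}
```

A caller would register each backend once and then call event_notify()
from wherever the event is detected. In this sketch registration
prepends, so backends are called in reverse order of registration.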

An example eventlog produced by our systems looks like the following 
(63-67 are the boot numbers of the system in question):

2010-10-14 10:26:06 | System Reset | 63
2010-10-14 10:26:19 | System boot | 63
2010-10-14 11:36:43 | Kernel Shutdown | 63 | Unknown Shutdown Reason
2010-10-14 11:36:43 | System Reset | 64
2010-10-14 11:36:56 | System boot | 64
2010-10-18 14:51:54 | Kernel Shutdown | 64 | Clean
2010-10-18 14:52:38 | System Reset | 65
2010-10-18 14:52:51 | System boot | 65
2010-10-26 02:44:48 | Kernel Shutdown | 65 | Oops
2010-10-26 02:44:48 | Kernel Shutdown | 65 | Die
2010-10-26 02:44:49 | Kernel Shutdown | 65 | Panic
2010-10-26 02:45:43 | System Reset | 66
2010-10-26 02:45:56 | System boot | 66
2010-10-26 02:49:22 | Kernel Shutdown | 66 | Clean
2010-10-26 02:50:05 | System Reset | 67
2010-10-26 02:50:18 | System boot | 67
2010-10-26 11:39:20 | Kernel Shutdown | 67 | Clean
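
For anyone who wants to post-process a log in the format above, a
minimal parsing sketch in C follows. It assumes exactly the layout
shown (fields separated by " | ", with the shutdown reason optional);
the struct and function names are hypothetical and not part of any
tool we ship.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of the eventlog format shown above. */
struct eventlog_entry {
    char timestamp[32];  /* "YYYY-MM-DD HH:MM:SS" */
    char event[32];      /* e.g. "Kernel Shutdown" */
    int  boot;           /* boot number */
    char reason[32];     /* e.g. "Panic"; empty string if absent */
};

/* Strip trailing blanks left over from the " | " separators. */
static void rtrim(char *s)
{
    size_t len = strlen(s);
    while (len > 0 && (s[len - 1] == ' ' || s[len - 1] == '\t' ||
                       s[len - 1] == '\r'))
        s[--len] = '\0';
}

/* Parse one line; returns 0 on success, -1 if the line does not match. */
int parse_eventlog_line(const char *line, struct eventlog_entry *e)
{
    int n;

    assert(line != NULL && e != NULL);
    e->reason[0] = '\0';

    /* Fields 1-3 are required; the fourth (reason) is optional. */
    n = sscanf(line, "%31[^|]| %31[^|]| %d | %31[^\n]",
               e->timestamp, e->event, &e->boot, e->reason);
    if (n < 3)
        return -1;

    rtrim(e->timestamp);
    rtrim(e->event);
    rtrim(e->reason);
    return 0;
}
```

Grouping the parsed entries by boot number then makes it easy to see,
for example, that boot 65 ended with Oops, Die and Panic in sequence.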

I hope this helps show that we consider such a mechanism vital. I
can post the patches for the common infrastructure if people are interested.

-Aaron