Message-ID: <20111005124113.GB509@gere.osrc.amd.com>
Date:	Wed, 5 Oct 2011 14:41:14 +0200
From:	Borislav Petkov <bp@...en8.de>
To:	"K.Prasad" <prasad@...ux.vnet.ibm.com>
Cc:	"Eric W. Biederman" <ebiederm@...ssion.com>,
	linux-kernel@...r.kernel.org, crash-utility@...hat.com,
	kexec@...ts.infradead.org, Vivek Goyal <vgoyal@...hat.com>,
	Andi Kleen <andi@...stfloor.org>,
	"Luck, Tony" <tony.luck@...el.com>, anderson@...hat.com,
	tachibana@....nes.nec.co.jp, oomichi@....nes.nec.co.jp
Subject: Re: [Patch 1/4][kernel][slimdump] Add new elf-note of type
 NT_NOCOREDUMP to capture slimdump

On Wed, Oct 05, 2011 at 03:17:27PM +0530, K.Prasad wrote:
> We don't want to capture memory dump when the machine crashes due to
> faulty cache, because the end-user derives no benefit by receiving a
> bulky vmcore and running crash analysis tools over them. Instead a
> 'slimdump' that contains a meaningful message about the origin of crash
> (and which can be understood by his analysis tools) would be better, or
> so I thought.

Ok, this makes sense: a meaningful message along with the MCE decoded
properly in user-friendly language, so that one can understand why the
system has not captured a vmcore.

> There are possibly several hardware errors which cause system crash and
> the kdump would capture full vmcore, although it doesn't make sense (I
> wouldn't have cared about the second example, you cited, if they did not
> generate MCE, but a different exception). In an ideal situation, each of
> these error paths would 'subscribe' to slimdump and add a meaningful
> message in the NT_NOCOREDUMP note instead of letting the user-space copy
> the old kernel memory.

Yep, I see.

> Fine with me. I see that the various IA32_MCi_Status registers will hold
> information about the error and use that to classify MCEs.
> 
> I think the best way to go about is to retain NT_NOCOREDUMP for non-DRAM
> errors also, but use the note-name field in the elf-note and distinguish the
> various types of errors...say, by using names such as "PANIC_MCE_DRAM",
> "PANIC_MCE_CACHE", etc (similar to the error codes described in the Intel
> manual). The upstream tools like 'makedumpfile' and 'crash' will have to
> be taught to parse the elf-note name and act accordingly.
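
As a rough illustration of the note-name idea quoted above, here is a
minimal C sketch of how a tool might map the name to an error class. The
two strings are the ones suggested in the quoted mail; the enum and the
function names (slim_err, slim_classify, slim_describe) are hypothetical
and are not part of makedumpfile, crash or any posted patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical error classes a tool could derive from the note name. */
enum slim_err { SLIM_ERR_UNKNOWN, SLIM_ERR_DRAM, SLIM_ERR_CACHE };

/*
 * Map the name field of an NT_NOCOREDUMP note to an error class.
 * Only the two example names from the thread are recognised; any
 * other name is reported as unknown.
 */
static enum slim_err slim_classify(const char *name)
{
	assert(name != NULL);
	if (strcmp(name, "PANIC_MCE_DRAM") == 0)
		return SLIM_ERR_DRAM;
	if (strcmp(name, "PANIC_MCE_CACHE") == 0)
		return SLIM_ERR_CACHE;
	return SLIM_ERR_UNKNOWN;
}

/* Short human-readable reason to show instead of a full vmcore. */
static const char *slim_describe(enum slim_err e)
{
	switch (e) {
	case SLIM_ERR_DRAM:
		return "machine check: DRAM error";
	case SLIM_ERR_CACHE:
		return "machine check: cache error";
	default:
		return "unrecognised slimdump note";
	}
}
```

Keeping the classification separate from the message text would let a
tool grow new names without touching its reporting code.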

Right, so Valdis had the right question in the other mail, let me
generalize it here: does it ever make sense to save vmcore on a hardware
error?

With DRAM errors, you probably could use the additional info coming with
the MCE to decode the physical address, map it back to the DIMM and
swap it. Any other use cases?

Thanks.

-- 
Regards/Gruss,
Boris.
