Date:	Fri, 27 May 2011 21:23:46 +0530
From:	"K.Prasad" <prasad@...ux.vnet.ibm.com>
To:	Andi Kleen <andi@...stfloor.org>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"Luck, Tony" <tony.luck@...el.com>,
	Vivek Goyal <vgoyal@...hat.com>, kexec@...ts.infradead.org,
	"Eric W. Biederman" <ebiederm@...ssion.com>, anderson@...hat.com
Subject: Re: [RFC Patch 5/6] slimdump: Capture slimdump for fatal MCE
 generated crashes

On Thu, May 26, 2011 at 07:32:57PM +0200, Andi Kleen wrote:
> On Thu, May 26, 2011 at 10:53:05PM +0530, K.Prasad wrote:
> > 
> > slimdump: Capture slimdump for fatal MCE generated crashes
> > 
> > System crashes resulting from fatal hardware errors (such as MCE) don't need
> > all the contents from crashing-kernel's memory. Generate a new 'slimdump' that
> > retains only essential information while discarding the old memory.
> 
> While this is a good idea, note there may be still poisoned lines
> in memory that haven't resulted in a machine check yet, but could
> still be fatal when read after a full crash dump for some other
> reason.
>

True, this patch does not handle old poisoned lines or new memory errors
that are discovered while running inside the kdump kernel.
 
> So you still need 
> 
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=commit;h=fe61906edce9e70d02481a77a617ba1397573dce
> and
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=commit;h=cb58f049ae6709ddbab71be199390dc6852018cd
> 
> in addition.
> 
> -Andi

So, there could be (at least) two ways to handle fatal MCEs in the kdump
kernel:

- To disable MCE exceptions, as done by the patches cited above. However,
  the result of a read operation on corrupted memory is unknown and the
  system behaviour is undefined. We're unsure if this is a safe thing to
  do.

- To disable kdump capture when panic is invoked from inside the kdump
  kernel, and simply reboot the system. Since a memory error inside the
  kdump kernel (which runs for a very short duration) is unlikely, I
  think this solution is preferable.
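
To make the second option concrete, here is a rough userspace sketch of
the policy, combined with the slimdump behaviour this patch proposes. It
is not kernel code: the enum, the function name choose_panic_action() and
its two flags are hypothetical names used only for illustration and do
not correspond to existing kernel interfaces.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not kernel interfaces. */
enum panic_action {
    PANIC_CAPTURE_FULL_DUMP, /* ordinary crash in the first kernel */
    PANIC_CAPTURE_SLIMDUMP,  /* fatal MCE in the first kernel */
    PANIC_REBOOT_ONLY        /* panic inside the kdump kernel itself */
};

/*
 * Decide what to do on panic.
 * in_kdump_kernel: true if the panicking kernel is the kdump (capture) kernel.
 * fatal_mce:       true if the panic was caused by a fatal machine check.
 */
enum panic_action choose_panic_action(bool in_kdump_kernel, bool fatal_mce)
{
    /* Second option: never attempt a capture from inside the kdump kernel. */
    if (in_kdump_kernel)
        return PANIC_REBOOT_ONLY;

    /* Fatal hardware error: old memory is not needed, keep only essentials. */
    if (fatal_mce)
        return PANIC_CAPTURE_SLIMDUMP;

    return PANIC_CAPTURE_FULL_DUMP;
}
```

With this ordering, a memory error hit while the kdump kernel is running
always ends in a plain reboot, regardless of whether it was reported as
an MCE.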

Let me know your thoughts on this.

Thanks,
K.Prasad

