Message-ID: <20090612133352.GC6751@localhost>
Date:	Fri, 12 Jun 2009 21:33:52 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Nick Piggin <npiggin@...e.de>,
	Hugh Dickins <hugh.dickins@...cali.co.uk>,
	Andi Kleen <andi@...stfloor.org>,
	"riel@...hat.com" <riel@...hat.com>,
	"chris.mason@...cle.com" <chris.mason@...cle.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 1/5] HWPOISON: define VM_FAULT_HWPOISON to 0 when
	feature is disabled

On Fri, Jun 12, 2009 at 09:17:54PM +0800, Ingo Molnar wrote:
> 
> * Wu Fengguang <fengguang.wu@...el.com> wrote:
> 
> > Hi Ingo,
> > 
> > On Fri, Jun 12, 2009 at 07:22:58PM +0800, Ingo Molnar wrote:
> > > 
> > > * Wu Fengguang <fengguang.wu@...el.com> wrote:
> > > 
> > > > So as to eliminate one #ifdef in the c source.
> > > > 
> > > > Proposed by Nick Piggin.
> > > > 
> > > > CC: Nick Piggin <npiggin@...e.de>
> > > > Signed-off-by: Wu Fengguang <fengguang.wu@...el.com>
> > > > ---
> > > >  arch/x86/mm/fault.c |    3 +--
> > > >  include/linux/mm.h  |    7 ++++++-
> > > >  2 files changed, 7 insertions(+), 3 deletions(-)
> > > > 
> > > > --- sound-2.6.orig/arch/x86/mm/fault.c
> > > > +++ sound-2.6/arch/x86/mm/fault.c
> > > > @@ -819,14 +819,13 @@ do_sigbus(struct pt_regs *regs, unsigned
> > > >  	tsk->thread.error_code	= error_code;
> > > >  	tsk->thread.trap_no	= 14;
> > > >  
> > > > -#ifdef CONFIG_MEMORY_FAILURE
> > > >  	if (fault & VM_FAULT_HWPOISON) {
> > > >  		printk(KERN_ERR
> > > >  	"MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
> > > >  			tsk->comm, tsk->pid, address);
> > > >  		code = BUS_MCEERR_AR;
> > > >  	}
> > > > -#endif
> > > 
> > > Btw., anything like this should happen in close cooperation with 
> > > the x86 tree, not as some pure MM feature. I dont see Cc:s and 
> > > nothing that indicates that realization. What's going on here?
> > 
> > Ah sorry for the ignorance!  Andi has a nice overview of the big 
> > picture here: http://lkml.org/lkml/2009/6/3/371
> > 
> > In the above chunk, the process is trying to access the already 
> > corrupted page and thus shall be killed, otherwise it will either 
> > silently consume corrupted data, or will trigger another (deadly) 
> > MCE event and bring down the whole machine.
> 
> This seems like trying to handle a failure mode that cannot be and 
> shouldnt be 'handled' really. If there's an 'already corrupted' page 
> then the box should go down hard and fast, and we should not risk 
> _even more user data corruption_ by trying to 'continue' in the hope 
> of having hit some 'harmless' user process that can be killed ...
> 
> So i find the whole feature rather dubious - what's the point? We 
> should panic at this point - we just corrupted user data so that 
> piece of hardware cannot be trusted. Nor can any subsequent kernel 
> bug messages be trusted.
> 
> Do we really want this in the core Linux VM and in the architecture 
> pagefault handling code and elsewhere? Am i the only one who finds 
> this concept of 'handling' user data corruption rather dubious?

- The corrupted data impacts only one or a few processes.
- The corrupted data has not been consumed yet.

The data corruption has not caused any real harm yet, and the page can
be isolated to prevent future accesses.  So it makes sense to just kill
the impacted process(es).
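
For reference, the idea behind the patch itself (define the flag to 0
when the feature is disabled, so the check needs no #ifdef) can be
sketched in plain C as below.  This is only an illustration, not the
kernel code: the numeric values and the helper name sigbus_si_code()
are assumptions, as is the BUS_ADRERR default.  Only VM_FAULT_HWPOISON,
CONFIG_MEMORY_FAILURE and BUS_MCEERR_AR come from the patch.

```c
/* Illustrative values only; the real definitions live in the kernel headers. */
#define VM_FAULT_SIGBUS		0x0002u

#ifdef CONFIG_MEMORY_FAILURE
#define VM_FAULT_HWPOISON	0x0010u
#else
#define VM_FAULT_HWPOISON	0u	/* feature disabled: the flag can never match */
#endif

#ifndef BUS_ADRERR
#define BUS_ADRERR		2	/* assumed default si_code for SIGBUS */
#endif
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR		4	/* hardware memory error, action required */
#endif

/* Hypothetical helper mirroring the si_code selection in do_sigbus(). */
static int sigbus_si_code(unsigned int fault)
{
	int code = BUS_ADRERR;

	/*
	 * With VM_FAULT_HWPOISON defined to 0 this test is constant-false,
	 * so the compiler can drop the branch without any #ifdef here.
	 */
	if (fault & VM_FAULT_HWPOISON)
		code = BUS_MCEERR_AR;

	return code;
}
```

Built without -DCONFIG_MEMORY_FAILURE, sigbus_si_code() always returns
BUS_ADRERR; built with it, a fault value carrying VM_FAULT_HWPOISON
yields BUS_MCEERR_AR.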

Thanks,
Fengguang