linux-kernel - Re: PROBLEM: kernel BUG at mm/slab.c:3002!

Message-ID: <Pine.LNX.4.64.0903102148150.31262@blonde.anvils>
Date:	Tue, 10 Mar 2009 22:05:06 +0000 (GMT)
From:	Hugh Dickins <hugh@...itas.com>
To:	"Alan D. Brunelle" <Alan.Brunelle@...com>
cc:	Matt Mackall <mpm@...enic.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	cl@...ux-foundation.org, penberg@...helsinki.fi, linux-mm@...ck.org
Subject: Re: PROBLEM: kernel BUG at mm/slab.c:3002!

On Tue, 10 Mar 2009, Alan D. Brunelle wrote:
> Matt Mackall wrote:
> > On Tue, 2009-03-10 at 13:29 -0400, Alan D. Brunelle wrote:
> >> Matt Mackall wrote:
> >>> On Tue, 2009-03-10 at 11:16 -0400, Alan D. Brunelle wrote:
> >>>> Running blktrace & I/O loads cause a kernel BUG at mm/slab.c:3002!.
> >>> Pid: 11346, comm: blktrace Tainted: G    B      2.6.29-rc7 #3 ProLiant
> >>> DL585 G5   
> >>>
> >>> That 'B' there indicates you've hit 'bad page' before this. That bug
> >>> seems to be strongly correlated with some form of hardware trouble.
> >>> Unfortunately, that makes everything after that point a little suspect.
> >>
> >> /If/ it were a hardware issue, that might explain the subsequent issue
> >> when I switched to SLUB instead...
> > 
> > Well it was almost certainly not a bug in SLAB itself (and your SLUB
> > test is obviously quite conclusive there). We'd have lots of reports.
> > It's probably too early to conclude it's hardware though.
> > 
> >> How does one look for "bad page reports"?
> > 
> > It'll look something like this (pasted from Google):
> > 
> >>>     kernel: Bad page state at free_hot_cold_page (in process 'beam',
> >>> page c1a95320)
> >>>     kernel: flags:0x40020118 mapping:f401adc0 mapped:0 count:0
> >>> private:0x00000000
> > 
> 
> Interestingly enough, I'm not seeing the kernel detect such things - but
> in going into the hardware server logs, a co-worker found "unrecoverable
> system errors" being detected at about the same times we're seeing the
> panics.

In 2.6.29-rc, the "B" taint should be associated with mm/page_alloc.c's
bad_page() KERN_ALERT "BUG: Bad page state in process %s  pfn:%05lx\n",
but it could also now come from mm/memory.c's print_bad_pte()
KERN_ALERT "BUG: Bad page map in process %s  pte:%08llx pmd:%08llx\n",
which replaces the old mm/rmap.c "Eeek" messages, and some other cases too.
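
To answer the "how does one look" question more concretely, here is a
rough sketch (not a tested tool) that filters kernel log lines for the
two message prefixes quoted above and checks a taint value for the "B"
flag. The function names are made up for illustration, and the bit
position assumes "B" is bit 5 of /proc/sys/kernel/tainted, as the
kernel's tainted-kernels documentation describes it.

```python
import re
import sys

# Assumption: the 'B' (bad page) taint flag is bit 5 of the integer
# exposed in /proc/sys/kernel/tainted.
TAINT_BAD_PAGE_BIT = 5

# Matches both "BUG: Bad page state in process ..." and
# "BUG: Bad page map in process ...", as well as the older
# "Bad page state at ..." form quoted earlier in the thread.
BAD_PAGE_RE = re.compile(r"Bad page (state|map)\b")


def has_bad_page_taint(tainted_value):
    """Return True if the 'B' bit is set in a numeric taint value."""
    return bool((tainted_value >> TAINT_BAD_PAGE_BIT) & 1)


def find_bad_page_reports(log_lines):
    """Return the log lines that look like bad page reports."""
    return [line.rstrip("\n") for line in log_lines
            if BAD_PAGE_RE.search(line)]


if __name__ == "__main__":
    # Reads kernel log text on stdin and prints only the matching lines.
    for report in find_bad_page_reports(sys.stdin):
        print(report)
```

Saved under a hypothetical name such as find_bad_page.py, it could be fed
from the kernel log with "dmesg | python3 find_bad_page.py". It only
catches the first line of each report, not the flags dump that follows.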

Hugh