linux-kernel - Re: 2.6.25-git2: BUG: unable to handle kernel paging request at ffffffffffffffff


Message-ID: <alpine.LFD.1.10.0804211759030.2779@woody.linux-foundation.org>
Date:	Mon, 21 Apr 2008 18:14:09 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
cc:	Jiri Slaby <jirislaby@...il.com>, paulmck@...ux.vnet.ibm.com,
	David Miller <davem@...emloft.net>,
	linux-kernel@...r.kernel.org, mingo@...e.hu,
	akpm@...ux-foundation.org, linux-ext4@...r.kernel.org,
	herbert@...dor.apana.org.au,
	Zdenek Kabelac <zdenek.kabelac@...il.com>
Subject: Re: 2.6.25-git2: BUG: unable to handle kernel paging request at
 ffffffffffffffff

On Tue, 22 Apr 2008, Rafael J. Wysocki wrote:
> > 
> > The same place, dentry.d_hash.next is 1. No slub debug clues... I think I'll 
> > give slab a try. Any other clues?
> 
> Well, SLUB uses some per-CPU data structures.  Is it possible that they get
> corrupted, and that this leads to the observed symptoms?

It really doesn't look like the slub allocations themselves would be 
corrupted. It very much looks like wild pointers corrupting allocations 
that themselves were fine.

The nybble pattern looked intriguing (especially as it apparently also hit 
a normal page cache page!) but obviously not everything matches that 
pattern (e.g. your value of 1).

What do you do to trigger this? Any particular load? Is it still just 
doing suspend/resume, or do you have something else that you are playing 
with?

Also, have you tried CONFIG_DEBUG_PAGEALLOC? That can also be a very 
powerful way to find memory corruption.
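
For reference, a minimal sketch of the config fragment this refers to, 
assuming a 2.6.25-era tree where CONFIG_DEBUG_PAGEALLOC depends on 
CONFIG_DEBUG_KERNEL (that dependency is an assumption here; the exact 
Kconfig dependencies vary by architecture and version, so check your own 
tree):

```
# Assumption: DEBUG_PAGEALLOC is only selectable once DEBUG_KERNEL is set.
CONFIG_DEBUG_KERNEL=y
# Unmap pages from the kernel address space when they are freed.
CONFIG_DEBUG_PAGEALLOC=y
```

With this enabled, an access through a stale pointer into a freed page 
should fault at the point of use instead of silently corrupting whoever 
owns the page next. It slows the kernel down noticeably, so it is only 
suitable for debugging.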

Does anybody see any other patterns? Looking at the "Modules linked in" 
lists in the oopses from Zdenek, Rafael and Jiri, I don't see anything 
odd. You all have 80211 support, so maybe the corruption comes from the 
wireless layer?

Or maybe it's the x86 code changes themselves, and it really is about the 
suspend/resume sequence itself. Are all the people who see this doing 
suspends? 

		Linus