Message-ID: <alpine.LFD.1.10.0804211759030.2779@woody.linux-foundation.org>
Date:	Mon, 21 Apr 2008 18:14:09 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
cc:	Jiri Slaby <jirislaby@...il.com>, paulmck@...ux.vnet.ibm.com,
	David Miller <davem@...emloft.net>,
	linux-kernel@...r.kernel.org, mingo@...e.hu,
	akpm@...ux-foundation.org, linux-ext4@...r.kernel.org,
	herbert@...dor.apana.org.au,
	Zdenek Kabelac <zdenek.kabelac@...il.com>
Subject: Re: 2.6.25-git2: BUG: unable to handle kernel paging request at
 ffffffffffffffff



On Tue, 22 Apr 2008, Rafael J. Wysocki wrote:
> > 
> > The same place, dentry.d_hash.next is 1. No slub debug clues... I think I'll
> > give slab a try. Any other clues?
> 
> Well, SLUB uses some per CPU data structures.  Is it possible that they get
> corrupted and that this leads to the observed symptoms?

It really doesn't look like the SLUB allocations themselves are
corrupted. It very much looks like wild pointers corrupting allocations
that were themselves fine.

The nybble pattern looked intriguing (especially as it apparently also hit
a normal page cache page!), but obviously not everything matches that
pattern (e.g. your value of 1).

What do you do to trigger this? Any particular load? Is it still just 
doing suspend/resume, or do you have something else that you are playing 
with?

Also, have you tried CONFIG_DEBUG_PAGEALLOC? That can also be a very 
powerful way to find memory corruption.

Does anybody see any other patterns? Looking at the "Modules linked in:"
lists in the oopses from Zdenek, Rafael and Jiri, I don't see anything
odd. You all have 802.11 support, though; maybe the corruption comes
from the wireless layer?

Or maybe it's the x86 code changes, and it really is about the
suspend/resume sequence itself. Is everyone who sees this doing
suspends?

		Linus