Message-ID: <m1pqqqfpzh.fsf@fess.ebiederm.org>
Date: Thu, 17 Feb 2011 10:57:54 -0800
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Ingo Molnar <mingo@...e.hu>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Michal Hocko <mhocko@...e.cz>, linux-mm@...ck.org,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: BUG: Bad page map in process udevd (anon_vma: (null)) in 2.6.38-rc4

Ingo Molnar <mingo@...e.hu> writes:
> * Linus Torvalds <torvalds@...ux-foundation.org> wrote:
>
>> And in addition, I don't see why others wouldn't see it (I've got
>> DEBUG_PAGEALLOC and SLUB_DEBUG_ON turned on myself, and I know others
>> do too).
>
> I've done extensive randconfig testing and no crash triggers for typical
> workloads on a typical dual-core PC. If there are generic crashes in
> there, my tests tend to trigger them at least 10x as often as regular
> testers ;-) But the tests are still only statistical, so the race could
> simply be special and missed by the tests.
>
>> So I'm wondering what triggers it. Must be something subtle.
>
> I think what Michal did before he got the corruption seemed somewhat
> atypical: suspend/resume and udevd wifi twiddling, right?
>
> Now, Eric's crashes look similar - and he does not seem to have done
> anything special to trigger the crashes.
>
> Eric, could you possibly describe your system in a bit more detail:
> does it do suspend, and does the box use wifi actively? Anything
> atypical in your setup or usage that doesn't match a bog-standard
> whitebox PC with LAN? Swap to file? NFS? FUSE? Anything that is even
> just borderline atypical.

10G RAM
2G swap
Dual-socket system
4 cores per socket
No hyperthreading
Fedora 14
ext4 on all filesystems

The biggest difference is that I beat the system to death with automated
builds.

I was about to say this happens with DEBUG_PAGEALLOC enabled, but that
option keeps eluding my fingers when I have a few minutes to play with
it. Perhaps this time will be the charm.
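
For reference, the debug options in question are these .config switches
(names as in the 2.6.38-era tree; the exact Kconfig menu locations may
differ):

    CONFIG_DEBUG_PAGEALLOC=y
    CONFIG_SLUB_DEBUG=y
    CONFIG_SLUB_DEBUG_ON=y

CONFIG_SLUB_DEBUG_ON is equivalent to booting an already-built kernel
with slub_debug on the command line.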

Another big difference may be that I am constantly stressing the system
to the edge of triggering the OOM killer; my builds and tests are
greedy when it comes to memory.
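
That isn't my actual workload, but a minimal sketch of that kind of
memory pressure looks something like this (the file name and chunk size
are made up for illustration):

/* memhog.c: allocate memory in 64M chunks, touching every page, until
 * malloc() fails or the OOM killer steps in.
 * Build with: gcc -O2 -o memhog memhog.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK	(64UL << 20)	/* 64 MiB per allocation */

int main(void)
{
	unsigned long total = 0;
	char *p;

	while ((p = malloc(CHUNK)) != NULL) {
		memset(p, 0xaa, CHUNK);	/* fault in every page */
		total += CHUNK;
		fprintf(stderr, "allocated %lu MiB\n", total >> 20);
	}
	pause();	/* sit on the memory until killed */
	return 0;
}

Run a couple of those alongside parallel builds and the box should sit
right at that edge the whole time.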

I guess I should also say that I only see the bad PMD on processes that
exit, so seeing it at all may be a matter of timing.
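
For what it's worth, the "bad pmd" report comes out of the generic
page-table walkers; roughly, quoting from memory from
include/asm-generic/pgtable.h of that era:

/* The unmap path calls this for each PMD it walks; a corrupt entry is
 * reported via pmd_clear_bad() -> pmd_ERROR() (the "bad pmd" message)
 * and then cleared so the walk can continue. */
static inline int pmd_none_or_clear_bad(pmd_t *pmd)
{
	if (pmd_none(*pmd))
		return 1;
	if (unlikely(pmd_bad(*pmd))) {
		pmd_clear_bad(pmd);
		return 1;
	}
	return 0;
}

A corrupted entry is only noticed when something finally walks that
range, and for these processes that is the teardown at exit.
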
Eric