[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.00.0904230935060.24310@gandalf.stny.rr.com>
Date: Thu, 23 Apr 2009 09:53:52 -0400 (EDT)
From: Steven Rostedt <rostedt@...dmis.org>
To: Ingo Molnar <mingo@...e.hu>
cc: Frédéric Weisbecker <fweisbec@...il.com>,
LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Glauber de Oliveira Costa <gcosta@...hat.com>,
Chris Wright <chrisw@...s-sol.org>,
Jeremy Fitzhardinge <jeremy@...p.org>,
Rusty Russell <rusty@...tcorp.com.au>,
Pekka Enberg <penberg@...helsinki.fi>
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes
On Thu, 23 Apr 2009, Ingo Molnar wrote:
> >
> > Here's the situation:
> >
> > We've added selftests for the event tracer. What this basically does is
> > enables each event one at a time and runs tests. The tests include
> > creating a kernel thread, executing workqueues and grabbing locks.
> >
> > We also have PROVE_LOCKING (LOCKDEP) enabled, that keeps track of
> > interrupts being enabled. When they are, we set a flag in the task struct
> > "hardirqs_enabled". When they are disabled, this flag is cleared.
> >
> > When we fork a process, a test is made to see if the flag is set for the
> > new process and if it is not, a warning is printed (as is done in the
> > above dump).
> >
> > I investigated this and found that the flag is mysteriously being set and
> > cleared for no apparent reason. The flag is in the task struct and nothing
> > should be touching it. In fact, it is a full int, not even a bit in a
> > flags variable.
> >
> > I had a test that would print the flag and irqs_disabled() in copy_process
> > before the warning and it showed that the flag was cleared but irqs was
> > enabled. The funny part is, I if the test triggered, I printed the flag
> > again, and the second print it was set again!!
> >
> > if (!p->hardirqs_enabled) {
> > printk("irqs:%d flag:%d\n", irqs_disabled(),
> > p->hardirqs_enabled);
> > printk("try again: %d\n", p->hardirqs_enabled);
> > }
> >
> > The first print showed that it was cleared, the second showed it was set
> > again??
>
> function tracer was active? So somewhere there we corrupted this
> state? It's unlikely that printk itself did this.
The problem always arises at the same spot. I'm not saying printk was the
culprit, I'm saying that printk actually "fixed" the issue. Which can be a
sign of a corrupted register somewhere.
I currently tracked it down to something in "prep_new_page". It is hard to
debug because as I add tests into the code, it makes the race window
smaller, and I need to run multiple boots to trigger the code. But when I
do trigger it (and I try to trigger it a few times) it always happens at
the same spot.
Note, everytime I catch the issue, the printk again "fixes" the problem
:-/
Since I need both DEBUG_PAGEALLOC and PARAVIRT on, I'm thinking that the
debug page alloc might be messing with the paravirt "modified" code (or
vice versa). It could also be the function tracer doing it too. But it
looks like the DEBUG_PAGEALLOC code is causing issues because the bug
always appears in the alloc code.
I'm not blaming any subsystem yet. It just seems that the combination of
these systems are broken.
-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists