Message-ID: <20061221211242.GG16860@austin.ibm.com>
Date: Thu, 21 Dec 2006 15:12:42 -0600
From: linas@...tin.ibm.com (Linas Vepstas)
To: Ingo Molnar <mingo@...hat.com>
Cc: Anton Blanchard <anton@...ba.org>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
linux-kernel@...r.kernel.org, linuxppc-dev@...abs.org,
mingo@...e.hu
Subject: Re: Mutex debug lock failure [was Re: Bad gcc-4.1.0 leads to Power4 crashes... and power5 too, actually
On Thu, Dec 21, 2006 at 03:41:39PM +0100, Ingo Molnar wrote:
> On Wed, 2006-12-20 at 19:03 -0600, Linas Vepstas wrote:
> > Same kernel runs fine on power5. Although it does have patches
> > applied, those very same patches boot fine when applied to a slightly
> > older kernel (2.6.19-rc4). I haven't been messing with buids or
> > pci config space (at least not intentionally).
> >
> > I'll try again with an unpatched, unmodified kernel.
>
> there have been a number of fixes to lockdep recently - could you try
> the kernel/lockdep.c file from latest -mm, does that fail too?
>
> one possibility would be a chain-hash collision.
I see the same problem on linux-2.6.20-rc1-mm1.
The patch below fixes this, although I don't understand why
this has become an issue just now:
Index: linux-2.6.20-rc1-mm1/kernel/mutex.c
===================================================================
--- linux-2.6.20-rc1-mm1.orig/kernel/mutex.c	2006-12-19 16:19:34.000000000 -0600
+++ linux-2.6.20-rc1-mm1/kernel/mutex.c	2006-12-21 14:31:33.000000000 -0600
@@ -249,7 +249,7 @@ __mutex_unlock_common_slowpath(atomic_t
 		wake_up_process(waiter->task);
 	}
-	debug_mutex_clear_owner(lock);
+	// debug_mutex_clear_owner(lock);
 	spin_unlock_mutex(&lock->wait_lock, flags);
 }
It is obvious that this is the proximal cause of the failure of
the double_unlock_mutex() mutex self-test. However, both
the double-unlock test, and this clear_owner() call, are
in linux-2.6.19-git7, which doesn't fail this test. So I conclude
that __mutex_unlock_common_slowpath() is never taken in 2.6.19
but is always taken on 2.6.20-rc1 (in particular, is taken
during the double-unlock test).
I don't know why that would be.
It might be wise to add a test to make sure the slowpath
is taken only when it should be taken? It's sort of scary
to think that it might be always taken, and that no one
notices the problem...
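
A rough sketch of the kind of check meant here, as a user-space toy
rather than kernel code. Every toy_* name is made up for illustration,
and the count convention (1 unlocked, 0 locked, negative when there may
be waiters) is assumed to mirror the generic mutex count. The toy counts
slowpath entries so a test can assert that an uncontended unlock never
takes the slowpath:

```c
/* Toy user-space model, NOT kernel code. All toy_* names are hypothetical. */
#include <stdatomic.h>

struct toy_mutex {
	atomic_int count;           /* 1: unlocked, 0: locked, <0: locked, waiters possible */
	unsigned long slow_unlocks; /* times the unlock slowpath ran (single-threaded use only) */
};

static void toy_mutex_init(struct toy_mutex *m)
{
	atomic_init(&m->count, 1);
	m->slow_unlocks = 0;
}

/* Returns nonzero if the lock was acquired (count went 1 -> 0). */
static int toy_mutex_trylock(struct toy_mutex *m)
{
	int unlocked = 1;

	return atomic_compare_exchange_strong(&m->count, &unlocked, 0);
}

/* Stand-in for a second locker arriving: marks the lock contended, does not block. */
static void toy_mutex_mark_contended(struct toy_mutex *m)
{
	atomic_store(&m->count, -1);
}

static void toy_unlock_slowpath(struct toy_mutex *m)
{
	m->slow_unlocks++;
	atomic_store(&m->count, 1);
	/* a real implementation would wake a waiter here */
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
	/* Fastpath: 0 -> 1 means nobody was waiting, so we are done. */
	if (atomic_fetch_add(&m->count, 1) + 1 <= 0)
		toy_unlock_slowpath(m);
}

/* Returns 0 if the slowpath ran exactly when expected, else a step number. */
int toy_selfcheck(void)
{
	struct toy_mutex m;

	toy_mutex_init(&m);

	if (!toy_mutex_trylock(&m))
		return 1;
	toy_mutex_unlock(&m);           /* uncontended: fastpath only */
	if (m.slow_unlocks != 0)
		return 2;

	if (!toy_mutex_trylock(&m))
		return 3;
	toy_mutex_mark_contended(&m);
	toy_mutex_unlock(&m);           /* contended: slowpath expected */
	if (m.slow_unlocks != 1)
		return 4;

	return 0;
}
```

Calling toy_selfcheck() from a main() and checking for 0 is all it
takes to run this. A real kernel self-test would have to hook the
actual fastpath/slowpath split, which this toy does not attempt.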
I'm gonna be out until after Christmas, and so,
Merry Christmas!
--linas