[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <491BC4B8.1050406@colorfullife.com>
Date: Thu, 13 Nov 2008 07:10:00 +0100
From: Manfred Spraul <manfred@...orfullife.com>
To: Andrew Morton <akpm@...ux-foundation.org>
CC: cboulte@...il.com, Nadia.Derbey@...l.net,
linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>
Subject: Re: [PATCH] SYSVIPC - Fix the ipc structures initialization
Andrew Morton wrote:
> Time is starting to press on this one. Is there something which we can
> revert which would fix this bug?
>
My previous analysis was bogus, let's start from scratch:
1) the initial oops report:
http://bugzilla.kernel.org/show_bug.cgi?id=11796#c0
- lockdep is enabled, the oops is somewhere in __lock_acquire
- the instruction that oopses is
>>> lock incl 0x138(%r12)
R12 is 0x0038004000000000
That could be an debug_atomic_inc() in __lock_acquire. The class pointer
in the spinlock_t is not initialized, thus it crashes.
Ingo - is that possible?
2) the latest oops was actually a soft lockup:
It starts with:
> [ 400.393024] INFO: trying to register non-static key.
> [ 400.397005] the code is fine but needs lockdep annotation.
> [ 400.397005] turning off the locking correctness validator.
> [ 400.397005] Pid: 4207, comm: sysv_test2 Not tainted 2.6.27-ipc_lock #1
> [ 400.397005] Call Trace:
> [ 400.397005] [<ffffffff80257055>] static_obj+0x60/0x77
> [ 400.397005] [<ffffffff8025af59>] __lock_acquire+0x1c8/0x779
> [ 400.397005] [<ffffffff8025b59f>] lock_acquire+0x95/0xc2
> [ 400.397005] [<ffffffff802feb07>] ipc_lock+0x62/0x99
> [ 400.397005] [<ffffffff8045117d>] _spin_lock+0x2d/0x5a
> [ 400.397005] [<ffffffff802feb07>] ipc_lock+0x62/0x99
> [ 400.397005] [<ffffffff802feb07>] ipc_lock+0x62/0x99
> [ 400.397005] [<ffffffff802feaa5>] ipc_lock+0x0/0x99
> [ 400.397005] [<ffffffff802feb46>] ipc_lock_check+0x8/0x53
> [ 400.397005] [<ffffffff803002c3>] sys_msgctl+0x188/0x461
> [ 400.397005] [<ffffffff80259ac7>] trace_hardirqs_on_caller+0x100/0x12a
> [ 400.397005] [<ffffffff80450d49>] trace_hardirqs_on_thunk+0x3a/0x3f
> [ 400.397005] [<ffffffff80259ac7>] trace_hardirqs_on_caller+0x100/0x12a
> [ 400.397005] [<ffffffff80212e09>] sched_clock+0x5/0x7
> [ 400.397005] [<ffffffff80450d49>] trace_hardirqs_on_thunk+0x3a/0x3f
> [ 400.397005] [<ffffffff80213021>] native_sched_clock+0x8c/0xa5
> [ 400.397005] [<ffffffff80212e09>] sched_clock+0x5/0x7
> [ 400.397005] [<ffffffff8020bf7a>] system_call_fastpath+0x16/0x1b
> [ 400.397005]
> [ 464.933003] BUG: soft lockup - CPU#2 stuck for 61s! [sysv_test2:4207]
> [ 464.933006] Call Trace:
> [ 464.933006] [<ffffffff8033dc6b>] _raw_spin_lock+0x98/0x100
> [ 464.933006] [<ffffffff8045119e>] _spin_lock+0x4e/0x5a
> [ 464.933006] [<ffffffff802feb07>] ipc_lock+0x62/0x99
For me, it reads like an uninitialized spinlock_t:
The static_obj test in kernel/lockdep.c notices that something is wrong and disables itself.
But then _raw_spin_lock() tries to acquire the uninitialized spinlock and loops forever, because noone does spin_unlock().
after 60 seconds, the soft lockup detection notices the problem and oopses.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists