Message-ID: <4EDF0CEB.80904@orcon.net.nz>
Date: Wed, 07 Dec 2011 19:51:23 +1300
From: Michael Cree <mcree@...on.net.nz>
To: linux-kernel@...r.kernel.org
CC: linux-alpha@...r.kernel.org, Shaohua Li <shaohua.li@...el.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Richard Henderson <rth@...ddle.net>,
Ivan Kokshaysky <ink@...assic.park.msu.ru>,
Matt Turner <mattst88@...il.com>
Subject: rcu_sched_state detected stalls on Alpha with generic config
I am seeing "rcu_sched_state detected stall on CPU" messages on the Alpha
architecture with a generic SMP config. Interactive tasks lock up, with
"INFO: task X blocked for more than 120 seconds" in the kernel logs,
followed eventually by a kernel oops and panic. This happens on the latest
3.2-rc4 and is traceable back to 3.0. Bisection between 2.6.39 and 3.0
leads to commit:
09223371deac67d08ca0b70bd18787920284c967
rcu: Use softirq to address performance regression
as the first bad commit.
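The bisection idea can be sketched as a plain binary search. This is a simplified illustration only: it assumes a linear history, whereas git bisect works on the real commit graph. Both `first_bad` and `is_bad` are hypothetical names. Here `is_bad` stands in for "build the kernel at that commit with the generic config, boot it, run the git test suite with -j4, and check the logs for stall messages".

```python
def first_bad(commits, is_bad):
    """Return the first bad commit in a linear history.

    Assumes commits are ordered oldest to newest, commits[0] is known
    good and commits[-1] is known bad. is_bad is a hypothetical
    callback that tests a single commit.
    """
    lo, hi = 0, len(commits) - 1      # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid                  # first bad commit is at or before mid
        else:
            lo = mid                  # first bad commit is after mid
    return commits[hi]
```

Each step halves the remaining range, so the number of kernels to build and test grows only logarithmically with the number of commits between the good and bad versions.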
Tested on an Alpha ES45 (Titan) with three 1.25 GHz CPUs and 4 GB of
memory. The testing procedure is to build git and run its test suite
with -j4 passed to make.
The CPU stall messages and the eventual system lockup are only seen with a
generic Alpha config, never with a Titan machine-specific config.
An example of the kernel logs follows (this one was probably produced when
I tried to shut down the system while it was falling over):
[45360.930876] INFO: rcu_sched_state detected stall on CPU 1 (t=798848 jiffies)
[45360.931853] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1} (detected by 0, t=798850 jiffies)
[45489.080225] INFO: task umount:17371 blocked for more than 120 seconds.
[45489.158350] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[45489.252100] umount D fffffc00013461ac 0 17371 17368 0x00000000
[45489.336084] fffffc00fdd53db8 fffffc00fdd97bb8 fffffc000108ca1c fffffc00dcc9e800
[45489.422998] fffffc00dcc9e810 fffffc00013b3a5d fffffc000106289c fffffc00ff0dfda8
[45489.519678] 0000000000000000 fffffc000108c81c fffffc0001cd73f0 0000000000000001
[45489.615381] fffffc00010627f0 0000000000000000 fffffc00dcc9e920 fffffc00ff0bf780
[45489.712060] fffffc00010111b8 fffffc00ff0dfda8 fffffc00ff0dfde8 fffffc0001cdaa58
[45489.808740] 0000000000000000 0000000000000000 fffffc0000000000 fffffc0000000000
[45489.907373] Trace:
[45489.930810] [<fffffc000108ca1c>] watchdog+0x200/0x27c
[45489.991357] [<fffffc000106289c>] kthread+0xac/0xc4
[45490.048974] [<fffffc000108c81c>] watchdog+0x0/0x27c
[45490.107568] [<fffffc00010627f0>] kthread+0x0/0xc4
[45490.164209] [<fffffc00010111b8>] kernel_thread+0x28/0x90
[45490.227685]
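For anyone sifting through longer logs, here is a rough sketch of pulling the stall reports out of dmesg text and converting the `t=` value from jiffies to seconds. The function name `parse_stalls` and the regular expression are illustrative, not part of any kernel tooling. The tick rate `ASSUMED_HZ` is an assumption: it must be replaced with the CONFIG_HZ of the kernel that produced the log.

```python
import re

# Assumption: placeholder tick rate; use the CONFIG_HZ of the kernel
# that actually produced the log.
ASSUMED_HZ = 1024

# Matches both message forms seen in the log above:
#   "... detected stall on CPU 1 (t=N jiffies)"
#   "... detected stalls on CPUs/tasks: { 1} (detected by 0, t=N jiffies)"
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: (?P<flavor>\w+) detected stalls? on "
    r"(?:CPU (?P<cpu>\d+)|CPUs/tasks: \{(?P<cpus>[^}]*)\})\s*"
    r"\((?:detected by (?P<by>\d+), )?t=(?P<jiffies>\d+) jiffies\)"
)

def parse_stalls(log_text, hz=ASSUMED_HZ):
    """Return one dict per RCU stall report found in log_text."""
    # Collapse all whitespace so messages wrapped by a mail client
    # are matched as a single line.
    flat = " ".join(log_text.split())
    reports = []
    for m in STALL_RE.finditer(flat):
        if m.group("cpu") is not None:
            cpus = [int(m.group("cpu"))]
        else:
            cpus = [int(c) for c in m.group("cpus").split()]
        jiffies = int(m.group("jiffies"))
        reports.append({
            "timestamp": float(m.group("ts")),
            "flavor": m.group("flavor"),
            "cpus": cpus,
            "detected_by": int(m.group("by")) if m.group("by") else None,
            "jiffies": jiffies,
            "stall_seconds": jiffies / hz,
        })
    return reports
```

It can be fed the output of dmesg saved to a file, for example `parse_stalls(open("dmesg.txt").read(), hz=...)`, where `dmesg.txt` is a hypothetical file name.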
Let me know if any other information is needed to narrow down the problem.
Cheers
Michael.