[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <1430319487-23216-1-git-send-email-ethan.zhao@oracle.com>
Date: Wed, 29 Apr 2015 23:58:07 +0900
From: Ethan Zhao <ethan.zhao@...cle.com>
To: tglx@...utronix.de, john.stultz@...aro.org,
linux-kernel@...r.kernel.org
Cc: ethan.kernel@...il.com, tim.uglow@...cle.com,
Ethan Zhao <ethan.zhao@...cle.com>
Subject: [PATCH] Timer: fix a race condition between init_timers_cpu() and get_next_timer_interrupt()
while testing CPU hotplug and MCE with following two scripts,
script 1:
for i in {1..30}; do while :; do ((a=$RANDOM%160)); echo 0 >>
/sys/devices/system/cpu/cpu${i}/online; echo 1 >>
/sys/devices/system/cpu/cpu${i}/online; done & done
script 2:
while :; do for i in $(ls
/sys/devices/system/machinecheck/machinecheck*/check_interval); do echo 1 >>
$i; done; done
We got panic call trace as:
sh> bt
PID: 0 TASK: ffff881028e28080 CPU: 14 COMMAND: "swapper/14"
#0 [ffff881028e2ba90] machine_kexec at ffffffff810402aa
#1 [ffff881028e2baf8] crash_kexec at ffffffff810c4ea4
#2 [ffff881028e2bbc0] oops_end at ffffffff81575c50
#3 [ffff881028e2bbe8] no_context at ffffffff8156a8f9
#4 [ffff881028e2bc30] __bad_area_nosemaphore at ffffffff8156a979
#5 [ffff881028e2bc78] bad_area_nosemaphore at ffffffff8156aae3
#6 [ffff881028e2bc88] __do_page_fault at ffffffff8157852d
#7 [ffff881028e2bd80] do_page_fault at ffffffff8157896e
#8 [ffff881028e2bd90] page_fault at ffffffff815750d8
[exception RIP: get_next_timer_interrupt+344]
RIP: ffffffff8106c228 RSP: ffff881028e2be48 RFLAGS: 00010013
RAX: 0000000000000000 RBX: 00000001006d7a29 RCX: 00000001006d7dee
RDX: 0000000000000001 RSI: ffff88602978d3f8 RDI: ffff88602978d028
RBP: ffff881028e2be90 R8: 000000000000003d R9: 000000000000003b
R10: 0000000001006d7b R11: ffff881028e2be50 R12: 00000001006d7dee
R13: 00000001406d7a28 R14: ffff88602978c000 R15: ffff881028e2be68
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff881028e2be40] get_next_timer_interrupt at ffffffff8106c120
This panic (NULL pointer dereference) was caused by race condition between
init_timers_cpu() and get_next_timer_interrupt(), there is no protection
with lock when does initialization in function init_timers_cpu().
The two threads cause the race condition are shown as following:
Thread A:
store_online()
cpu_up()
__cpu_notify(CPU_UP_PREPARE...)
timer_cpu_notify()
CPU_UP_PREPARE: init_timers_cpu()
{
...
INIT_LIST_HEAD(base->tv5.vec + j);
...
}
Thread B:
tick_nohz_idle_enter()
__tick_nohz_idle_enter()
get_next_timer_interrupt()
__next_timer_interrupt()
{
...
list_for_each_entry(nte, varp->vec + slot, entry) {
if (tbase_get_deferrable(nte->base))
...
}
This bug will affect stable branch 4.0, 3.8, 3.19, I didn't check other branches.
The patch has been tested and verfied on stable 4.0.
Reported-by: Tim Uglow <tim.uglow@...cle.com>
Signed-off-by: Ethan Zhao <ethan.zhao@...cle.com>
---
kernel/time/timer.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c5..cc1cf35 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1573,6 +1573,9 @@ static int init_timers_cpu(int cpu)
base = per_cpu(tvec_bases, cpu);
}
+ if (unlikely(!base))
+ return -EINVAL;
+ spin_lock_irq(&base->lock);
for (j = 0; j < TVN_SIZE; j++) {
INIT_LIST_HEAD(base->tv5.vec + j);
@@ -1587,6 +1590,7 @@ static int init_timers_cpu(int cpu)
base->next_timer = base->timer_jiffies;
base->active_timers = 0;
base->all_timers = 0;
+ spin_unlock_irq(&base->lock);
return 0;
}
--
1.8.3.1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists