Message-ID: <4FE74F1B.6070803@corelatus.se>
Date: Sun, 24 Jun 2012 19:32:11 +0200
From: Thomas Lange <thomas@...elatus.se>
To: mingo@...hat.com, peterz@...radead.org
CC: gregkh@...uxfoundation.org, linux-kernel@...r.kernel.org
Subject: [BUG] sched: clock wrap bug in 2.6.35-stable kills scheduling
Commit 305e683 introduced a wrap bug that causes task scheduling to fail
after sched_clock() wrap. On a 1000 HZ system with 32bit jiffies, this
occurs after 49.7 days.
Bug was introduced in 2.6.35.12 and is still present in linux-2.6.35.y HEAD.
Symptoms include one task getting all available cpu time while others get
_none_. Setting niceness seems to make things even worse. Running this code
in a new process after the wrap completely locks up user space, thus
triggering a watchdog reboot:
{ nice(1); while(1); }
To reproduce the bug in reasonable time, one can raise HZ. With 16000 HZ, the
bug occurs after 3.1 days.
Modifying sched_clock() to wrap when jiffies does triggers the bug after 5 mins.
The basic problem seems to be that rq->clock_task gets stuck forever with a
really high value when rq->clock starts over from 0.
This fix solves that problem:
diff --git a/kernel/sched.c b/kernel/sched.c
index d40d662..883448f 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -657,6 +657,8 @@ inline void update_rq_clock(struct rq *rq)
 	if (!rq->skip_clock_update)
 		rq->clock = sched_clock_cpu(cpu_of(rq));
 	irq_time = irq_time_cpu(cpu);
+	if (rq->clock < rq->clock_task)
+		rq->clock_task = 0;
 	if (rq->clock - irq_time > rq->clock_task)
 		rq->clock_task = rq->clock - irq_time;
I can create a proper patch if the above is acceptable.
A more appropriate solution would perhaps be to pull some additional sched
commits into stable branch, like fe44d62 and friends. I don't know enough
about scheduler internals to tell.
All tests were performed on mips32 systems, but all systems with 32bit
jiffies should be affected.
/Thomas