[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4AF26176.4080307@jp.fujitsu.com>
Date: Thu, 05 Nov 2009 14:24:06 +0900
From: Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
To: Spencer Candland <spencer@...ehost.com>
CC: linux-kernel@...r.kernel.org,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Ingo Molnar <mingo@...e.hu>
Subject: Re: utime/stime decreasing on thread exit
Hidetoshi Seto wrote:
> Spencer Candland wrote:
>> I am seeing a problem with utime/stime decreasing on thread exit in a
>> multi-threaded process. I have been able to track this regression down
>> to the "process wide cpu clocks/timers" changes introduces in
>> 2.6.29-rc5, specifically when I revert the following commits I know
>> longer see decreasing utime/stime values:
>>
>> 4da94d49b2ecb0a26e716a8811c3ecc542c2a65d
>> 3fccfd67df79c6351a156eb25a7a514e5f39c4d9
>> 7d8e23df69820e6be42bcc41d441f4860e8c76f7
>> 4cd4c1b40d40447fb5e7ba80746c6d7ba91d7a53
>> 32bd671d6cbeda60dc73be77fa2b9037d9a9bfa0
(snip)
>
> I'll check commits you pointed/reverted.
These are likely series of:
> commit 4cd4c1b40d40447fb5e7ba80746c6d7ba91d7a53
> Author: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> Date: Thu Feb 5 12:24:16 2009 +0100
>
> timers: split process wide cpu clocks/timers
This results in making sys_times() to use "clocks" instead of
"timers". Please refer the description of the above commit.
I found 2 problems through my review.
Problem [1]:
thread_group_cputime() vs exit
+void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
+{
+ struct sighand_struct *sighand;
+ struct signal_struct *sig;
+ struct task_struct *t;
+
+ *times = INIT_CPUTIME;
+
+ rcu_read_lock();
+ sighand = rcu_dereference(tsk->sighand);
+ if (!sighand)
+ goto out;
+
+ sig = tsk->signal;
+
+ t = tsk;
+ do {
+ times->utime = cputime_add(times->utime, t->utime);
+ times->stime = cputime_add(times->stime, t->stime);
+ times->sum_exec_runtime += t->se.sum_exec_runtime;
+
+ t = next_thread(t);
+ } while (t != tsk);
+
+ times->utime = cputime_add(times->utime, sig->utime);
+ times->stime = cputime_add(times->stime, sig->stime);
+ times->sum_exec_runtime += sig->sum_sched_runtime;
+out:
+ rcu_read_unlock();
+}
If one of (thousands) threads do exit while a thread is doing do-while
above, the s/utime of exited thread can be accounted twice, at do-while
(before exit) and at cputime_add() at last (after exit).
I suppose this is hard to fix: Taking lock on signal would solve this
problem, but it could block all other threads long and cause serious
performance issue and so on...
Problem [2]:
use of task_s/utime()
I modified the test program more, to take times() 6 times and print them
if utime decreased between 3rd and 4th.
I noticed that I cannot explain that if the problem [1] was the root cause
then why results show decreased value continuously, instead of an increased
value at a point (like (v)(v)(V)(v)(v)(v)) which is expected.
:
times decreased : (104 984) (104 984) (104 984) (105 983) (105 983) (105 983)
times decreased : (115 981) (116 980) (116 978) (117 977) (117 977) (119 979)
times decreased : (116 980) (117 980) (117 980) (117 977) (118 979) (118 977)
:
And it seems that the more thread exits the more utime decreases.
Soon I found:
[kernel/exit.c]
+ sig->utime = cputime_add(sig->utime, task_utime(tsk));
+ sig->stime = cputime_add(sig->stime, task_stime(tsk));
While the thread_group_cputime() accumulates raw s/utime in do-while loop,
the signal struct accumulates adjusted s/utime of exited threads.
I'm not sure how this adjustment works but applying the following patch
makes the result little bit better:
:
times decreased : (436 741) (436 741) (437 744) (436 742) (436 742) (436 742)
times decreased : (454 792) (454 792) (455 794) (454 792) (454 792) (454 792)
times decreased : (503 941) (503 941) (504 943) (503 941) (503 941) (503 941)
:
But still decreasing(or increasing) continues, because there is a problem [1]
at least.
I think I couldn't handle this problem any more... Anybody can help?
Thanks,
H.Seto
===
Subject: [PATCH] thread_group_cputime() should use task_s/utime()
The signal struct accumulates adjusted cputime of exited threads,
so thread_group_cputime() should use task_s/utime() instead of raw
task->s/utime, to accumulate adjusted cputime of live threads.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
---
kernel/posix-cpu-timers.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 5c9dc22..e065b8a 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -248,8 +248,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
t = tsk;
do {
- times->utime = cputime_add(times->utime, t->utime);
- times->stime = cputime_add(times->stime, t->stime);
+ times->utime = cputime_add(times->utime, task_utime(t));
+ times->stime = cputime_add(times->stime, task_stime(t));
times->sum_exec_runtime += t->se.sum_exec_runtime;
t = next_thread(t);
--
1.6.5.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists