Message-ID: <1257778154.4108.341.camel@laptop>
Date: Mon, 09 Nov 2009 15:49:14 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
Cc: Spencer Candland <spencer@...ehost.com>,
linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
Oleg Nesterov <oleg@...hat.com>
Subject: Re: utime/stime decreasing on thread exit
On Thu, 2009-11-05 at 14:24 +0900, Hidetoshi Seto wrote:
> Problem [1]:
> thread_group_cputime() vs exit
>
> +void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
> +{
> + struct sighand_struct *sighand;
> + struct signal_struct *sig;
> + struct task_struct *t;
> +
> + *times = INIT_CPUTIME;
> +
> + rcu_read_lock();
> + sighand = rcu_dereference(tsk->sighand);
> + if (!sighand)
> + goto out;
> +
> + sig = tsk->signal;
> +
> + t = tsk;
> + do {
> + times->utime = cputime_add(times->utime, t->utime);
> + times->stime = cputime_add(times->stime, t->stime);
> + times->sum_exec_runtime += t->se.sum_exec_runtime;
> +
> + t = next_thread(t);
> + } while (t != tsk);
> +
> + times->utime = cputime_add(times->utime, sig->utime);
> + times->stime = cputime_add(times->stime, sig->stime);
> + times->sum_exec_runtime += sig->sum_sched_runtime;
> +out:
> + rcu_read_unlock();
> +}
>
> If one of the (thousands of) threads exits while another thread is running
> the do-while loop above, the s/utime of the exiting thread can be accounted
> twice: once in the do-while loop (before the exit) and again in the final
> cputime_add() (after the exit).
>
> I suppose this is hard to fix: taking the signal lock would solve this
> problem, but it could block all other threads for a long time and cause
> serious performance issues...
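The double count can be reproduced deterministically in a small user-space model. This is a sketch under simplifying assumptions, not kernel code: model_thread, model_signal, model_exit() and group_utime() are hypothetical names, a plain array replaces the thread list, and the "concurrent" exit is injected at a chosen loop index instead of coming from another CPU.

```c
/*
 * Hypothetical user-space model of the race, not kernel code.
 * Each model_thread carries a raw utime; model_signal plays the role
 * of sig->utime, which collects the time of threads that have exited.
 */
struct model_thread {
	unsigned long utime;
	int alive;
};

struct model_signal {
	unsigned long utime;
};

/* Exit path: fold the thread's time into the signal struct. */
static void model_exit(struct model_signal *sig, struct model_thread *t)
{
	sig->utime += t->utime;
	t->alive = 0;
}

/*
 * Unlocked reader: sum the live threads, then add sig->utime at the end.
 * If exit_at >= 0, thread `victim` exits right after the loop has visited
 * index exit_at, standing in for an exit racing with the loop.
 */
static unsigned long group_utime(struct model_thread *th, int n,
				 struct model_signal *sig,
				 int exit_at, int victim)
{
	unsigned long sum = 0;

	for (int i = 0; i < n; i++) {
		if (th[i].alive)
			sum += th[i].utime;
		if (i == exit_at)
			model_exit(sig, &th[victim]);
	}
	/* an already-visited victim is counted a second time here */
	return sum + sig->utime;
}
```

With four threads of 10, 20, 30 and 40 ticks and thread 1 exiting right after the loop has visited it (exit_at = 1, victim = 1), group_utime() returns 120 instead of 100.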
I just checked 2.6.22, and there we seem to hold p->sighand->siglock over
the full task iteration. So we might as well revert to that if people
really mind counting things twice :-)
FWIW getrusage() also takes siglock over the task iteration.
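A sketch of the locked variant, using a user-space analogue: a pthread_mutex_t stands in for p->sighand->siglock, and locked_group, locked_exit() and locked_group_utime() are hypothetical names. Holding the lock across the whole loop makes exit accounting and the summation mutually exclusive, which is what rules out the double count; the price is that every exiting thread waits for the full iteration.

```c
#include <pthread.h>

#define NTHREADS 4

/* Hypothetical model: siglock stands in for p->sighand->siglock. */
struct locked_group {
	pthread_mutex_t siglock;
	unsigned long utime[NTHREADS];	/* per-thread raw utime */
	int alive[NTHREADS];
	unsigned long sig_utime;	/* utime of exited threads */
};

/* Exit path: move the thread's time into sig_utime under the lock. */
static void locked_exit(struct locked_group *g, int idx)
{
	pthread_mutex_lock(&g->siglock);
	g->sig_utime += g->utime[idx];
	g->alive[idx] = 0;
	pthread_mutex_unlock(&g->siglock);
}

/*
 * Reader: hold the lock over the whole iteration, as 2.6.22 appears to
 * have done. Exits wait for us, so each thread is counted exactly once.
 */
static unsigned long locked_group_utime(struct locked_group *g)
{
	unsigned long sum;

	pthread_mutex_lock(&g->siglock);
	sum = g->sig_utime;
	for (int i = 0; i < NTHREADS; i++)
		if (g->alive[i])
			sum += g->utime[i];
	pthread_mutex_unlock(&g->siglock);
	return sum;
}
```

Summing before and after locked_exit() yields the same total, since the exiting thread's time moves from its per-thread slot into sig_utime within a single critical section.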
Alternatively, we could try reading sig->[us]time before doing the loop,
but I guess that's still racy, in that we could then miss an exiting
thread altogether.
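The snapshot-first ordering can be modelled the same way (snap_thread and snapshot_first() are hypothetical names; a plain array replaces the thread list and the exit is injected at a chosen index). A thread that exits after sig->utime has been read but before the loop reaches it contributes to neither term.

```c
/* Hypothetical model thread: raw utime plus a liveness flag. */
struct snap_thread {
	unsigned long utime;
	int alive;
};

/*
 * Read the exited-threads total first, then walk the live threads.
 * Thread `victim` exits just before the loop visits index exit_before.
 */
static unsigned long snapshot_first(struct snap_thread *th, int n,
				    unsigned long *sig_utime,
				    int exit_before, int victim)
{
	unsigned long sum = *sig_utime;	/* snapshot taken up front */

	for (int i = 0; i < n; i++) {
		if (i == exit_before) {
			/* racing exit: time moves into sig, too late for us */
			*sig_utime += th[victim].utime;
			th[victim].alive = 0;
		}
		if (th[i].alive)
			sum += th[i].utime;
	}
	return sum;
}
```

With the same 10/20/30/40 thread times, an exit of thread 3 injected before index 2 is visited makes snapshot_first() return 60 instead of 100; an exit of a thread the loop has already passed is still counted exactly once.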
> Problem [2]:
> use of task_s/utime()
>
> I modified the test program further, to call times() six times and print
> the results if utime decreased between the 3rd and 4th calls.
> I noticed something I cannot explain: if problem [1] were the root cause,
> the results should show a single inflated value at one point (like
> (v)(v)(V)(v)(v)(v)), but instead they show the value staying decreased.
>
> :
> times decreased : (104 984) (104 984) (104 984) (105 983) (105 983) (105 983)
> times decreased : (115 981) (116 980) (116 978) (117 977) (117 977) (119 979)
> times decreased : (116 980) (117 980) (117 980) (117 977) (118 979) (118 977)
> :
>
> And it seems that the more threads exit, the more utime decreases.
>
> Soon I found:
>
> [kernel/exit.c]
> + sig->utime = cputime_add(sig->utime, task_utime(tsk));
> + sig->stime = cputime_add(sig->stime, task_stime(tsk));
>
> While the thread_group_cputime() accumulates raw s/utime in do-while loop,
> the signal struct accumulates adjusted s/utime of exited threads.
>
> I'm not sure how this adjustment works, but applying the following patch
> makes the result a little better:
>
> :
> times decreased : (436 741) (436 741) (437 744) (436 742) (436 742) (436 742)
> times decreased : (454 792) (454 792) (455 794) (454 792) (454 792) (454 792)
> times decreased : (503 941) (503 941) (504 943) (503 941) (503 941) (503 941)
> :
>
> But the decrease (or increase) still occurs, because at least problem [1]
> remains.
>
> I don't think I can take this problem any further... Can anybody help?
Stick in a few trace_printk()s and see what happens?
> Subject: [PATCH] thread_group_cputime() should use task_s/utime()
>
> The signal struct accumulates adjusted cputime of exited threads,
> so thread_group_cputime() should use task_s/utime() instead of raw
> task->s/utime, to accumulate adjusted cputime of live threads.
>
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
> ---
> kernel/posix-cpu-timers.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
> index 5c9dc22..e065b8a 100644
> --- a/kernel/posix-cpu-timers.c
> +++ b/kernel/posix-cpu-timers.c
> @@ -248,8 +248,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>
> t = tsk;
> do {
> - times->utime = cputime_add(times->utime, t->utime);
> - times->stime = cputime_add(times->stime, t->stime);
> + times->utime = cputime_add(times->utime, task_utime(t));
> + times->stime = cputime_add(times->stime, task_stime(t));
> times->sum_exec_runtime += t->se.sum_exec_runtime;
>
> t = next_thread(t);
So what you're trying to say is that because __exit_signal() uses
task_[usg]time() to accumulate sig->[usg]time, we should use it too in
the loop over the live threads?
I'm thinking it's the task_[usg]time() usage in __exit_signal() that's
the issue.
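The mismatch can be illustrated with a simplified model of the scaling task_utime() applied at the time: roughly utime * sum_exec_runtime / (utime + stime), with unit conversion and the prev_utime clamp left out. This is an approximation, not the kernel code, and acct, scaled_utime(), group_utime_raw() and group_utime_after_exit() are hypothetical names. Live threads contribute their raw tick-sampled utime, while an exited thread contributes whatever __exit_signal() folded into sig->utime. When the scaled value is smaller than the raw one, the group total drops at the moment of exit and stays lower, because sig->utime keeps the smaller value.

```c
#include <stdint.h>

/* Hypothetical per-thread accounting model. */
struct acct {
	uint64_t utime;	/* tick-sampled user time */
	uint64_t stime;	/* tick-sampled system time */
	uint64_t rtime;	/* precise runtime (sum_exec_runtime), in ticks */
};

/* Approximation of task_utime(): scale utime so utime + stime ~ rtime. */
static uint64_t scaled_utime(const struct acct *a)
{
	uint64_t total = a->utime + a->stime;

	if (total == 0)
		return a->rtime;
	return a->rtime * a->utime / total;
}

/* Group utime while every thread is still alive: raw values only. */
static uint64_t group_utime_raw(const struct acct *th, int n)
{
	uint64_t sum = 0;

	for (int i = 0; i < n; i++)
		sum += th[i].utime;
	return sum;
}

/*
 * Group utime after thread `victim` has exited. scale_at_exit != 0 models
 * __exit_signal() folding in the scaled value; 0 models folding in raw.
 */
static uint64_t group_utime_after_exit(const struct acct *th, int n,
				       int victim, int scale_at_exit)
{
	uint64_t sum = 0;

	for (int i = 0; i < n; i++) {
		if (i == victim && scale_at_exit)
			sum += scaled_utime(&th[i]);	/* scaled, via sig->utime */
		else
			sum += th[i].utime;		/* raw value */
	}
	return sum;
}
```

For two threads with (utime, stime, runtime) of (6, 4, 8) and (5, 5, 10) ticks, the raw group utime is 11. If thread 0 exits and is folded in scaled, it contributes 8 * 6 / 10 = 4 (integer division), so the total becomes 9; folding in the raw value keeps it at 11. Consistency requires the same kind of value on both sides, which is the choice between the posted patch (scaled everywhere) and accumulating raw values in __exit_signal().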
I tried running the modified test.c on a current -tip kernel but could
not observe the problem (dual-core opteron).
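The check described for the modified test program can be sketched as follows. The actual test.c is not included in this thread, so this is a hypothetical reconstruction of only the sampling and comparison step: sample_times() and utime_decreased() are invented names, and the part that keeps many threads starting and exiting is left out.

```c
#include <sys/times.h>

#define NSAMPLES 6

/*
 * Take NSAMPLES consecutive times() readings for the calling process.
 * Returns 0 on success, -1 if any times() call fails.
 */
static int sample_times(struct tms s[NSAMPLES])
{
	for (int i = 0; i < NSAMPLES; i++)
		if (times(&s[i]) == (clock_t)-1)
			return -1;
	return 0;
}

/* Did the group's user time go down between the 3rd and 4th readings? */
static int utime_decreased(const struct tms s[NSAMPLES])
{
	return s[3].tms_utime < s[2].tms_utime;
}
```

A driver would call sample_times() in a loop while worker threads exit, and print the six (tms_utime, tms_stime) pairs whenever utime_decreased() returns true.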