[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100325121250.GA3664@redhat.com>
Date: Thu, 25 Mar 2010 13:12:50 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Americo Wang <xiyou.wangcong@...il.com>,
Balbir Singh <balbir@...ibm.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>,
Ingo Molnar <mingo@...e.hu>,
Roland McGrath <roland@...hat.com>,
Spencer Candland <spencer@...ehost.com>,
Stanislaw Gruszka <sgruszka@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC,PATCH 2/2] cputimers/proc:
do_task_stat()->thread_group_times() is racy and O(n) under
->siglock
On 03/24, Peter Zijlstra wrote:
>
> On Wed, 2010-03-24 at 21:45 +0100, Oleg Nesterov wrote:
> > Nowadays ->siglock is overloaded, it would be really nice to change
> > do_task_stat() to walk through the list of threads lockless. And note
> > that we are doing while_each_thread() twice!
> >
> > while_each_thread() is rcu-safe, but thread_group_times() also needs
> > ->siglock to serialize the modifications of signal_struct->prev_Xtime
> > members.
First of all, let me reply to myself. I see that I wasn't clear at all.
This patch does the first step to remove one reason for ->siglock
(modification of ->prev_Xtime). But this is very minor, I guess we
could change thread_group_times() to take signal->cputimer->lock.
The goal was to call thread_group_cputime() lockless under rcu lock
(either directly, or via thread_group_times(), this doesn't matter)
to avoid while_each_thread() under ->siglock.
And in this case /proc/pid/stat can't report utime/stime atomically.
Whatever we do we can race with exit, so it doesn't make sense to
play with ->prev_Xtime.
> Right, so from what I remember the issue is that, yes top et al rely on
> that monotonicity,
Really? So, do you think the change above will break user-space?
How sad :/
> but more importantly I think
> clock_gettime(CLOCK_PROCESS_CPUTIME_ID) should indeed use ->siglock to
> ensure it serializes against do_exit() so that either we iterate the
> thread or get the accumulated runtime from signal_struct but not both
> (or neither).
Oh. I forgot everything I knew about posix-cpu-timers... But, it seems,
posix_cpu_clock_get() calls thread_group_cputime() under tasklist and
thus can't race with exit.
Oleg.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists