Message-Id: <20101005085751.4490A401B2@magilla.sf.frob.com>
Date: Tue, 5 Oct 2010 01:57:51 -0700 (PDT)
From: Roland McGrath <roland@...hat.com>
To: holzheu@...ux.vnet.ibm.com
Cc: Oleg Nesterov <oleg@...hat.com>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
Shailabh Nagar <nagar1234@...ibm.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Venkatesh Pallipadi <venki@...gle.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Suresh Siddha <suresh.b.siddha@...el.com>,
John stultz <johnstul@...ibm.com>,
Thomas Gleixner <tglx@...utronix.de>,
Balbir Singh <balbir@...ux.vnet.ibm.com>,
Ingo Molnar <mingo@...e.hu>,
Heiko Carstens <heiko.carstens@...ibm.com>,
linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCH 09/10] taskstats: Fix exit CPU time accounting
> Thanks! That information was missing! Still, it does not seem to me to
> be a good decision to do it that way. Because of it, it is currently
> not possible to evaluate all consumed CPU time by looking at the current
> processes. Time can simply disappear.
I agree that it seems dubious. I don't know why that decision was made in
POSIX, but that's how it is. Anyway, POSIX only constrains what we report
in the POSIX calls, i.e. getrusage, times, waitid, SIGCHLD siginfo_t.
Nothing says we can't track more information and make it accessible in
other ways on Linux.
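
To make the "time can simply disappear" point concrete, here is a minimal userspace sketch of what the POSIX calls report. It assumes Linux behavior; the helper names children_cpu_usec() and child_time_reported() are made up for this illustration and are not part of any patch in this thread. A child burns about 100 ms of CPU, and the parent measures how much getrusage(RUSAGE_CHILDREN) grows, once with SIGCHLD ignored and once with the default disposition.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* User + system CPU time (microseconds) that POSIX attributes to the
 * children this process has waited for. */
static long long children_cpu_usec(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_CHILDREN, &ru);

    assert(rc == 0); /* RUSAGE_CHILDREN with a valid buffer should not fail */
    return (long long)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000LL
           + ru.ru_utime.tv_usec + ru.ru_stime.tv_usec;
}

/* Hypothetical helper: fork a child that burns about 100 ms of CPU, wait
 * until it is gone, and return how much RUSAGE_CHILDREN grew (in
 * microseconds), or -1 on error. */
static long long child_time_reported(int ignore_sigchld)
{
    struct sigaction sa, old;
    long long before, after;
    pid_t pid;

    sa.sa_handler = ignore_sigchld ? SIG_IGN : SIG_DFL;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, &old) != 0)
        return -1;

    before = children_cpu_usec();
    pid = fork();
    if (pid < 0) {
        sigaction(SIGCHLD, &old, NULL);
        return -1;
    }
    if (pid == 0) {
        /* Child: spin until it has used about 0.1 s of its own CPU time. */
        clock_t start = clock();
        while ((double)(clock() - start) / CLOCKS_PER_SEC < 0.1)
            ;
        _exit(0);
    }

    /* With SIGCHLD ignored, waitpid() blocks until the child has exited
     * and then fails with ECHILD; otherwise it reaps the child normally. */
    while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
        ;

    after = children_cpu_usec();
    sigaction(SIGCHLD, &old, NULL); /* restore the caller's disposition */
    return after - before;
}
```

On Linux, child_time_reported(1) should return 0 because the auto-reaped child's time is never added to the parent's cumulative counters, while child_time_reported(0) should return roughly the 100 ms the child consumed.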
> What about adding a new set of CPU time fields (e.g. cr-times) for the
> cumulative "autoreap" children times to the signal struct and export
> them via taskstats?
I don't have a particular opinion about the details of how you export the
information. Something generally along those lines certainly sounds
reasonable to me.
> Then the following set of CPU times will give a complete picture (I also
> added steal time (st) that is currently not accounted in Linux per
> task):
>
> * task->(u/s/st-time):
> Time that has been consumed by task itself
There is also "gtime", "guest time" when the task is a kvm vcpu.
There is also "sched time" (task->se.sum_exec_runtime), which is the
task's total run time regardless of state, tracked by a different method
than [usg]time.
> * task->signal->(c-u/s/st-time):
> Time that has been consumed by dead children of process where parent
> has done a sys_wait()
Also cgtime (to gtime as cutime is to utime).
> * task->signal->(u/s/st-time):
> Time that has been consumed by dead threads of thread group of process
> - NEW: Has to be exported via taskstats
Also gtime here. These are reported as part of the aggregate process times
that include both live and dead threads, but not distinguished.
> * task->signal->(cr-u/s/st-time):
> Time that has been consumed by dead children that reaped
> themselves, because parent ignored SIGCHLD or has set SA_NOCLDWAIT
> - NEW: Fields have to be added to signal struct
> - NEW: Has to be exported via taskstats
Note that there are other stats aside from times that are treated the same
way (c{min,maj}_flt, cn{v,iv}csw, c{in,ou}block, cmaxrss, and io accounting).
What probably makes sense is to move all those cfoo fields from
signal_struct into foo fields in a new struct, and then signal_struct can
have "struct child_stats reaped_children, ignored_children" or whatnot.
Thanks,
Roland