[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <51B7690C.8090805@linaro.org>
Date: Tue, 11 Jun 2013 11:14:36 -0700
From: John Stultz <john.stultz@...aro.org>
To: Oleg Nesterov <oleg@...hat.com>
CC: Andrew Morton <akpm@...ux-foundation.org>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Tomas Janousek <tjanouse@...hat.com>,
Tomas Smetana <tsmetana@...hat.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/3] de_thread() should update ->real_start_time
On 06/11/2013 10:13 AM, Oleg Nesterov wrote:
> On 06/10, John Stultz wrote:
>
>>> simply change copy_process
>>>
>>> - do_posix_clock_monotonic_gettime(&p->start_time);
>>> + get_monotonic_boottime(&p->start_time);
>>>
>>> ?
>>>
>>> Afaics, this will only affect do_acct_process() and bacct_add_tsk(),
>>> but do we really want to exclude the suspended time in this case?
>> So bacct_add_tsk seems easy to change, since its just:
>> do_posix_clock_monotonic_gettime(&uptime);
>> ts = timespec_sub(uptime, tsk->start_time);
>>
>> So grabbing the monotonic boot time for uptime would provide the same
>> relative delta.
> Not really, or I misunderstood monotonic/boottime interaction.
>
> IIUC, monotonic doesn't grow during suspend, so the delta can grow if
> we use get_monotonic_boottime() in copy_process() and bacct_add_tsk()
> and the system was suspended in between. Right?
Oh right. Good point. The suspend time may not be constant between the
calculations.
> But perhaps this is fine and even more correct?
Hrmm.. Looking closer at what the calculations are used for, I worry
changing to counting suspend time in elapsed run time would be a
userland visible behaivor change that might be problematic.
That said, elapsed run time as it exists now is not really a useful
measurement, since you get different results depending on the situation:
ie, a VM that doesn't get much cpu time vs a system that suspends
frequently. In one case, you seem to have been running for a long time,
but not getting much cpu runtime, where as the other you might appear to
get quite a bit of the possible execution time.
This all goes back to issues around what suspend-state really is. Where
in previously it was viewed to be user-controlled and considered closer
to the system temporarily being off - thus the intent was to make
suspend invisible/hidden to the system itself, but more recently, with
systems suspending quite frequently suspend state is more
invisible/hidden to the end user, and is closer to a deep idle state.
Back in the day folks were up in arms that someone could "cheat" their
way to large uptime values by leaving their system suspended. But, if I
could do it again, I'd probably push for CLOCK_MONOTONIC (as exported to
userland) and all of these user-visible metrics to include suspend time.
So I think it probably *makes more sense* to include suspend_time in the
elapsed runtime value being exported via bacct_add_tsk() and
do_acct_process(), but I unfortunately worry now any such change would
risk breaking userland expectations.
The *actual* risk may be quite minor, so this could be one of those:
"Let the tree fall and if no one is there to hear it, fine" interface
breaks, but I'm not sure I'm eager enough to be the one proposing it. :)
thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists