[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dee3d594-65bb-ffa2-f009-7900a4cf3f89@yandex-team.com>
Date: Fri, 20 Aug 2021 12:37:28 +0300
From: Andrey Ryabinin <arbn@...dex-team.com>
To: Daniel Jordan <daniel.m.jordan@...cle.com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Tejun Heo <tj@...nel.org>, Zefan Li <lizefan.x@...edance.com>,
Johannes Weiner <hannes@...xchg.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>
Cc: Boris Burkov <boris@....io>,
Bharata B Rao <bharata@...ux.vnet.ibm.com>,
cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH 1/4] cputime,cpuacct: Include guest time in user time in
cpuacct.stat
Sorry for abandoning this, got distracted by lots of other stuff.
On 3/18/21 1:09 AM, Daniel Jordan wrote:
> Andrey Ryabinin <arbn@...dex-team.com> writes:
>
>> cpuacct.stat in no-root cgroups shows user time without guest time
>> included int it. This doesn't match with user time shown in root
>> cpuacct.stat and /proc/<pid>/stat.
>
> Yeah, that's inconsistent.
>
>> Make account_guest_time() to add user time to cgroup's cpustat to
>> fix this.
>
> Yep.
>
> cgroup2's cpu.stat is broken the same way for child cgroups, and this
> happily fixes it. Probably deserves a mention in the changelog.
>
Sure.
> The problem with cgroup2 was, if the workload was mostly guest time,
> cpu.stat's user and system together reflected it, but it was split
> unevenly across the two. I think guest time wasn't actually included in
> either bucket, it was just that the little user and system time there
> was got scaled up in cgroup_base_stat_cputime_show -> cputime_adjust to
> match sum_exec_runtime, which did have it.
>
> The stats look ok now for both cgroup1 and 2. Just slightly unsure
> whether we want to change the way both interfaces expose the accounting
> in case something out there depends on it. Seems like we should, but
> it'd be good to hear more opinions.
>
>> @@ -148,11 +146,11 @@ void account_guest_time(struct task_struct *p, u64 cputime)
>>
>> /* Add guest time to cpustat. */
>> if (task_nice(p) > 0) {
>> - cpustat[CPUTIME_NICE] += cputime;
>> - cpustat[CPUTIME_GUEST_NICE] += cputime;
>> + task_group_account_field(p, CPUTIME_NICE, cputime);
>> + task_group_account_field(p, CPUTIME_GUEST_NICE, cputime);
>> } else {
>> - cpustat[CPUTIME_USER] += cputime;
>> - cpustat[CPUTIME_GUEST] += cputime;
>> + task_group_account_field(p, CPUTIME_USER, cputime);
>> + task_group_account_field(p, CPUTIME_GUEST, cputime);
>> }
>
> Makes sense for _USER and _NICE, but it doesn't seem cgroup1 or 2
> actually use _GUEST and _GUEST_NICE.
>
> Could go either way. Consistency is nice, but I probably wouldn't
> change the GUEST ones so people aren't confused about why they're
> accounted. It's also extra cycles for nothing, even though most of the
> data is probably in the cache.
>
Agreed, will live the _GUEST* as is.
Powered by blists - more mailing lists