[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1b6498f3-ca07-41d5-9637-f20a58184e60@huaweicloud.com>
Date: Sat, 23 Aug 2025 09:45:26 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Tiffany Yang <ynaffit@...gle.com>
Cc: linux-kernel@...r.kernel.org, John Stultz <jstultz@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>, Stephen Boyd <sboyd@...nel.org>,
Anna-Maria Behnsen <anna-maria@...utronix.de>,
Frederic Weisbecker <frederic@...nel.org>, Tejun Heo <tj@...nel.org>,
Johannes Weiner <hannes@...xchg.org>, Michal Koutný
<mkoutny@...e.com>, "Rafael J. Wysocki" <rafael@...nel.org>,
Pavel Machek <pavel@...nel.org>, Roman Gushchin <roman.gushchin@...ux.dev>,
Chen Ridong <chenridong@...wei.com>, kernel-team@...roid.com,
Jonathan Corbet <corbet@....net>, Shuah Khan <shuah@...nel.org>,
cgroups@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v4 1/2] cgroup: cgroup.stat.local time accounting
On 2025/8/23 3:32, Tiffany Yang wrote:
> Hi Chen,
>
> Thanks again for taking a look!
>
> Chen Ridong <chenridong@...weicloud.com> writes:
>
>> On 2025/8/22 14:14, Chen Ridong wrote:
>
>
>>> On 2025/8/22 9:37, Tiffany Yang wrote:
>>>> There isn't yet a clear way to identify a set of "lost" time that
>>>> everyone (or at least a wider group of users) cares about. However,
>>>> users can perform some delay accounting by iterating over components of
>>>> interest. This patch allows cgroup v2 freezing time to be one of those
>>>> components.
>
>>>> Track the cumulative time that each v2 cgroup spends freezing and expose
>>>> it to userland via a new local stat file in cgroupfs. Thank you to
>>>> Michal, who provided the ASCII art in the updated documentation.
>
>>>> To access this value:
>>>> $ mkdir /sys/fs/cgroup/test
>>>> $ cat /sys/fs/cgroup/test/cgroup.stat.local
>>>> freeze_time_total 0
>
>>>> Ensure consistent freeze time reads with freeze_seq, a per-cgroup
>>>> sequence counter. Writes are serialized using the css_set_lock.
>
> ...
>
>>>> spin_lock_irq(&css_set_lock);
>>>> - if (freeze)
>>>> + write_seqcount_begin(&cgrp->freezer.freeze_seq);
>>>> + if (freeze) {
>>>> set_bit(CGRP_FREEZE, &cgrp->flags);
>>>> - else
>>>> + cgrp->freezer.freeze_start_nsec = ts_nsec;
>>>> + } else {
>>>> clear_bit(CGRP_FREEZE, &cgrp->flags);
>>>> + cgrp->freezer.frozen_nsec += (ts_nsec -
>>>> + cgrp->freezer.freeze_start_nsec);
>>>> + }
>>>> + write_seqcount_end(&cgrp->freezer.freeze_seq);
>>>> spin_unlock_irq(&css_set_lock);
>
>
>>> Hello Tiffany,
>
>>> I wanted to check if there are any specific considerations regarding how we should input the ts_nsec
>>> value.
>
>>> Would it be possible to define this directly within the cgroup_do_freeze function rather than
>>> passing it as a parameter? This approach might simplify the implementation and potentially improve
>>> timing accuracy when it have lots of descendants.
>
>
>> I revisited v3, and this was Michal's point.
>> p
>> / | \
>> 1 ... n
>> When we freeze the parent group p, is it expected that all descendant cgroups (1 to n) should share
>> the same frozen timestamp?
>
>
> Yes, this is the expectation from the current change. I understand your
> concern about the accuracy of this measurement (especially when there
> are many descendants), but I agree with Michal's point that the time to
> traverse the descendant cgroups is basically noise relative to the
> quantity we're trying to measure here.
>
>> If the cgroup tree structure is stable, the exact frozen time may not be really matter. However, if
>> the tree is not stable, obtaining the same frozen time is acceptable?
>
> I'm a little unclear as to what you mean about when the cgroup tree is
> unstable. In the case where a new descendant of p is being created, I
> believe the cgroup_mutex prevents that from happening at the same time
> as we are freezing p's other descendants. If it won the race, was
> created unfrozen under p, and then became frozen during cgroup_freeze,
> it would have the same timestamp as the other descendants. If it lost
> the race and was created as a frozen cgroup under p, it would get its
> own timestamp in cgroup_create, so its freezing duration would be
> slightly less than that of the others in the hierarchy. Both values
> would be acceptable for our purposes, but if there was a different case
> you had in mind, please let me know!
>
> Thanks,
What I mean by "stable" is that while cgroup 1 through n might be deleted or have more descendants
created. For example:
n n-1 n-2 ... 1
frozen a a+1 a+2 a+n
unfozen b b+1 b+2 ... b+n
nsec b-a ...
In this case, all frozen_nsec values are b - a, which I believe is correct.
However, consider a scenario where some cgroups are deleted:
n n-1 n-2 ... 1
frozen a a+1 a+2 a+n
// 2 ... n-1 are deleted.
unfozen b b+1
Here, the frozen_nsec for cgroup n would be b - a, but for cgroup 1 it would be (b + 1) - (a + n).
This could introduce some discrepancy / timing inaccuracies.
--
Best regards,
Ridong
Powered by blists - more mailing lists