Message-ID: <6c9278b7-4eb4-4b47-b61a-a5bcc7e558b0@huaweicloud.com>
Date: Thu, 17 Jul 2025 21:52:38 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Michal Koutný <mkoutny@...e.com>,
Tiffany Yang <ynaffit@...gle.com>
Cc: linux-kernel@...r.kernel.org, John Stultz <jstultz@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>, Stephen Boyd <sboyd@...nel.org>,
Anna-Maria Behnsen <anna-maria@...utronix.de>,
Frederic Weisbecker <frederic@...nel.org>, Tejun Heo <tj@...nel.org>,
Johannes Weiner <hannes@...xchg.org>, "Rafael J. Wysocki"
<rafael@...nel.org>, Pavel Machek <pavel@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Chen Ridong <chenridong@...wei.com>, kernel-team@...roid.com,
Jonathan Corbet <corbet@....net>, cgroups@...r.kernel.org,
linux-doc@...r.kernel.org
Subject: Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer
On 2025/7/17 20:56, Michal Koutný wrote:
> Hello Tiffany.
>
> On Sun, Jul 13, 2025 at 10:00:09PM -0700, Tiffany Yang <ynaffit@...gle.com> wrote:
>
>> Other sources of delay can cause similar issues, but this change focuses
>> on allowing frozen time to be accounted for in particular because of how
>> large it can grow and how unevenly it can affect applications running on
>> the system.
>
> I'd like to incorporate the reason from your other mail:
> | Since there isn't yet a clear way to identify a set of "lost" time
> | that everyone (or at least a wider group of users) cares about, it
> | seems like iterating over components of interest is the best way
> into this commit message (because that's a stronger point than your use
> case alone).
>
>
>> Any feedback would be much appreciated!
>
> I can see benefits of this new stat field conceptually, I have some
> remarks to implementation and suggestions to conventions below.
>
>> --- a/Documentation/admin-guide/cgroup-v2.rst
>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>> @@ -1018,6 +1018,14 @@ All cgroup core files are prefixed with "cgroup."
>> it's possible to delete a frozen (and empty) cgroup, as well as
>> create new sub-cgroups.
>>
>> + cgroup.freeze.stat
>
> With the given implementation (and use scenario), this'd be better
> exposed in
> cgroup.freeze.stat.local
>
Would it be possible to add this field to either cgroup.events or cgroup.stat?
Since the frozen status is already tracked in cgroup.events, this placement
would maintain better cohesion with the existing metrics.
This is just a suggestion.
Best regards,
Ridong
> I grok the hierarchical summing would make little sense and it'd make
> implementation more complex. With that I'm thinking about formulation:
>
> Cumulative time that the cgroup has spent between freezing and
> thawing, regardless of whether by self or ancestor cgroups. NB:
> whether the "frozen" state is actually reached is not accounted here.
>
>> + A read-only flat-keyed file which exists in non-root cgroups.
>> + The following entry is defined:
>> +
>> + freeze_time_total_ns
>> + Cumulative time that this cgroup has spent in the freezing
>> + state, regardless of whether or not it reaches "frozen".
>> +
>
> Rather use microseconds, it's the cgroup API convention and I'm not
> sure nanoseconds exposed here are the needed precision.
>
>          1         _____
> frozen   0      __/     \__
>                  ab      cd
>
> Yeah, I find the measurement between a and c the sanest.
>
>
>> +static int cgroup_freeze_stat_show(struct seq_file *seq, void *v)
>> +{
>> + struct cgroup *cgrp = seq_css(seq)->cgroup;
>> + u64 freeze_time = 0;
>> +
>> + spin_lock_irq(&css_set_lock);
>> + if (test_bit(CGRP_FREEZE, &cgrp->flags))
>> + freeze_time = ktime_get_ns() - cgrp->freezer.freeze_time_start_ns;
>> +
>> + freeze_time += cgrp->freezer.freeze_time_total_ns;
>> + spin_unlock_irq(&css_set_lock);
>
> I don't like taking this spinlock only for the matter of reading this
> attribute. The intention should be to keep the (un)freezeing mostly
> unaffected at the expense of these readers (seqcount or u64 stats?).
>
> Alternative approach: either there's an outer watcher who can be notified
> by cgroup.events:frozen or it's an inner watcher who couldn't actively
> read the field anyway. So the field could only show completed
> freeze/thaw cycles from the past (i.e. not substitute clock_gettime(2)
> when the cgroup is frozen), which could simplify querying the flag too.
>
>> @@ -5758,6 +5780,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
>> * if the parent has to be frozen, the child has too.
>> */
>> cgrp->freezer.e_freeze = parent->freezer.e_freeze;
>> + cgrp->freezer.freeze_time_total_ns = 0;
>
> struct cgroup is kzalloc'd, this is unnecessary
>