linux-kernel - Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <dbx8seinan2h.fsf@ynaffit-andsys.c.googlers.com>
Date: Tue, 22 Jul 2025 15:16:22 -0700
From: Tiffany Yang <ynaffit@...gle.com>
To: "Michal Koutný" <mkoutny@...e.com>
Cc: linux-kernel@...r.kernel.org, John Stultz <jstultz@...gle.com>, 
	Thomas Gleixner <tglx@...utronix.de>, Stephen Boyd <sboyd@...nel.org>, 
	Anna-Maria Behnsen <anna-maria@...utronix.de>, Frederic Weisbecker <frederic@...nel.org>, 
	Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>, 
	"Rafael J. Wysocki" <rafael@...nel.org>, Pavel Machek <pavel@...nel.org>, 
	Roman Gushchin <roman.gushchin@...ux.dev>, Chen Ridong <chenridong@...wei.com>, 
	kernel-team@...roid.com, Jonathan Corbet <corbet@....net>, cgroups@...r.kernel.org, 
	linux-doc@...r.kernel.org
Subject: Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer

Michal Koutný <mkoutny@...e.com> writes:

> I'd like to incorporate the reason from your other mail:
> | Since there isn't yet a clear way to identify a set of "lost" time
> | that everyone (or at least a wider group of users) cares about, it
> | seems like iterating over components of interest is the best way
> into this commit message (because that's a stronger ponit that your use
> case alone).


>> Any feedback would be much appreciated!

> I can see benefits of this new stat field conceptually, I have some
> remarks to implementation and suggestions to conventions below.

>> --- a/Documentation/admin-guide/cgroup-v2.rst
>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>> @@ -1018,6 +1018,14 @@ All cgroup core files are prefixed with "cgroup."
>>   	it's possible to delete a frozen (and empty) cgroup, as well as
>>   	create new sub-cgroups.

>> +  cgroup.freeze.stat

> With the given implementation (and use scenario), this'd better exposed
> in
>    cgroup.freeze.stat.local

> I grok the hierarchical summing would make little sense and it'd make
> implementaion more complex. With that I'm thinking about formulation:

> 	Cumulative time that cgroup has spent between freezing and
> 	thawing, regardless of whether by self or ancestor cgroups. NB
> 	(not) reaching "frozen" state is not accounted here.

>> +	A read-only flat-keyed file which exists in non-root cgroups.
>> +	The following entry is defined:
>> +
>> +	  freeze_time_total_ns
>> +		Cumulative time that this cgroup has spent in the freezing
>> +		state, regardless of whether or not it reaches "frozen".
>> +

> Rather use microseconds, it's the cgroup API convention and I'm not
> sure nanosecods exposed here are the needed precision.


Ack.

>         1    _____
> frozen 0 __/     \__
>            ab    cd

> Yeah, I find the mesurent between a and c the sanest.


>> +static int cgroup_freeze_stat_show(struct seq_file *seq, void *v)
>> +{
>> +	struct cgroup *cgrp = seq_css(seq)->cgroup;
>> +	u64 freeze_time = 0;
>> +
>> +	spin_lock_irq(&css_set_lock);
>> +	if (test_bit(CGRP_FREEZE, &cgrp->flags))
>> +		freeze_time = ktime_get_ns() - cgrp->freezer.freeze_time_start_ns;
>> +
>> +	freeze_time += cgrp->freezer.freeze_time_total_ns;
>> +	spin_unlock_irq(&css_set_lock);

> I don't like taking this spinlock only for the matter of reading this
> attribute. The intention should be to keep the (un)freezeing mostly
> unaffected at the expense of these readers (seqcount or u64 stats?).


Ah, thank you for this suggestion! I noticed that none of the other
seq_file read implementations took a lock, so I thought this might be a
point of contention. I'll try a seqlock in the next version of the
patch.

> Alternative approach: either there's outer watcher who can be notified
> by cgroup.events:frozen or it's an inner watcher who couldn't actively
> read the field anyway. So the field could only show completed
> freeze/thaw cycles from the past (i.e. not substitute clock_gettime(2)
> when the cgroup is frozen), which could simplify querying the flag too.


This is a good observation. This approach does simplify things, but
even though it would work for our use case, I feel like this value
would be less useful for the outer watcher case, especially in the case
where the cgroup never reaches the frozen state.

>> @@ -5758,6 +5780,7 @@ static struct cgroup *cgroup_create(struct cgroup  
>> *parent, const char *name,
>>   	 * if the parent has to be frozen, the child has too.
>>   	 */
>>   	cgrp->freezer.e_freeze = parent->freezer.e_freeze;
>> +	cgrp->freezer.freeze_time_total_ns = 0;

> struct cgroup is kzalloc'd, this is unnecessary

Thank you for all your feedback! I'll make sure to incorporate these
suggestions into the next version.

-- 
Tiffany Y. Yang