lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dbx8ms7r885f.fsf@ynaffit-andsys.c.googlers.com>
Date: Fri, 22 Aug 2025 12:32:12 -0700
From: Tiffany Yang <ynaffit@...gle.com>
To: Chen Ridong <chenridong@...weicloud.com>
Cc: linux-kernel@...r.kernel.org, John Stultz <jstultz@...gle.com>, 
	Thomas Gleixner <tglx@...utronix.de>, Stephen Boyd <sboyd@...nel.org>, 
	Anna-Maria Behnsen <anna-maria@...utronix.de>, Frederic Weisbecker <frederic@...nel.org>, 
	Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>, 
	"Michal Koutný" <mkoutny@...e.com>, "Rafael J. Wysocki" <rafael@...nel.org>, Pavel Machek <pavel@...nel.org>, 
	Roman Gushchin <roman.gushchin@...ux.dev>, Chen Ridong <chenridong@...wei.com>, 
	kernel-team@...roid.com, Jonathan Corbet <corbet@....net>, Shuah Khan <shuah@...nel.org>, 
	cgroups@...r.kernel.org, linux-doc@...r.kernel.org, 
	linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v4 1/2] cgroup: cgroup.stat.local time accounting

Hi Chen,

Thanks again for taking a look!

Chen Ridong <chenridong@...weicloud.com> writes:

> On 2025/8/22 14:14, Chen Ridong wrote:


>> On 2025/8/22 9:37, Tiffany Yang wrote:
>>> There isn't yet a clear way to identify a set of "lost" time that
>>> everyone (or at least a wider group of users) cares about. However,
>>> users can perform some delay accounting by iterating over components of
>>> interest. This patch allows cgroup v2 freezing time to be one of those
>>> components.

>>> Track the cumulative time that each v2 cgroup spends freezing and expose
>>> it to userland via a new local stat file in cgroupfs. Thank you to
>>> Michal, who provided the ASCII art in the updated documentation.

>>> To access this value:
>>>    $ mkdir /sys/fs/cgroup/test
>>>    $ cat /sys/fs/cgroup/test/cgroup.stat.local
>>>    freeze_time_total 0

>>> Ensure consistent freeze time reads with freeze_seq, a per-cgroup
>>> sequence counter. Writes are serialized using the css_set_lock.

...

>>>   	spin_lock_irq(&css_set_lock);
>>> -	if (freeze)
>>> +	write_seqcount_begin(&cgrp->freezer.freeze_seq);
>>> +	if (freeze) {
>>>   		set_bit(CGRP_FREEZE, &cgrp->flags);
>>> -	else
>>> +		cgrp->freezer.freeze_start_nsec = ts_nsec;
>>> +	} else {
>>>   		clear_bit(CGRP_FREEZE, &cgrp->flags);
>>> +		cgrp->freezer.frozen_nsec += (ts_nsec -
>>> +			cgrp->freezer.freeze_start_nsec);
>>> +	}
>>> +	write_seqcount_end(&cgrp->freezer.freeze_seq);
>>>   	spin_unlock_irq(&css_set_lock);


>> Hello Tiffany,

>> I wanted to check if there are any specific considerations regarding how  
>> we should input the ts_nsec
>> value.

>> Would it be possible to define this directly within the cgroup_do_freeze  
>> function rather than
>> passing it as a parameter? This approach might simplify the  
>> implementation and potentially improve
>> timing accuracy when it have lots of descendants.


> I revisited v3, and this was Michal's point.
> 	p
>       /  |  \
>      1  ...  n
> When we freeze the parent group p, is it expected that all descendant  
> cgroups (1 to n) should share
> the same frozen timestamp?


Yes, this is the expectation from the current change. I understand your
concern about the accuracy of this measurement (especially when there
are many descendants), but I agree with Michal's point that the time to
traverse the descendant cgroups is basically noise relative to the
quantity we're trying to measure here.

> If the cgroup tree structure is stable, the exact frozen time may not be  
> really matter. However, if
> the tree is not stable, obtaining the same frozen time is acceptable?

I'm a little unclear as to what you mean about when the cgroup tree is
unstable. In the case where a new descendant of p is being created, I
believe the cgroup_mutex prevents that from happening at the same time
as we are freezing p's other descendants. If it won the race, was
created unfrozen under p, and then became frozen during cgroup_freeze,
it would have the same timestamp as the other descendants. If it lost
the race and was created as a frozen cgroup under p, it would get its
own timestamp in cgroup_create, so its freezing duration would be
slightly less than that of the others in the hierarchy. Both values
would be acceptable for our purposes, but if there was a different case
you had in mind, please let me know!

Thanks,
-- 
Tiffany Y. Yang

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ