lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <552a7f82-2735-47a5-9abd-a9ae845f4961@huaweicloud.com>
Date: Fri, 22 Aug 2025 14:14:27 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Tiffany Yang <ynaffit@...gle.com>, linux-kernel@...r.kernel.org
Cc: John Stultz <jstultz@...gle.com>, Thomas Gleixner <tglx@...utronix.de>,
 Stephen Boyd <sboyd@...nel.org>,
 Anna-Maria Behnsen <anna-maria@...utronix.de>,
 Frederic Weisbecker <frederic@...nel.org>, Tejun Heo <tj@...nel.org>,
 Johannes Weiner <hannes@...xchg.org>, Michal Koutný
 <mkoutny@...e.com>, "Rafael J. Wysocki" <rafael@...nel.org>,
 Pavel Machek <pavel@...nel.org>, Roman Gushchin <roman.gushchin@...ux.dev>,
 Chen Ridong <chenridong@...wei.com>, kernel-team@...roid.com,
 Jonathan Corbet <corbet@....net>, Shuah Khan <shuah@...nel.org>,
 cgroups@...r.kernel.org, linux-doc@...r.kernel.org,
 linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v4 1/2] cgroup: cgroup.stat.local time accounting



On 2025/8/22 9:37, Tiffany Yang wrote:
> There isn't yet a clear way to identify a set of "lost" time that
> everyone (or at least a wider group of users) cares about. However,
> users can perform some delay accounting by iterating over components of
> interest. This patch allows cgroup v2 freezing time to be one of those
> components.
> 
> Track the cumulative time that each v2 cgroup spends freezing and expose
> it to userland via a new local stat file in cgroupfs. Thank you to
> Michal, who provided the ASCII art in the updated documentation.
> 
> To access this value:
>   $ mkdir /sys/fs/cgroup/test
>   $ cat /sys/fs/cgroup/test/cgroup.stat.local
>   freeze_time_total 0
> 
> Ensure consistent freeze time reads with freeze_seq, a per-cgroup
> sequence counter. Writes are serialized using the css_set_lock.
> 
> Signed-off-by: Tiffany Yang <ynaffit@...gle.com>
> Cc: Tejun Heo <tj@...nel.org>
> Cc: Michal Koutný <mkoutny@...e.com>
> ---
> v3 -> v4:
> * Replace "freeze_time_total" with "frozen" and expose stats via
>   cgroup.stat.local, as recommended by Tejun.
> * Use the same timestamp when freezing/unfreezing a cgroup as its
>   descendants, as suggested by Michal.
> 
> v2 -> v3:
> * Use seqcount along with css_set_lock to guard freeze time accesses, as
>   suggested by Michal.
> 
> v1 -> v2:
> * Track per-cgroup freezing time instead of per-task frozen time, as
>   suggested by Tejun.
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 18 ++++++++++++++++
>  include/linux/cgroup-defs.h             | 17 +++++++++++++++
>  kernel/cgroup/cgroup.c                  | 28 +++++++++++++++++++++++++
>  kernel/cgroup/freezer.c                 | 16 ++++++++++----
>  4 files changed, 75 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 51c0bc4c2dc5..a1e3d431974c 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1001,6 +1001,24 @@ All cgroup core files are prefixed with "cgroup."
>  		Total number of dying cgroup subsystems (e.g. memory
>  		cgroup) at and beneath the current cgroup.
>  
> +  cgroup.stat.local
> +	A read-only flat-keyed file which exists in non-root cgroups.
> +	The following entry is defined:
> +
> +	  frozen_usec
> +		Cumulative time that this cgroup has spent between freezing and
> +		thawing, regardless of whether by self or ancestor groups.
> +		NB: (not) reaching "frozen" state is not accounted here.
> +
> +		Using the following ASCII representation of a cgroup's freezer
> +		state, ::
> +
> +			       1    _____
> +			frozen 0 __/     \__
> +			          ab    cd
> +
> +		the duration being measured is the span between a and c.
> +
>    cgroup.freeze
>  	A read-write single value file which exists on non-root cgroups.
>  	Allowed values are "0" and "1". The default is "0".
> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
> index 6b93a64115fe..539c64eeef38 100644
> --- a/include/linux/cgroup-defs.h
> +++ b/include/linux/cgroup-defs.h
> @@ -433,6 +433,23 @@ struct cgroup_freezer_state {
>  	 * frozen, SIGSTOPped, and PTRACEd.
>  	 */
>  	int nr_frozen_tasks;
> +
> +	/* Freeze time data consistency protection */
> +	seqcount_t freeze_seq;
> +
> +	/*
> +	 * Most recent time the cgroup was requested to freeze.
> +	 * Accesses guarded by freeze_seq counter. Writes serialized
> +	 * by css_set_lock.
> +	 */
> +	u64 freeze_start_nsec;
> +
> +	/*
> +	 * Total duration the cgroup has spent freezing.
> +	 * Accesses guarded by freeze_seq counter. Writes serialized
> +	 * by css_set_lock.
> +	 */
> +	u64 frozen_nsec;
>  };
>  
>  struct cgroup {
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 312c6a8b55bb..ab096b884bbc 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -3763,6 +3763,27 @@ static int cgroup_stat_show(struct seq_file *seq, void *v)
>  	return 0;
>  }
>  
> +static int cgroup_core_local_stat_show(struct seq_file *seq, void *v)
> +{
> +	struct cgroup *cgrp = seq_css(seq)->cgroup;
> +	unsigned int sequence;
> +	u64 freeze_time;
> +
> +	do {
> +		sequence = read_seqcount_begin(&cgrp->freezer.freeze_seq);
> +		freeze_time = cgrp->freezer.frozen_nsec;
> +		/* Add in current freezer interval if the cgroup is freezing. */
> +		if (test_bit(CGRP_FREEZE, &cgrp->flags))
> +			freeze_time += (ktime_get_ns() -
> +					cgrp->freezer.freeze_start_nsec);
> +	} while (read_seqcount_retry(&cgrp->freezer.freeze_seq, sequence));
> +
> +	seq_printf(seq, "frozen_usec %llu\n",
> +		   (unsigned long long) freeze_time / NSEC_PER_USEC);
> +
> +	return 0;
> +}
> +
>  #ifdef CONFIG_CGROUP_SCHED
>  /**
>   * cgroup_tryget_css - try to get a cgroup's css for the specified subsystem
> @@ -5354,6 +5375,11 @@ static struct cftype cgroup_base_files[] = {
>  		.name = "cgroup.stat",
>  		.seq_show = cgroup_stat_show,
>  	},
> +	{
> +		.name = "cgroup.stat.local",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.seq_show = cgroup_core_local_stat_show,
> +	},
>  	{
>  		.name = "cgroup.freeze",
>  		.flags = CFTYPE_NOT_ON_ROOT,
> @@ -5763,6 +5789,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
>  	 * if the parent has to be frozen, the child has too.
>  	 */
>  	cgrp->freezer.e_freeze = parent->freezer.e_freeze;
> +	seqcount_init(&cgrp->freezer.freeze_seq);
>  	if (cgrp->freezer.e_freeze) {
>  		/*
>  		 * Set the CGRP_FREEZE flag, so when a process will be
> @@ -5771,6 +5798,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
>  		 * consider it frozen immediately.
>  		 */
>  		set_bit(CGRP_FREEZE, &cgrp->flags);
> +		cgrp->freezer.freeze_start_nsec = ktime_get_ns();
>  		set_bit(CGRP_FROZEN, &cgrp->flags);
>  	}
>  
> diff --git a/kernel/cgroup/freezer.c b/kernel/cgroup/freezer.c
> index bf1690a167dd..6c18854bff34 100644
> --- a/kernel/cgroup/freezer.c
> +++ b/kernel/cgroup/freezer.c
> @@ -171,7 +171,7 @@ static void cgroup_freeze_task(struct task_struct *task, bool freeze)
>  /*
>   * Freeze or unfreeze all tasks in the given cgroup.
>   */
> -static void cgroup_do_freeze(struct cgroup *cgrp, bool freeze)
> +static void cgroup_do_freeze(struct cgroup *cgrp, bool freeze, u64 ts_nsec)
>  {
>  	struct css_task_iter it;
>  	struct task_struct *task;
> @@ -179,10 +179,16 @@ static void cgroup_do_freeze(struct cgroup *cgrp, bool freeze)
>  	lockdep_assert_held(&cgroup_mutex);
>  
>  	spin_lock_irq(&css_set_lock);
> -	if (freeze)
> +	write_seqcount_begin(&cgrp->freezer.freeze_seq);
> +	if (freeze) {
>  		set_bit(CGRP_FREEZE, &cgrp->flags);
> -	else
> +		cgrp->freezer.freeze_start_nsec = ts_nsec;
> +	} else {
>  		clear_bit(CGRP_FREEZE, &cgrp->flags);
> +		cgrp->freezer.frozen_nsec += (ts_nsec -
> +			cgrp->freezer.freeze_start_nsec);
> +	}
> +	write_seqcount_end(&cgrp->freezer.freeze_seq);
>  	spin_unlock_irq(&css_set_lock);
> 

Hello Tiffany,

I wanted to check if there are any specific considerations regarding how we should input the ts_nsec
value.

Would it be possible to define this directly within the cgroup_do_freeze function rather than
passing it as a parameter? This approach might simplify the implementation and potentially improve
timing accuracy when it have lots of descendants.

-- 
Best regards,
Ridong

>  	if (freeze)
> @@ -260,6 +266,7 @@ void cgroup_freeze(struct cgroup *cgrp, bool freeze)
>  	struct cgroup *parent;
>  	struct cgroup *dsct;
>  	bool applied = false;
> +	u64 ts_nsec;
>  	bool old_e;
>  
>  	lockdep_assert_held(&cgroup_mutex);
> @@ -271,6 +278,7 @@ void cgroup_freeze(struct cgroup *cgrp, bool freeze)
>  		return;
>  
>  	cgrp->freezer.freeze = freeze;
> +	ts_nsec = ktime_get_ns();
>  
>  	/*
>  	 * Propagate changes downwards the cgroup tree.
> @@ -298,7 +306,7 @@ void cgroup_freeze(struct cgroup *cgrp, bool freeze)
>  		/*
>  		 * Do change actual state: freeze or unfreeze.
>  		 */
> -		cgroup_do_freeze(dsct, freeze);
> +		cgroup_do_freeze(dsct, freeze, ts_nsec);
>  		applied = true;
>  	}
>  


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ