Message-ID: <20250603224304.3198729-3-ynaffit@google.com>
Date: Tue, 3 Jun 2025 22:43:05 +0000
From: Tiffany Yang <ynaffit@...gle.com>
To: linux-kernel@...r.kernel.org
Cc: cgroups@...r.kernel.org, kernel-team@...roid.com,
John Stultz <jstultz@...gle.com>, Thomas Gleixner <tglx@...utronix.de>, Stephen Boyd <sboyd@...nel.org>,
Anna-Maria Behnsen <anna-maria@...utronix.de>, Frederic Weisbecker <frederic@...nel.org>,
Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
"Michal Koutný" <mkoutny@...e.com>, "Rafael J. Wysocki" <rafael@...nel.org>, Pavel Machek <pavel@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>, Chen Ridong <chenridong@...wei.com>,
Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>
Subject: [RFC PATCH] cgroup: Track time in cgroup v2 freezer
The cgroup v2 freezer controller allows userspace to dynamically move
processes into and out of an interruptible frozen state. This is helpful
for application management: background tasks can be frozen so that they
are not scheduled and do not contend with foreground tasks for resources.
However, applications are usually unaware that they have been placed in a
frozen cgroup, so any watchdog timers they may have set will fire when
they exit the frozen state. To address this problem, I propose tracking
the per-task frozen time and exposing it to userland via procfs.
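For illustration, here is a minimal userspace sketch (not part of the
patch) of how the proposed interface could be consumed. The proc file
name matches the entry added to fs/proc/base.c below; everything else
(target pid, error handling) is assumed usage:

#include <stdio.h>

int main(int argc, char **argv)
{
	char path[64];
	unsigned long long frozen_ns;
	FILE *f;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}

	/* Per-task file added by this patch; reports a single ns counter. */
	snprintf(path, sizeof(path), "/proc/%s/cgroup_v2_freezer_time_frozen",
		 argv[1]);
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}
	if (fscanf(f, "%llu", &frozen_ns) != 1) {
		fprintf(stderr, "failed to parse %s\n", path);
		fclose(f);
		return 1;
	}
	fclose(f);

	printf("task %s has spent %llu ns frozen\n", argv[1], frozen_ns);
	return 0;
}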
Currently, the cgroup css_set_lock is used to serialize accesses to the
new task_struct counters (frozen_time_total and frozen_time_start). If
we start to see higher contention on this lock, we may want to introduce
a separate per-task mutex or seqlock, but the main focus in this
initial submission is establishing the right UAPI for this accounting
information.
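If contention does become a problem, one direction would be a per-task
seqcount so that readers avoid css_set_lock entirely. A rough,
hypothetical sketch (the field name and placement are assumptions, not
part of this patch):

	/* Hypothetical: a per-task seqcount guarding the freezer counters. */
	seqcount_t frozen_time_seq;

	/* Writer (e.g. in cgroup_leave_frozen(), still under css_set_lock): */
	write_seqcount_begin(&current->frozen_time_seq);
	current->frozen_time_total += ktime_get_ns() - current->frozen_time_start;
	write_seqcount_end(&current->frozen_time_seq);

	/* Lockless reader (e.g. in proc_cgroup_frztime_show()): */
	unsigned int seq;
	u64 total;

	do {
		seq = read_seqcount_begin(&tsk->frozen_time_seq);
		total = tsk->frozen_time_total;
	} while (read_seqcount_retry(&tsk->frozen_time_seq, seq));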
While any comments on this RFC are appreciated, there are several areas
where feedback would be especially welcome:
1. I know there is some hesitancy toward adding new proc files to
the system, so I would welcome suggestions as to how this per-task
accounting might be better exposed to userland.
2. Unlike the cgroup v1 freezer controller, the cgroup v2 freezer
does not use the system-wide freezer shared by the power
management system to freeze tasks. Instead, tasks are placed into
a cgroup v2 freezer-specific frozen state similar to jobctl
stop. Consequently, the time being accounted for here is somewhat
narrow and specific to cgroup v2 functionality, but there may be
better ways to generalize it (see the sketch after this list).
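To make the second point concrete, here is a sketch of the window this
patch accounts for. The cgroup path is a hypothetical example;
cgroup.freeze is the existing cgroup v2 freezer interface, and the
comments refer to the functions touched by this patch:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static void write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0 || write(fd, val, 1) != 1)
		perror(path);
	if (fd >= 0)
		close(fd);
}

int main(void)
{
	/* Freeze: member tasks enter the cgroup v2 frozen state, and
	 * cgroup_enter_frozen() records frozen_time_start for each. */
	write_str("/sys/fs/cgroup/app/cgroup.freeze", "1");

	sleep(5);

	/* Thaw: cgroup_leave_frozen() folds the elapsed time into
	 * frozen_time_total, which the new proc file reports. */
	write_str("/sys/fs/cgroup/app/cgroup.freeze", "0");
	return 0;
}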
Since this is a first stab at discussing the potential interface, I've
not yet updated the procfs documentation for this. Once there is
consensus around the interface, I will fill that out.
Thank you for your time!
Tiffany
Signed-off-by: Tiffany Yang <ynaffit@...gle.com>
---
Cc: John Stultz <jstultz@...gle.com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Stephen Boyd <sboyd@...nel.org>
Cc: Anna-Maria Behnsen <anna-maria@...utronix.de>
Cc: Frederic Weisbecker <frederic@...nel.org>
Cc: Tejun Heo <tj@...nel.org>
Cc: Johannes Weiner <hannes@...xchg.org>
Cc: Michal Koutný <mkoutny@...e.com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Pavel Machek <pavel@...nel.org>
Cc: Roman Gushchin <roman.gushchin@...ux.dev>
Cc: Chen Ridong <chenridong@...wei.com>
---
fs/proc/base.c | 2 ++
include/linux/cgroup.h | 2 ++
include/linux/sched.h | 3 +++
kernel/cgroup/cgroup.c | 2 ++
kernel/cgroup/freezer.c | 20 ++++++++++++++++++++
5 files changed, 29 insertions(+)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index c667702dc69b..38a05bb53cd1 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3377,6 +3377,7 @@ static const struct pid_entry tgid_base_stuff[] = {
#endif
#ifdef CONFIG_CGROUPS
ONE("cgroup", S_IRUGO, proc_cgroup_show),
+ ONE("cgroup_v2_freezer_time_frozen", 0444, proc_cgroup_frztime_show),
#endif
#ifdef CONFIG_PROC_CPU_RESCTRL
ONE("cpu_resctrl_groups", S_IRUGO, proc_resctrl_show),
@@ -3724,6 +3725,7 @@ static const struct pid_entry tid_base_stuff[] = {
#endif
#ifdef CONFIG_CGROUPS
ONE("cgroup", S_IRUGO, proc_cgroup_show),
+ ONE("cgroup_v2_freezer_time_frozen", 0444, proc_cgroup_frztime_show),
#endif
#ifdef CONFIG_PROC_CPU_RESCTRL
ONE("cpu_resctrl_groups", S_IRUGO, proc_resctrl_show),
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b18fb5fcb38e..871831808e22 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -837,6 +837,8 @@ void cgroup_update_frozen(struct cgroup *cgrp);
void cgroup_freeze(struct cgroup *cgrp, bool freeze);
void cgroup_freezer_migrate_task(struct task_struct *task, struct cgroup *src,
struct cgroup *dst);
+int proc_cgroup_frztime_show(struct seq_file *m, struct pid_namespace *ns,
+ struct pid *pid, struct task_struct *tsk);
static inline bool cgroup_task_frozen(struct task_struct *task)
{
diff --git a/include/linux/sched.h b/include/linux/sched.h
index aa9c5be7a632..55d173fd070c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1321,6 +1321,9 @@ struct task_struct {
struct css_set __rcu *cgroups;
/* cg_list protected by css_set_lock and tsk->alloc_lock: */
struct list_head cg_list;
+ /* freezer stats protected by the css_set_lock: */
+ u64 frozen_time_total;
+ u64 frozen_time_start;
#endif
#ifdef CONFIG_X86_CPU_RESCTRL
u32 closid;
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index a723b7dc6e4e..05e1d2cf3654 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6470,6 +6470,8 @@ void cgroup_fork(struct task_struct *child)
{
RCU_INIT_POINTER(child->cgroups, &init_css_set);
INIT_LIST_HEAD(&child->cg_list);
+ child->frozen_time_total = 0;
+ child->frozen_time_start = 0;
}
/**
diff --git a/kernel/cgroup/freezer.c b/kernel/cgroup/freezer.c
index bf1690a167dd..7dd9e70a47c5 100644
--- a/kernel/cgroup/freezer.c
+++ b/kernel/cgroup/freezer.c
@@ -110,6 +110,7 @@ void cgroup_enter_frozen(void)
spin_lock_irq(&css_set_lock);
current->frozen = true;
+ current->frozen_time_start = ktime_get_ns();
cgrp = task_dfl_cgroup(current);
cgroup_inc_frozen_cnt(cgrp);
cgroup_update_frozen(cgrp);
@@ -132,10 +133,13 @@ void cgroup_leave_frozen(bool always_leave)
spin_lock_irq(&css_set_lock);
cgrp = task_dfl_cgroup(current);
if (always_leave || !test_bit(CGRP_FREEZE, &cgrp->flags)) {
+ u64 end_ns;
cgroup_dec_frozen_cnt(cgrp);
cgroup_update_frozen(cgrp);
WARN_ON_ONCE(!current->frozen);
current->frozen = false;
+ end_ns = ktime_get_ns();
+ current->frozen_time_total += (end_ns - current->frozen_time_start);
} else if (!(current->jobctl & JOBCTL_TRAP_FREEZE)) {
spin_lock(¤t->sighand->siglock);
current->jobctl |= JOBCTL_TRAP_FREEZE;
@@ -254,6 +258,22 @@ void cgroup_freezer_migrate_task(struct task_struct *task,
cgroup_freeze_task(task, test_bit(CGRP_FREEZE, &dst->flags));
}
+int proc_cgroup_frztime_show(struct seq_file *m, struct pid_namespace *ns,
+ struct pid *pid, struct task_struct *tsk)
+{
+ u64 delta = 0;
+
+ spin_lock_irq(&css_set_lock);
+ if (tsk->frozen)
+ delta = ktime_get_ns() - tsk->frozen_time_start;
+
+ seq_printf(m, "%llu\n",
+ (unsigned long long)(tsk->frozen_time_total + delta));
+ spin_unlock_irq(&css_set_lock);
+
+ return 0;
+}
+
void cgroup_freeze(struct cgroup *cgrp, bool freeze)
{
struct cgroup_subsys_state *css;
--
2.49.0.1204.g71687c7c1d-goog