Message-ID: <20250603224304.3198729-3-ynaffit@google.com>
Date: Tue, 3 Jun 2025 22:43:05 +0000
From: Tiffany Yang <ynaffit@...gle.com>
To: linux-kernel@...r.kernel.org
Cc: cgroups@...r.kernel.org, kernel-team@...roid.com,
John Stultz <jstultz@...gle.com>, Thomas Gleixner <tglx@...utronix.de>, Stephen Boyd <sboyd@...nel.org>,
Anna-Maria Behnsen <anna-maria@...utronix.de>, Frederic Weisbecker <frederic@...nel.org>,
Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
"Michal Koutný" <mkoutny@...e.com>, "Rafael J. Wysocki" <rafael@...nel.org>, Pavel Machek <pavel@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>, Chen Ridong <chenridong@...wei.com>,
Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>
Subject: [RFC PATCH] cgroup: Track time in cgroup v2 freezer
The cgroup v2 freezer controller allows userspace to dynamically move
processes into and out of an interruptible frozen state. This is helpful
for application management: background tasks can be frozen so that they
are not scheduled and do not contend with foreground tasks for resources.
However, applications are usually unaware that they have been placed in a
frozen cgroup, so any watchdog timers they may have set will fire when
they exit the frozen state. To address this problem, I propose tracking
the per-task frozen time and exposing it to userland via procfs.
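For illustration, here is a minimal userspace sketch (not part of the
patch) of how the proposed interface could be consumed. The proc file
name matches the entry added to fs/proc/base.c below; everything else
(target pid, error handling) is assumed usage:

#include <stdio.h>

int main(int argc, char **argv)
{
	char path[64];
	unsigned long long frozen_ns;
	FILE *f;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}

	/* Per-task file added by this patch; reports a single ns counter. */
	snprintf(path, sizeof(path), "/proc/%s/cgroup_v2_freezer_time_frozen",
		 argv[1]);
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}
	if (fscanf(f, "%llu", &frozen_ns) != 1) {
		fprintf(stderr, "failed to parse %s\n", path);
		fclose(f);
		return 1;
	}
	fclose(f);

	printf("task %s has spent %llu ns frozen\n", argv[1], frozen_ns);
	return 0;
}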
Currently, the cgroup css_set_lock is used to serialize accesses to the
new task_struct counters (frozen_time_total and frozen_time_start). If
we start to see higher contention on this lock, we may want to introduce
a separate per-task mutex or seqlock, but the main focus in this
initial submission is establishing the right UAPI for this accounting
information.
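If contention does become a problem, one direction would be a per-task
seqcount so that readers avoid css_set_lock entirely. A rough,
hypothetical sketch (the field name and placement are assumptions, not
part of this patch):

	/* Hypothetical: a per-task seqcount guarding the freezer counters. */
	seqcount_t frozen_time_seq;

	/* Writer (e.g. in cgroup_leave_frozen(), still under css_set_lock): */
	write_seqcount_begin(&current->frozen_time_seq);
	current->frozen_time_total += ktime_get_ns() - current->frozen_time_start;
	write_seqcount_end(&current->frozen_time_seq);

	/* Lockless reader (e.g. in proc_cgroup_frztime_show()): */
	unsigned int seq;
	u64 total;

	do {
		seq = read_seqcount_begin(&tsk->frozen_time_seq);
		total = tsk->frozen_time_total;
	} while (read_seqcount_retry(&tsk->frozen_time_seq, seq));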
While any comments on this RFC are appreciated, there are several areas
where feedback would be especially welcome:
1. I know there is some hesitancy toward adding new proc files to
the system, so I would welcome suggestions as to how this per-task
accounting might be better exposed to userland.
2. Unlike the cgroup v1 freezer controller, the cgroup v2 freezer
does not use the system-wide freezer shared by the power
management system to freeze tasks. Instead, tasks are placed into
a cgroup v2 freezer-specific frozen state similar to jobctl
stop. Consequently, the time being accounted for here is somewhat
narrow and specific to cgroup v2 functionality, but there may be
better ways to generalize it (see the sketch after this list).
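To make the second point concrete, here is a sketch of the window this
patch accounts for. The cgroup path is a hypothetical example;
cgroup.freeze is the existing cgroup v2 freezer interface, and the
comments refer to the functions touched by this patch:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static void write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0 || write(fd, val, 1) != 1)
		perror(path);
	if (fd >= 0)
		close(fd);
}

int main(void)
{
	/* Freeze: member tasks enter the cgroup v2 frozen state, and
	 * cgroup_enter_frozen() records frozen_time_start for each. */
	write_str("/sys/fs/cgroup/app/cgroup.freeze", "1");

	sleep(5);

	/* Thaw: cgroup_leave_frozen() folds the elapsed time into
	 * frozen_time_total, which the new proc file reports. */
	write_str("/sys/fs/cgroup/app/cgroup.freeze", "0");
	return 0;
}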
Since this is a first stab at discussing the potential interface, I've
not yet updated the procfs documentation for this. Once there is
consensus around the interface, I will fill that out.
Thank you for your time!
Tiffany
Signed-off-by: Tiffany Yang <ynaffit@...gle.com>
---
Cc: John Stultz <jstultz@...gle.com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Stephen Boyd <sboyd@...nel.org>
Cc: Anna-Maria Behnsen <anna-maria@...utronix.de>
Cc: Frederic Weisbecker <frederic@...nel.org>
Cc: Tejun Heo <tj@...nel.org>
Cc: Johannes Weiner <hannes@...xchg.org>
Cc: Michal Koutný <mkoutny@...e.com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Pavel Machek <pavel@...nel.org>
Cc: Roman Gushchin <roman.gushchin@...ux.dev>
Cc: Chen Ridong <chenridong@...wei.com>
---
fs/proc/base.c | 2 ++
include/linux/cgroup.h | 2 ++
include/linux/sched.h | 3 +++
kernel/cgroup/cgroup.c | 2 ++
kernel/cgroup/freezer.c | 20 ++++++++++++++++++++
5 files changed, 29 insertions(+)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index c667702dc69b..38a05bb53cd1 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3377,6 +3377,7 @@ static const struct pid_entry tgid_base_stuff[] = {
#endif
#ifdef CONFIG_CGROUPS
ONE("cgroup", S_IRUGO, proc_cgroup_show),
+ ONE("cgroup_v2_freezer_time_frozen", 0444, proc_cgroup_frztime_show),
#endif
#ifdef CONFIG_PROC_CPU_RESCTRL
ONE("cpu_resctrl_groups", S_IRUGO, proc_resctrl_show),
@@ -3724,6 +3725,7 @@ static const struct pid_entry tid_base_stuff[] = {
#endif
#ifdef CONFIG_CGROUPS
ONE("cgroup", S_IRUGO, proc_cgroup_show),
+ ONE("cgroup_v2_freezer_time_frozen", 0444, proc_cgroup_frztime_show),
#endif
#ifdef CONFIG_PROC_CPU_RESCTRL
ONE("cpu_resctrl_groups", S_IRUGO, proc_resctrl_show),
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b18fb5fcb38e..871831808e22 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -837,6 +837,8 @@ void cgroup_update_frozen(struct cgroup *cgrp);
void cgroup_freeze(struct cgroup *cgrp, bool freeze);
void cgroup_freezer_migrate_task(struct task_struct *task, struct cgroup *src,
struct cgroup *dst);
+int proc_cgroup_frztime_show(struct seq_file *m, struct pid_namespace *ns,
+ struct pid *pid, struct task_struct *tsk);
static inline bool cgroup_task_frozen(struct task_struct *task)
{
diff --git a/include/linux/sched.h b/include/linux/sched.h
index aa9c5be7a632..55d173fd070c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1321,6 +1321,9 @@ struct task_struct {
struct css_set __rcu *cgroups;
/* cg_list protected by css_set_lock and tsk->alloc_lock: */
struct list_head cg_list;
+ /* freezer stats protected by the css_set_lock: */
+ u64 frozen_time_total;
+ u64 frozen_time_start;
#endif
#ifdef CONFIG_X86_CPU_RESCTRL
u32 closid;
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index a723b7dc6e4e..05e1d2cf3654 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6470,6 +6470,8 @@ void cgroup_fork(struct task_struct *child)
{
RCU_INIT_POINTER(child->cgroups, &init_css_set);
INIT_LIST_HEAD(&child->cg_list);
+ child->frozen_time_total = 0;
+ child->frozen_time_start = 0;
}
/**
diff --git a/kernel/cgroup/freezer.c b/kernel/cgroup/freezer.c
index bf1690a167dd..7dd9e70a47c5 100644
--- a/kernel/cgroup/freezer.c
+++ b/kernel/cgroup/freezer.c
@@ -110,6 +110,7 @@ void cgroup_enter_frozen(void)
spin_lock_irq(&css_set_lock);
current->frozen = true;
+ current->frozen_time_start = ktime_get_ns();
cgrp = task_dfl_cgroup(current);
cgroup_inc_frozen_cnt(cgrp);
cgroup_update_frozen(cgrp);
@@ -132,10 +133,13 @@ void cgroup_leave_frozen(bool always_leave)
spin_lock_irq(&css_set_lock);
cgrp = task_dfl_cgroup(current);
if (always_leave || !test_bit(CGRP_FREEZE, &cgrp->flags)) {
+ u64 end_ns;
cgroup_dec_frozen_cnt(cgrp);
cgroup_update_frozen(cgrp);
WARN_ON_ONCE(!current->frozen);
current->frozen = false;
+ end_ns = ktime_get_ns();
+ current->frozen_time_total += (end_ns - current->frozen_time_start);
} else if (!(current->jobctl & JOBCTL_TRAP_FREEZE)) {
spin_lock(¤t->sighand->siglock);
current->jobctl |= JOBCTL_TRAP_FREEZE;
@@ -254,6 +258,22 @@ void cgroup_freezer_migrate_task(struct task_struct *task,
cgroup_freeze_task(task, test_bit(CGRP_FREEZE, &dst->flags));
}
+int proc_cgroup_frztime_show(struct seq_file *m, struct pid_namespace *ns,
+ struct pid *pid, struct task_struct *tsk)
+{
+ u64 delta = 0;
+
+ spin_lock_irq(&css_set_lock);
+ if (tsk->frozen)
+ delta = ktime_get_ns() - tsk->frozen_time_start;
+
+ seq_printf(m, "%llu\n",
+ (unsigned long long)(tsk->frozen_time_total + delta));
+ spin_unlock_irq(&css_set_lock);
+
+ return 0;
+}
+
void cgroup_freeze(struct cgroup *cgrp, bool freeze)
{
struct cgroup_subsys_state *css;
--
2.49.0.1204.g71687c7c1d-goog