[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250714050008.2167786-2-ynaffit@google.com>
Date: Sun, 13 Jul 2025 22:00:09 -0700
From: Tiffany Yang <ynaffit@...gle.com>
To: linux-kernel@...r.kernel.org
Cc: John Stultz <jstultz@...gle.com>, Thomas Gleixner <tglx@...utronix.de>,
Stephen Boyd <sboyd@...nel.org>, Anna-Maria Behnsen <anna-maria@...utronix.de>,
Frederic Weisbecker <frederic@...nel.org>, Tejun Heo <tj@...nel.org>,
Johannes Weiner <hannes@...xchg.org>, "Michal Koutný" <mkoutny@...e.com>,
"Rafael J. Wysocki" <rafael@...nel.org>, Pavel Machek <pavel@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>, Chen Ridong <chenridong@...wei.com>,
kernel-team@...roid.com, Jonathan Corbet <corbet@....net>, cgroups@...r.kernel.org,
linux-doc@...r.kernel.org
Subject: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer
The cgroup v2 freezer controller allows user processes to be dynamically
added to and removed from an interruptible frozen state from
userspace. This feature is helpful for application management, as
a background app can be frozen to prevent its threads from being
scheduled or otherwise contending with foreground tasks for
resources. However, the application is usually unaware that it was
frozen, which can cause issues by disrupting any internal monitoring
that it is performing.
As an example, an application may implement a watchdog thread for one of
its high priority maintenance tasks that operates by checking some state
of that task at a set interval to ensure it has made progress. The key
challenge here is that the task is only expected to make progress when
the application it belongs to has the opportunity to run, but there's no
application-relative time to set the watchdog timer against. Instead,
the next timeout is set relative to system time, using an approximation
that assumes the application will continue to be scheduled as
normal. If the task misses that approximate deadline because the
application was frozen, without any way to know that, the watchdog may
kill the healthy process.
Other sources of delay can cause similar issues, but this change focuses
on allowing frozen time to be accounted for in particular because of how
large it can grow and how unevenly it can affect applications running on
the system. To allow an application to better account for the time it
spends running, I propose tracking the time each cgroup spends freezing
and exposing it to userland via a new core interface file in
cgroupfs (cgroup.freeze.stat). I used this naming because utility
controllers like "kill" and "freeze" are exposed as cgroup v2 core
interface files, but I'm happy to change it if there's a convention
others would prefer!
Currently, the cgroup css_set_lock is used to serialize accesses to the
CGRP_FREEZE bit of cgrp->flags and the new cgroup_freezer_state counters
(freeze_time_start_ns and freeze_time_total_ns). If we start to see
higher contention on this lock, we may want to introduce a v2 freezer
state-specific lock to avoid having to take the global lock every time
a cgroup.freeze.stat file is read.
Any feedback would be much appreciated!
Thank you,
Tiffany
Signed-off-by: Tiffany Yang <ynaffit@...gle.com>
---
v2:
* Track per-cgroup freezing time instead of per-task frozen time as
suggested by Tejun Heo
Cc: John Stultz <jstultz@...gle.com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Stephen Boyd <sboyd@...nel.org>
Cc: Anna-Maria Behnsen <anna-maria@...utronix.de>
Cc: Frederic Weisbecker <frederic@...nel.org>
Cc: Tejun Heo <tj@...nel.org>
Cc: Johannes Weiner <hannes@...xchg.org>
Cc: Michal Koutný <mkoutny@...e.com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Pavel Machek <pavel@...nel.org>
Cc: Roman Gushchin <roman.gushchin@...ux.dev>
Cc: Chen Ridong <chenridong@...wei.com>
---
Documentation/admin-guide/cgroup-v2.rst | 8 ++++++++
include/linux/cgroup-defs.h | 6 ++++++
kernel/cgroup/cgroup.c | 24 ++++++++++++++++++++++++
kernel/cgroup/freezer.c | 8 ++++++--
4 files changed, 44 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index bd98ea3175ec..9fbf3a959bdf 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1018,6 +1018,14 @@ All cgroup core files are prefixed with "cgroup."
it's possible to delete a frozen (and empty) cgroup, as well as
create new sub-cgroups.
+ cgroup.freeze.stat
+ A read-only flat-keyed file which exists in non-root cgroups.
+ The following entry is defined:
+
+ freeze_time_total_ns
+ Cumulative time that this cgroup has spent in the freezing
+ state, regardless of whether or not it reaches "frozen".
+
cgroup.kill
A write-only single value file which exists in non-root cgroups.
The only allowed value is "1".
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index e61687d5e496..86332d83fa22 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -436,6 +436,12 @@ struct cgroup_freezer_state {
* frozen, SIGSTOPped, and PTRACEd.
*/
int nr_frozen_tasks;
+
+ /* Time when the cgroup was requested to freeze */
+ u64 freeze_time_start_ns;
+
+ /* Total duration the cgroup has spent freezing */
+ u64 freeze_time_total_ns;
};
struct cgroup {
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index a723b7dc6e4e..1f54d16a8713 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4050,6 +4050,23 @@ static ssize_t cgroup_freeze_write(struct kernfs_open_file *of,
return nbytes;
}
+static int cgroup_freeze_stat_show(struct seq_file *seq, void *v)
+{
+ struct cgroup *cgrp = seq_css(seq)->cgroup;
+ u64 freeze_time = 0;
+
+ spin_lock_irq(&css_set_lock);
+ if (test_bit(CGRP_FREEZE, &cgrp->flags))
+ freeze_time = ktime_get_ns() - cgrp->freezer.freeze_time_start_ns;
+
+ freeze_time += cgrp->freezer.freeze_time_total_ns;
+ spin_unlock_irq(&css_set_lock);
+
+ seq_printf(seq, "freeze_time_total_ns %llu\n", freeze_time);
+
+ return 0;
+}
+
static void __cgroup_kill(struct cgroup *cgrp)
{
struct css_task_iter it;
@@ -5355,6 +5372,11 @@ static struct cftype cgroup_base_files[] = {
.seq_show = cgroup_freeze_show,
.write = cgroup_freeze_write,
},
+ {
+ .name = "cgroup.freeze.stat",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .seq_show = cgroup_freeze_stat_show,
+ },
{
.name = "cgroup.kill",
.flags = CFTYPE_NOT_ON_ROOT,
@@ -5758,6 +5780,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
* if the parent has to be frozen, the child has too.
*/
cgrp->freezer.e_freeze = parent->freezer.e_freeze;
+ cgrp->freezer.freeze_time_total_ns = 0;
if (cgrp->freezer.e_freeze) {
/*
* Set the CGRP_FREEZE flag, so when a process will be
@@ -5766,6 +5789,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
* consider it frozen immediately.
*/
set_bit(CGRP_FREEZE, &cgrp->flags);
+ cgrp->freezer.freeze_time_start_ns = ktime_get_ns();
set_bit(CGRP_FROZEN, &cgrp->flags);
}
diff --git a/kernel/cgroup/freezer.c b/kernel/cgroup/freezer.c
index bf1690a167dd..6f3fab252140 100644
--- a/kernel/cgroup/freezer.c
+++ b/kernel/cgroup/freezer.c
@@ -179,10 +179,14 @@ static void cgroup_do_freeze(struct cgroup *cgrp, bool freeze)
lockdep_assert_held(&cgroup_mutex);
spin_lock_irq(&css_set_lock);
- if (freeze)
+ if (freeze) {
set_bit(CGRP_FREEZE, &cgrp->flags);
- else
+ cgrp->freezer.freeze_time_start_ns = ktime_get_ns();
+ } else {
clear_bit(CGRP_FREEZE, &cgrp->flags);
+ cgrp->freezer.freeze_time_total_ns += (ktime_get_ns() -
+ cgrp->freezer.freeze_time_start_ns);
+ }
spin_unlock_irq(&css_set_lock);
if (freeze)
--
2.50.0.727.gbf7dc18ff4-goog
Powered by blists - more mailing lists