linux-kernel - Re: [RFC PATCH] cgroup: Track time in cgroup v2 freezer


Message-ID: <aEDM_s7y8xMKJHph@slm.duckdns.org>
Date: Wed, 4 Jun 2025 12:47:26 -1000
From: Tejun Heo <tj@...nel.org>
To: Tiffany Yang <ynaffit@...gle.com>
Cc: linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
	kernel-team@...roid.com, John Stultz <jstultz@...gle.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Stephen Boyd <sboyd@...nel.org>,
	Anna-Maria Behnsen <anna-maria@...utronix.de>,
	Frederic Weisbecker <frederic@...nel.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Michal Koutný <mkoutny@...e.com>,
	"Rafael J. Wysocki" <rafael@...nel.org>,
	Pavel Machek <pavel@...nel.org>,
	Roman Gushchin <roman.gushchin@...ux.dev>,
	Chen Ridong <chenridong@...wei.com>, Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>
Subject: Re: [RFC PATCH] cgroup: Track time in cgroup v2 freezer

Hello, Tiffany.

On Wed, Jun 04, 2025 at 07:39:29PM +0000, Tiffany Yang wrote:
...
> Thanks for taking a look! In this case, I would argue that the value we
> are accounting for (time that a task has not been able to run because it
> is in the cgroup v2 frozen state) is task-specific and distinct from the
> time that the cgroup it belongs to has been frozen.
> 
> A cgroup is not considered frozen until all of its members are frozen,
> and if one task then leaves the frozen state, the entire cgroup is
> considered no longer frozen, even if its other members stay in the
> frozen state. Similarly, even if a task is migrated from one frozen
> cgroup (A) to another frozen cgroup (B), the time cgroup B has been
> frozen would not be representative of that task even though it is a
> member.
> 
> There is also latency between when each task in a cgroup is marked as
> to-be-frozen/unfrozen and when it actually enters the frozen state, so
> each descendant task has a different frozen time. For watchdogs that
> elapse on a per-task basis, a per-cgroup time-in-frozen value would
> underreport the actual time each task spent unable to run. Tasks that
> miss a deadline might incorrectly be considered misbehaving when the
> time they spent suspended was not correctly accounted for.
> 
> Please let me know if that answers your question or if there's something
> I'm missing. I agree that it would be cleaner/preferable to keep this
> accounting under a cgroup-specific umbrella, so I hope there is some way
> to get around these issues, but it doesn't look like cgroup fs has a
> good way to keep task-specific stats at the moment.

I'm not sure the freezing/frozen distinction is that meaningful. If each cgroup
tracks total durations for both states, most threads should be able to rely
on the freezing duration delta, right? There shouldn't be a significant time gap
between freezing starting and most threads being frozen, although the cgroup
may not reach the fully frozen state due to e.g. NFS and whatnot.

As long as threads are not migrated across cgroups, a thread should be able
to do something like:

1. Read /proc/self/cgroup to determine the current cgroup.
2. Read and remember the freezing duration from $CGRP/cgroup.stat.
3. Do the time-consuming operation.
4. Read $CGRP/cgroup.stat again, calculate the delta, and deduct it from the
   time taken.
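
A minimal userspace sketch of the four steps above, in Python. It rests on
assumptions: the key name "freezing_usec" is hypothetical (this thread does not
settle on a field name, and cgroup.stat may not expose one), cgroup v2 is
assumed to be mounted at /sys/fs/cgroup, and the helper names are made up for
illustration.

```python
import time
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")  # assumption: usual cgroup v2 mount point
FREEZE_KEY = "freezing_usec"  # hypothetical key, not confirmed by this thread


def parse_cgroup_v2_path(proc_cgroup_text):
    """Return the cgroup v2 path (the "0::<path>" entry) from /proc/<pid>/cgroup."""
    for line in proc_cgroup_text.splitlines():
        if not line.strip():
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        if hierarchy_id == "0" and controllers == "":
            return path
    raise ValueError("no cgroup v2 entry found")


def parse_flat_keyed(text):
    """Parse a flat-keyed file ("key value" per line) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, value = line.split()
        stats[key] = int(value)
    return stats


def read_freeze_usec(cgroup_path):
    """Read the (hypothetical) freezing duration from $CGRP/cgroup.stat."""
    stat_file = CGROUP_ROOT / cgroup_path.lstrip("/") / "cgroup.stat"
    return parse_flat_keyed(stat_file.read_text())[FREEZE_KEY]


def deduct_freeze_time(elapsed_usec, before_usec, after_usec):
    """Subtract the freezing-duration delta from the measured elapsed time."""
    return elapsed_usec - (after_usec - before_usec)


def timed_without_freeze(operation):
    """Run operation() and return its duration in usec, minus freezing time."""
    # Step 1: determine the current cgroup.
    cgroup_path = parse_cgroup_v2_path(Path("/proc/self/cgroup").read_text())
    # Step 2: remember the freezing duration before the operation.
    before = read_freeze_usec(cgroup_path)
    # Step 3: do the time-consuming operation.
    start = time.monotonic_ns()
    operation()
    elapsed_usec = (time.monotonic_ns() - start) // 1000
    # Step 4: read again and deduct the delta.
    after = read_freeze_usec(cgroup_path)
    return deduct_freeze_time(elapsed_usec, before, after)
```

A watchdog would compare the value returned by timed_without_freeze() against
its deadline instead of the raw elapsed time. As noted above, this only holds
if the thread stays in the same cgroup for the whole measurement.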

Would that work?

Thanks.

-- 
tejun