Message-ID: <ry6p5w3p4l7pnsovyapu6n2by7f4zl63c7umwut2ngdxinx6fs@yu53tunbkxdi>
Date: Mon, 30 Jun 2025 19:40:28 +0200
From: Michal Koutný <mkoutny@...e.com>
To: Tiffany Yang <ynaffit@...gle.com>
Cc: linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
kernel-team@...roid.com, John Stultz <jstultz@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>, Stephen Boyd <sboyd@...nel.org>,
Anna-Maria Behnsen <anna-maria@...utronix.de>, Frederic Weisbecker <frederic@...nel.org>,
Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
"Rafael J. Wysocki" <rafael@...nel.org>, Pavel Machek <pavel@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>, Chen Ridong <chenridong@...wei.com>,
Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>
Subject: Re: [RFC PATCH] cgroup: Track time in cgroup v2 freezer
On Fri, Jun 27, 2025 at 12:47:23AM -0700, Tiffany Yang <ynaffit@...gle.com> wrote:
> In our case, the deadline is meant to be relative to the time our task
> spends running; since we don't have a clock for that, we set our timer
> against the system time (CLOCK_MONOTONIC, in this case) as an
> approximation.
Would it be sufficient to measure that deadline against
cpu.stat:usage_usec (CPU time consumed by the cgroup)? Or do I
misunderstand your latter deadline metric?
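For illustration, a minimal userspace sketch (not part of the patch; the
cgroup name is a placeholder and the v2 hierarchy is assumed to be mounted
at /sys/fs/cgroup) that reads usage_usec from cpu.stat, which an agent
could compare against such a CPU-time budget:

#include <stdio.h>
#include <string.h>
#include <inttypes.h>

/* Read the usage_usec field from the given cgroup's cpu.stat. */
static int read_usage_usec(const char *cgroup, uint64_t *usage)
{
	char path[256], key[64];
	uint64_t val;
	FILE *f;

	snprintf(path, sizeof(path), "/sys/fs/cgroup/%s/cpu.stat", cgroup);
	f = fopen(path, "r");
	if (!f)
		return -1;

	while (fscanf(f, "%63s %" SCNu64, key, &val) == 2) {
		if (!strcmp(key, "usage_usec")) {
			*usage = val;
			fclose(f);
			return 0;
		}
	}
	fclose(f);
	return -1;
}

int main(void)
{
	uint64_t usec;

	/* "mygroup" is just a placeholder cgroup name */
	if (read_usage_usec("mygroup", &usec) == 0)
		printf("CPU time consumed: %" PRIu64 " usec\n", usec);
	return 0;
}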
> Adding it to /proc/<pid>/stat is an option, but because this metric
> isn't very widely used and exactly what it measures is pretty particular
> ("freezer time, but no, cgroup freezer time, but v2 and not v1"), we
> were hesitant to add it there and make this interface even more
> difficult for folks to parse.
Yeah, it'd need a strong use case to add it there.
> Thank you for asking this! This is a very helpful question. My answer is
> that other causes of delay may be equally important, but this is another
> place where things get messy because of the spectrum of types of
> "delay". If we break delays into 2 categories, delays that were
> requested (sleep) and delays that were not (SIGSTOP), I can say that we
> are primarily interested in delays that were not requested.
(Note that SIGSTOP may be sent to self or from within the group, but) mind
that even the "not requested" category splits into two further ones:
resource contention and freezer management. And the latter should be under
the control of the agent that sets the deadlines.
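To illustrate that point, a hypothetical agent-side sketch (not something
the patch requires; cgroup name is a placeholder): since the agent itself
toggles cgroup.freeze, it could account the frozen time around its own
freeze/thaw writes, e.g.:

#include <stdio.h>
#include <inttypes.h>
#include <time.h>

static uint64_t frozen_usec;	/* time spent frozen so far */
static uint64_t freeze_start;

static uint64_t now_usec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000 + (uint64_t)ts.tv_nsec / 1000;
}

/* Write 0/1 to the cgroup's cgroup.freeze and account the frozen interval. */
static int set_freeze(const char *cgroup, int freeze)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/fs/cgroup/%s/cgroup.freeze", cgroup);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%d\n", freeze);
	fclose(f);

	/*
	 * The cgroup becomes frozen asynchronously (see "frozen" in
	 * cgroup.events), so this slightly over-counts the frozen time.
	 */
	if (freeze)
		freeze_start = now_usec();
	else
		frozen_usec += now_usec() - freeze_start;
	return 0;
}

int main(void)
{
	/* "mygroup" is just a placeholder cgroup name */
	set_freeze("mygroup", 1);
	/* ... later, the agent decides to thaw ... */
	set_freeze("mygroup", 0);
	printf("frozen for %" PRIu64 " usec\n", frozen_usec);
	return 0;
}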
> However, there are many cases that fall somewhere in between, like the
> wakeup latency after a sleep, or that are difficult to account for,
> like blocking on a futex (requested), where the owner might be
> preempted (not requested).
Those are order(s) of magnitude different. I can't imagine using the
freezer for jobs where wakeup latency also matters.
> Ideally, we could abstract this out in a more general way to other
> delays (like SIGSTOP), but the challenge here is that there isn't a
> clear line that separates a problematic delay from an acceptable
> delay. Suggestions for a framework to approach this more generally are
> very welcome.
Well, there are multiple similar metrics already: various (cgroup) PSI,
(global) steal time, cpu.stat:throttled_usec and perhaps some more.
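E.g. the per-cgroup CPU PSI already exposes an accumulated stall time; a
rough sketch of reading it (placeholder cgroup name, line format as in
Documentation/accounting/psi.rst):

#include <stdio.h>
#include <inttypes.h>

/* Read the accumulated "some" CPU stall time (usec) from cpu.pressure. */
static int read_cpu_stall_usec(const char *cgroup, uint64_t *total)
{
	char path[256];
	FILE *f;
	int ret = -1;

	snprintf(path, sizeof(path), "/sys/fs/cgroup/%s/cpu.pressure", cgroup);
	f = fopen(path, "r");
	if (!f)
		return -1;

	/* First line: "some avg10=... avg60=... avg300=... total=<usec>" */
	if (fscanf(f, "some avg10=%*f avg60=%*f avg300=%*f total=%" SCNu64,
		   total) == 1)
		ret = 0;

	fclose(f);
	return ret;
}

int main(void)
{
	uint64_t stall;

	/* "mygroup" is just a placeholder cgroup name */
	if (read_cpu_stall_usec("mygroup", &stall) == 0)
		printf("CPU stall total: %" PRIu64 " usec\n", stall);
	return 0;
}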
> In the meantime, focusing on task frozen/stopped time seems like the
> most reasonable approach. Maybe that would be clear enough to make it
> palatable for proc/<pid>/stat ?
Tejun's suggestion of tracking the frozen time of the whole cgroup could
complement the other "debugging" stats provided by cgroups, but I tend
to think it's not a good (and certainly not a complete) solution to
your problem.
Regards,
Michal