

Message-ID: <qt3qdbvmrqtbceeogo32bw2b7v5otc3q6gfh7vgsk4vrydcgix@33hepjadeyjb>
Date: Mon, 10 Feb 2025 16:38:56 +0100
From: Michal Koutný <mkoutny@...e.com>
To: Abel Wu <wuyun.abel@...edance.com>
Cc: Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>, 
	Jonathan Corbet <corbet@....net>, Ingo Molnar <mingo@...hat.com>, 
	Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>, 
	Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>, 
	Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Valentin Schneider <vschneid@...hat.com>, Thomas Gleixner <tglx@...utronix.de>, 
	Yury Norov <yury.norov@...il.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	Bitao Hu <yaoma@...ux.alibaba.com>, Chen Ridong <chenridong@...wei.com>, 
	"open list:CONTROL GROUP (CGROUP)" <cgroups@...r.kernel.org>, "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>, 
	open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 3/3] cgroup/rstat: Add run_delay accounting for cgroups

Hello Abel (sorry for my delay).

On Wed, Jan 29, 2025 at 12:48:09PM +0800, Abel Wu <wuyun.abel@...edance.com> wrote:
> PSI tracks stall times for each cpu, and
> 
> 	tSOME[cpu] = time(nr_delayed_tasks[cpu] != 0)
> 
> which turns nr_delayed_tasks[cpu] into boolean value, hence loses
> insight into how severely this task group is stalled on this cpu.

Thanks for the example. So the lost information is a kind of group load.
What meaning does it have when there is no group throttling?

Honestly, I can reason about neither PSI.some nor Σ run_delay wrt
feedback for resource control. What slightly bugs me is the
introduction of another stats field before the first one was explored :-)

But if there's information loss with PSI -- could cpu.pressure:some be
removed in favor of Σ run_delay? (The former could be calculated from
the latter if you're right :-p)

(I didn't like the before/after shuffling with enum cpu_usage_stat
NR_STATS but I saw v4 where you tackled that.)

Michal

More context from the previous message: the difference is between a) and c),
or better, with equal lanes:

a')
   t1 |----|
   t2 |xx--|
   t3 |----|

c)
   t1 |----|
   t2 |xx--|
   t3 |xx--|

      <-Δt->
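
To make the two pictures concrete, here is a toy model of the lanes in
Python. It is only a sketch under assumptions that are not stated in the
thread: 'x' is read as "runnable but waiting", '-' as "running", each
character is one equal time slot, and the whole group is treated as a
single observation domain (real PSI works per CPU). The helper names are
made up for illustration.

```python
# Toy model of the lane diagrams. Assumptions (not from the thread):
# 'x' = runnable but delayed, '-' = running, one character = one time slot.
LANES_A_PRIME = ["----", "xx--", "----"]  # a')
LANES_C = ["----", "xx--", "xx--"]        # c)


def some_time(lanes):
    """Slots in which at least one task is delayed (a PSI 'some'-like value)."""
    slots = len(lanes[0])
    return sum(any(lane[i] == "x" for lane in lanes) for i in range(slots))


def total_run_delay(lanes):
    """Delayed slots summed over all tasks (a Σ run_delay-like value)."""
    return sum(lane.count("x") for lane in lanes)


for name, lanes in (("a'", LANES_A_PRIME), ("c", LANES_C)):
    print(name, "some =", some_time(lanes), "run_delay =", total_run_delay(lanes))
```

In this model both pictures give the same "some" time (2 slots), while
the summed run_delay differs (2 vs. 4 slots), which is the information
the boolean view drops.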

run_delay can be calculated independently of cpu.pressure:some
because there is still a difference between a') and c) in terms of total
cpu usage.

	Δrun_delay = nr * Δt - Δusage
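
As a quick sanity check of this identity, a minimal Python sketch with
made-up numbers follows. It assumes all nr tasks stay runnable for the
whole interval, that usage is a cumulative counter sampled at both ends
of the interval, and that all times share one unit (microseconds here).
The function name and the sample values are hypothetical.

```python
def delta_run_delay(nr, dt, usage_before, usage_after):
    """Δrun_delay = nr * Δt - Δusage, with all times in the same unit.

    Assumes every one of the nr tasks is runnable during the whole of Δt.
    """
    return nr * dt - (usage_after - usage_before)


DT_USEC = 4_000  # hypothetical interval length

# a'): three tasks, one of them waits for half of the interval,
# so the group consumes 2.5 * Δt of CPU time.
print(delta_run_delay(3, DT_USEC, 0, 10_000))

# c): three tasks, two of them wait for half of the interval,
# so the group consumes 2 * Δt of CPU time.
print(delta_run_delay(3, DT_USEC, 0, 8_000))
```

With these inputs the first case yields 2000 and the second 4000, i.e.
half of Δt and a full Δt of accumulated delay respectively.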

The challenge is with nr (assuming they're all runnable during Δt), that
would need to be sampled from /sys/kernel/debug/sched/debug. But then
you can get whatever load for individual cfs_rqs from there. Hm, does it
even make sense to add up run_delays from different CPUs?