Message-ID: <b4ae80bf-36b4-4f31-96ad-6876372c91a2@bytedance.com>
Date: Wed, 12 Feb 2025 23:14:53 +0800
From: Abel Wu <wuyun.abel@...edance.com>
To: Tejun Heo <tj@...nel.org>, Michal Koutný
<mkoutny@...e.com>
Cc: Johannes Weiner <hannes@...xchg.org>, Jonathan Corbet <corbet@....net>,
Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>, Yury Norov <yury.norov@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>, Bitao Hu
<yaoma@...ux.alibaba.com>, Chen Ridong <chenridong@...wei.com>,
"open list:CONTROL GROUP (CGROUP)" <cgroups@...r.kernel.org>,
"open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: Re: [PATCH v2 3/3] cgroup/rstat: Add run_delay accounting for
cgroups
On 2/11/25 12:20 AM, Tejun Heo wrote:
> On Mon, Feb 10, 2025 at 04:38:56PM +0100, Michal Koutný wrote:
> ...
>> The challenge is with nr (assuming they're all runnable during Δt), that
>> would need to be sampled from /sys/kernel/debug/sched/debug. But then
>> you can get whatever load for individual cfs_rqs from there. Hm, does it
>> even make sense to add up run_delays from different CPUs?
>
> The difficulty in aggregating across CPUs is why some and full pressures are
> defined the way they are. Ideally, we'd want full distribution of stall
> states across CPUs but both aggregation and presentation become challenging,
> so some/full provide the two extremes. Sum of all cpu_delay adds more
> incomplete signal on top. I don't know how useful it'd be. At meta, we
> depend on PSI a lot when investigating resource problems and we've never
> felt the need for the sum time, so that's one data point with the caveat
> that usually our focus is on mem and io pressures where some and full
> pressure metrics usually seem to provide sufficient information.
It's interesting, as we also find PSI very useful for memory and io, and we
never thought of aggregating those across CPUs. With my limited knowledge,
I guess that's because they share a global bottleneck. For example, a
shortage of memory puts all the tasks of that memcg in the same situation
no matter which cpu they are running on. And io issues are generally caused
by legacy backends with poor performance, that is, lower speed or fewer
hwqueues, so the tasks still contend with each other outside the scope of
cpus. Scheduling is different though: some cpus can be much more contended
than others due to affinity constraints or something else, since different
cpus have separate bandwidth.
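
To make the aggregation concern concrete, here is a minimal userspace-style
sketch (not from the patch; NR_CPUS and the per-CPU numbers are hypothetical)
showing that the plain sum of per-CPU run_delay over a window cannot tell a
single heavily contended cpu apart from mild contention spread evenly:

/* sum_delay.c: hypothetical per-CPU run_delay samples over a 1s window (ms) */
#include <stdio.h>

#define NR_CPUS 4	/* hypothetical cpu count */

int main(void)
{
	unsigned long skewed[NR_CPUS] = { 900, 10, 10, 10 };	/* one hot cpu */
	unsigned long even[NR_CPUS]   = { 232, 233, 232, 233 };	/* spread out */
	unsigned long sum_skewed = 0, sum_even = 0;

	for (int i = 0; i < NR_CPUS; i++) {
		sum_skewed += skewed[i];
		sum_even += even[i];
	}

	/* Both sums are ~930ms, yet the scheduling situations differ a lot. */
	printf("sum(skewed)=%lu ms, sum(even)=%lu ms\n", sum_skewed, sum_even);
	return 0;
}

Both sums come out nearly identical while the per-cpu pictures are very
different, which is roughly the "incomplete signal" you mentioned above.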
>
> As the picture provided by some and full metrics is incomplete, I can
> imagine adding the sum being useful. That said, it'd help if Abel can
> provide more concrete examples on it being useful. Another thing to consider
> is whether we should add this across resources monitored by PSI - cpu, mem
> and io.
Please check my last reply to see our use case; it would be appreciated
if you could shed some light on it.
Thanks & Best Regards,
Abel