Message-ID: <m3og4sktkzf6j62terh4xcbfiw45ziymhmt7x7iuyzcogl67cy@ufqvgzttd2n7>
Date: Fri, 21 Feb 2025 16:36:02 +0100
From: Michal Koutný <mkoutny@...e.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Tejun Heo <tj@...nel.org>, Abel Wu <wuyun.abel@...edance.com>,
Jonathan Corbet <corbet@....net>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>, Thomas Gleixner <tglx@...utronix.de>,
Yury Norov <yury.norov@...il.com>, Andrew Morton <akpm@...ux-foundation.org>,
Bitao Hu <yaoma@...ux.alibaba.com>, Chen Ridong <chenridong@...wei.com>,
"open list:CONTROL GROUP (CGROUP)" <cgroups@...r.kernel.org>, "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 3/3] cgroup/rstat: Add run_delay accounting for cgroups
On Mon, Feb 10, 2025 at 01:25:45PM -0500, Johannes Weiner <hannes@...xchg.org> wrote:
> Yes, a more detailed description of the usecase would be helpful.
>
> I'm not exactly sure how the sum of wait times in a cgroup would be
> used to gauge load without taking available concurrency into account.
> One second of aggregate wait time means something very different if
> you have 200 cpus compared to if you have 2.
>
> This is precisely what psi tries to capture. "Some" does provide group
> loading information in a sense, but it's a ratio over available
> concurrency,
This comes as a surprise to me (I originally assumed it's only
time(some)/time(interval)).
But I confirm that after actually looking at the avg* values it is over
nr_tasks.
If the value is already normalized by nr_tasks, I'm seeing less of a
benefit of Σ run_delay.
> and currently capped at 100%. I.e. if you have N cpus, 100% some is
> "at least N threads waiting at all times." There is a gradient below
> that, but not above.
Is this a typo? (s/some/full/ or s/at least N/at least 1/)
(Actually, if I correct my thinking with the nr_tasks normalization,
then your statement makes sense. OTOH, what is the difference between
'full' and 'some' at 100%?)
Also I played a bit.
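cat >/root/cpu_n.sh <<'EOD'
#!/bin/bash
# Spawn N busy-loop workers; worker i is pinned to CPU i.
# (The heredoc delimiter is quoted so that $1, $BASHPID etc. are
# written to the script verbatim instead of being expanded here.)
worker() {
    echo "$BASHPID: starting on $1"
    taskset -c -p "$1" "$BASHPID"
    while true ; do
        true
    done
}
for i in $(seq "${1:-1}") ; do
    worker "$i" &
    pids+=($!)
done
echo "pids: ${pids[*]}"
wait
EOD
chmod +x /root/cpu_n.sh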
systemd-run -u test.service /root/cpu_n.sh 2
# test.service/cpu.pressure:some is ~0
systemd-run -u pressure.service /root/cpu_n.sh 1
# test.service/cpu.pressure:some settles at ~25%, cpu1 is free, cpu2 half
# test.service/cpu.pressure:full settles at ~25% too(?!), I'd expect 0
                                            ^^^^^^^^^^^^
(kernel v6.13)
# pressure.service/cpu.pressure:some settles at ~50%, makes sense
# pressure.service/cpu.pressure:full settles at ~50%, makes sense
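For anyone who wants to repeat the readings above, here is a minimal sketch
of a helper that prints the avg10 values of the "some" and "full" lines for
a unit. The function names and the CGROOT variable are hypothetical, and the
default path assumes cgroup v2 mounted at /sys/fs/cgroup with both units
placed under system.slice; adjust it if your setup differs.

```shell
#!/bin/bash
# Print the avg10 value of one cpu.pressure line read from stdin.
# $1 selects the line: "some" or "full".
psi_avg10() {
    awk -v kind="$1" '$1 == kind { sub(/^avg10=/, "", $2); print $2 }'
}

# Print some/full avg10 for one unit.
# Assumption: units live under system.slice on a cgroup v2 mount;
# override CGROOT (hypothetical variable) otherwise.
show_pressure() {
    local f="${CGROOT:-/sys/fs/cgroup/system.slice}/$1/cpu.pressure"
    printf '%s some=%s full=%s\n' "$1" \
        "$(psi_avg10 some <"$f")" "$(psi_avg10 full <"$f")"
}
```

Calling "show_pressure test.service" and "show_pressure pressure.service"
every few seconds (e.g. in a sleep loop) should show the values settling.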
Thanks,
Michal