Message-ID: <2e3mby62lswkw454sq4b4wnjmcr6etoug5bazafutb6dbbpozl@juhpci6ebev2>
Date: Tue, 27 May 2025 17:33:41 +0200
From: Michal Koutný <mkoutny@...e.com>
To: Yafang Shao <laoar.shao@...il.com>
Cc: mingo@...hat.com, peterz@...radead.org, hannes@...xchg.org,
juri.lelli@...hat.com, vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com,
surenb@...gle.com, linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
lkp@...el.com
Subject: Re: [PATCH v9 1/2] sched: Fix cgroup irq time for
CONFIG_IRQ_TIME_ACCOUNTING
Hello.
On Sun, May 11, 2025 at 11:07:59AM +0800, Yafang Shao <laoar.shao@...il.com> wrote:
> The CPU usage of the cgroup is relatively low at around 55%, but this usage
> doesn't increase, even with more netperf tasks. The reason is that CPU0 is
> at 100% utilization, as confirmed by mpstat:
>
> 02:56:22 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> 02:56:23 PM    0    0.99    0.00   55.45    0.00    0.99   42.57    0.00    0.00    0.00    0.00
>
> 02:56:23 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> 02:56:24 PM    0    2.00    0.00   55.00    0.00    0.00   43.00    0.00    0.00    0.00    0.00
>
> It is clear that the %soft is excluded in the cgroup of the interrupted
> task. This behavior is unexpected. We should include IRQ time in the
> cgroup to reflect the pressure the group is under.
I think this would go against the intention of CONFIG_IRQ_TIME_ACCOUNTING
(someone more familiar may chime in).
> After a thorough analysis, I discovered that this change in behavior is due
> to commit 305e6835e055 ("sched: Do not account irq time to current task"),
> which altered whether IRQ time should be charged to the interrupted task.
> While I agree that a task should not be penalized by random interrupts, the
> task itself cannot progress while interrupted. Therefore, the interrupted
> time should be reported to the user.
>
> The system metric in cpuacct.stat is crucial in indicating whether a
> container is under heavy system pressure, including IRQ/softirq activity.
> Hence, IRQ/softirq time should be included in the cpuacct system usage,
> which also applies to cgroup2’s rstat.
So I guess it'd be better to add a separate irq_usec entry to cpu.stat
(instead of bundling it into system_usec in spite of
CONFIG_IRQ_TIME_ACCOUNTING).
I admit, I'd be happier if irq.pressure values could be used for
that. Maybe not the PSI ratio itself but irq.pressure:total should be
that amount. WDYT?
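
To make the irq.pressure idea concrete, here is a minimal userspace sketch
(Python) of deriving an "IRQ time" figure from the total= field rather than
from the avg* ratios. It assumes the usual PSI line format
("full avg10=... avg60=... avg300=... total=<usec>"); the function names and
the sample values are made up for illustration and are not part of any patch.

```python
def psi_total_usec(pressure_text, kind="full"):
    """Return the cumulative stall time (the total= field, in microseconds)
    from the PSI line that starts with `kind`."""
    for line in pressure_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != kind:
            continue
        for field in fields[1:]:
            key, _, value = field.partition("=")
            if key == "total":
                return int(value)
    raise ValueError(f"no '{kind}' line with a total= field")


def irq_usec_delta(before_text, after_text):
    """IRQ stall time accumulated between two reads of irq.pressure."""
    return psi_total_usec(after_text) - psi_total_usec(before_text)


if __name__ == "__main__":
    # Made-up samples standing in for two reads of a cgroup's irq.pressure
    # file taken some interval apart.
    before = "full avg10=0.00 avg60=0.00 avg300=0.00 total=1000000\n"
    after = "full avg10=43.00 avg60=12.00 avg300=3.00 total=1430000\n"
    print(irq_usec_delta(before, after))
```

The delta between two reads is what a monitoring tool would report alongside
system_usec, without changing what system_usec itself means.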
Michal