Message-ID: <CALOAHbBQXSuUmz8C2CKA2o-Menup8uz3qOX34JsZCCG68GhaWg@mail.gmail.com>
Date: Wed, 28 May 2025 10:10:51 +0800
From: Yafang Shao <laoar.shao@...il.com>
To: Michal Koutný <mkoutny@...e.com>
Cc: mingo@...hat.com, peterz@...radead.org, hannes@...xchg.org,
juri.lelli@...hat.com, vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com,
surenb@...gle.com, linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
lkp@...el.com
Subject: Re: [PATCH v9 1/2] sched: Fix cgroup irq time for CONFIG_IRQ_TIME_ACCOUNTING
On Tue, May 27, 2025 at 11:33 PM Michal Koutný <mkoutny@...e.com> wrote:
>
> Hello.
>
> On Sun, May 11, 2025 at 11:07:59AM +0800, Yafang Shao <laoar.shao@...il.com> wrote:
> > The CPU usage of the cgroup is relatively low at around 55%, but this usage
> > doesn't increase, even with more netperf tasks. The reason is that CPU0 is
> > at 100% utilization, as confirmed by mpstat:
> >
> > 02:56:22 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
> > 02:56:23 PM    0   0.99   0.00  55.45    0.00   0.99  42.57   0.00   0.00   0.00   0.00
> >
> > 02:56:23 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
> > 02:56:24 PM    0   2.00   0.00  55.00    0.00   0.00  43.00   0.00   0.00   0.00   0.00
> >
> > Clearly, the %soft time is excluded from the cgroup of the interrupted
> > task. This behavior is unexpected. We should include IRQ time in the
> > cgroup to reflect the pressure the group is under.
>
> I think this would go against the intention of CONFIG_IRQ_TIME_ACCOUNTING
> (someone more familiar may chime in).
Please refer to the discussion with Ingo:
https://lore.kernel.org/all/aBsGXCKX8-2_Cn9x@gmail.com/
>
> > After a thorough analysis, I discovered that this change in behavior is due
> > to commit 305e6835e055 ("sched: Do not account irq time to current task"),
> > which altered whether IRQ time should be charged to the interrupted task.
> > While I agree that a task should not be penalized by random interrupts, the
> > task itself cannot progress while interrupted. Therefore, the interrupted
> > time should be reported to the user.
> >
> > The system metric in cpuacct.stat is crucial in indicating whether a
> > container is under heavy system pressure, including IRQ/softirq activity.
> > Hence, IRQ/softirq time should be included in the cpuacct system usage,
> > which also applies to cgroup2’s rstat.
>
> So I guess, it'd be better to add a separate entry in cpu.stat with
> irq_usec (instead of bundling it into system_usec in spite of
> CONFIG_IRQ_TIME_ACCOUNTING).
>
> I admit, I'd be happier if irq.pressure values could be used for
> that. Maybe not the PSI ratio itself but irq.pressure:total should be
> that amount. WDYT?
Thank you for your suggestion. Both methods can effectively retrieve
the container’s IRQ usage. However, I prefer adding a new entry
irq_usec to cpu.stat since it aligns better with CPU utilization
metrics.
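
For illustration only, here is a minimal Python sketch of how a
monitoring tool might consume such an entry. It assumes irq_usec shows
up in cpu.stat as its own "key value" line next to user_usec and
system_usec, and is not folded into system_usec. That key is only the
proposal discussed above, the helper names are hypothetical, and the
sample values are made up.

```python
def parse_cpu_stat(text):
    """Parse cpu.stat-style content ("key value" per line) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def irq_share(stats):
    """Return irq / (user + system + irq), or None if irq_usec is absent.

    Assumption: irq_usec is reported separately from system_usec.
    """
    irq = stats.get("irq_usec")  # hypothetical key, proposed in this thread
    if irq is None:
        return None
    total = stats.get("user_usec", 0) + stats.get("system_usec", 0) + irq
    return irq / total if total else None


# Made-up sample; on a real system the text would be read from the
# cgroup's cpu.stat file instead.
sample = """usage_usec 1000000
user_usec 20000
system_usec 550000
irq_usec 430000
"""

stats = parse_cpu_stat(sample)
print(stats["system_usec"], stats.get("irq_usec"), irq_share(stats))
```

On a kernel without the entry, irq_share() returns None, so a tool
written this way would degrade gracefully.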
--
Regards
Yafang