[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALOAHbDRA5Ak-N4DyHfNhVhVMT-8CuVVagkgYNv2nBB6eNqFsQ@mail.gmail.com>
Date: Wed, 15 Jan 2025 10:58:25 +0800
From: Yafang Shao <laoar.shao@...il.com>
To: Michal Koutný <mkoutny@...e.com>
Cc: Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com, hannes@...xchg.org,
juri.lelli@...hat.com, vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com,
surenb@...gle.com, linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
lkp@...el.com
Subject: Re: [PATCH v8 4/4] sched: Fix cgroup irq time for CONFIG_IRQ_TIME_ACCOUNTING
On Tue, Jan 14, 2025 at 11:48 PM Michal Koutný <mkoutny@...e.com> wrote:
>
> On Thu, Jan 09, 2025 at 09:46:23PM +0800, Yafang Shao <laoar.shao@...il.com> wrote:
> > No, in the case of !CONFIG_IRQ_TIME_ACCOUNTING, CPU usage will
> > increase as we add more workloads. In other words, this is a
> > user-visible behavior change, and we should aim to avoid it.
>
> I wouldn't be excited about that -- differently configured kernel is
> supposed to behave differently.
>
> > Document it as follows?
>
> That makes sense to me, with explanation of "where" the time
> (dis)appears.
>
> >
> > "Enabling CONFIG_IRQ_TIME_ACCOUNTING will exclude IRQ usage from the
> > CPU usage of your tasks. In other words, your task's CPU usage will
> > only reflect user time and system time."
> reflect proper user ...
> and IRQ usage is only attributed on the global level visible e.g. in
> /proc/stat or irq.pressure (possibly on cgroup level).
I completely agree with you that it's essential to clearly document
this behavior change. Thanks for your suggestion.
>
> > If we document it clearly this way, I believe no one will try to enable it ;-)
>
> I understand that users who want to have the insight between real system
> time and IRQ time would enable it.
>
>
> > It worked well before the introduction of CONFIG_IRQ_TIME_ACCOUNTING.
> > Why not just maintain the previous behavior, especially since it's not
> > difficult to do so?
>
> Then why do you need CONFIG_IRQ_TIME_ACCOUNTING enabled? Bundling it
> together with (not so) random tasks used to work for you.
Our motivation for enabling irq.pressure was to monitor and address
issues caused by IRQ activity. On our production servers, we have
already enabled {cpu,memory,io}.pressure, which have proven to be very
helpful. We believed that irq.pressure could provide similar benefits.
However, we encountered an unexpected behavior change introduced by
enabling irq.pressure, which has been unacceptable to our users. As a
result, we have reverted this configuration in our production
environment. If the issues we observed cannot be resolved, we will not
enable irq.pressure again.
>
> > We’re unsure how to use this metric to guide us, and I don't think
> > there will be clear guidance on how irq.pressure relates to CPU
> > utilization. :(
>
> (If irq.pressure is not useful in this case, then when is it useful? I
> obviously need to brush up on this.)
It’s not that “irq.pressure is not useful in this case,” but rather
that irq.pressure introduces a behavior change that we find
unacceptable.
Our motivation for enabling irq.pressure stems from the lack of a
dedicated IRQ utilization metric in cpuacct.stat or cpu.stat
(cgroup2). For example:
$ cat cpuacct.stat
user 21763829
system 1052269 <<<< irq is included in system
$ cat cpu.stat
usage_usec 3878342
user_usec 1589312
system_usec 2289029 <<<< irq is included here
...
To address this, I propose introducing a separate metric, such as irq
or irq_usec (cgroup2), to explicitly reflect the IRQ utilization of a
cgroup. This approach would eliminate the need to enable irq.pressure
while providing the desired insights into IRQ usage.
WDYT ?
--
Regards
Yafang
Powered by blists - more mailing lists