Message-ID: <dmibxkog4sdbuddotjslmyv6zgyptgbq5voujhfnitdag2645m@bl4jphfz3xzg>
Date: Fri, 15 Nov 2024 14:41:45 +0100
From: Michal Koutný <mkoutny@...e.com>
To: Yafang Shao <laoar.shao@...il.com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com, rostedt@...dmis.org,
bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com, hannes@...xchg.org,
surenb@...gle.com, cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 0/4] sched: Fix missing irq time when
CONFIG_IRQ_TIME_ACCOUNTING is enabled
Hello Yafang.
On Fri, Nov 08, 2024 at 09:29:00PM GMT, Yafang Shao <laoar.shao@...il.com> wrote:
> After enabling CONFIG_IRQ_TIME_ACCOUNTING to track IRQ pressure in our
> container environment, we encountered several user-visible behavioral
> changes:
>
> - Interrupted IRQ/softirq time is excluded in the cpuacct cgroup
>
> This breaks userspace applications that rely on CPU usage data from
> cgroups to monitor CPU pressure. This patchset resolves the issue by
> ensuring that IRQ/softirq time is included in the cgroup of the
> interrupted tasks.
>
> - getrusage(2) does not include time interrupted by IRQ/softirq
>
> Some services use getrusage(2) to check if workloads are experiencing CPU
> pressure. Since IRQ/softirq time is no longer included in task runtime,
> getrusage(2) can no longer reflect the CPU pressure caused by heavy
> interrupts.
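The getrusage(2)-based check quoted above can be sketched as follows. This is a minimal illustration, not the services' actual code: the helper names `cpu_seconds` and `measure_cpu_utilization` are hypothetical, and only the standard `resource.getrusage()` call is assumed. It shows why interrupt time that is not charged to the task makes the reported figure too low.

```python
import resource
import time


def cpu_seconds(usage) -> float:
    """User + system CPU time from a struct_rusage, in seconds."""
    return usage.ru_utime + usage.ru_stime


def measure_cpu_utilization(work, *args) -> float:
    """Run work(*args) and return CPU seconds consumed per wall-clock second.

    RUSAGE_SELF covers all threads of the process, so a multi-threaded
    workload can report more than 1.0. Per the patch description, with
    CONFIG_IRQ_TIME_ACCOUNTING=y the IRQ/softirq time that interrupted this
    process is not included in ru_utime/ru_stime, so the figure can
    under-report how busy the CPU really was.
    """
    before = resource.getrusage(resource.RUSAGE_SELF)
    start = time.monotonic()
    work(*args)
    elapsed = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_SELF)
    if elapsed <= 0:
        return 0.0  # guard against a zero-length interval
    return (cpu_seconds(after) - cpu_seconds(before)) / elapsed
```

For example, `measure_cpu_utilization(time.sleep, 0.1)` returns a value close to 0, because a sleeping process accrues almost no CPU time.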
I understand that IRQ/softirq time is difficult to attribute to an
"accountable" entity and that it's technically simplest to attribute it to
everyone/no one, i.e. to the root cgroup (or through a global stat without
cgroups).
> This patchset addresses the first issue, which is relatively
> straightforward. Once this solution is accepted, I will address the second
> issue in a follow-up patchset.
Is the first issue about cpuacct data or irq.pressure?
It sounds like it's kind of both, and I noticed that the docs for
irq.pressure are lacking in Documentation/accounting/psi.rst. While you're
touching this, could you please add a paragraph or sentence explaining what
this value represents?
(Also, the same change applies to both cpuacct and
cgroup_base_stat_cputime_show(), right?)
>           ----------------
>           | Load Balancer|
>           ----------------
>            /   |    |   \
>           /    |    |    \
>    Server1 Server2 Server3 ... ServerN
>
> Although the load balancer's algorithm is complex, it follows some core
> principles:
>
> - When server CPU utilization increases, it adds more servers and deploys
> additional instances to meet SLA requirements.
> - When server CPU utilization decreases, it scales down by decommissioning
> servers and reducing the number of instances to save on costs.
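The two scaling principles quoted above can be sketched as a simple threshold rule. This is a hypothetical illustration: the real balancer's algorithm is not described in the thread, and the `ScalingPolicy` class, its threshold values, and `scaling_decision` are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical thresholds, not taken from the real load balancer."""
    scale_up_above: float = 0.70    # add servers/instances above this utilization
    scale_down_below: float = 0.30  # remove servers/instances below this utilization


def scaling_decision(cpu_utilization: float, policy: ScalingPolicy) -> int:
    """Return +1 to scale out, -1 to scale in, 0 to hold.

    cpu_utilization is the fraction (0.0-1.0) of CPU the servers report as
    busy. If that figure omits IRQ/softirq time, a server kept busy by
    interrupts can look idle and be scaled in.
    """
    if cpu_utilization > policy.scale_up_above:
        return 1
    if cpu_utilization < policy.scale_down_below:
        return -1
    return 0
```

With these example thresholds, a server spending 0.25 of its CPU on tasks and 0.5 on interrupts reports 0.25 and gets scaled in (`-1`), while counting the interrupt time as well (0.75) would scale it out (`+1`).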
Does a server here refer to a whole node (a whole kernel) or to a cgroup
(i.e. several servers on top of one kernel)?
> The load balancer is malfunctioning due to the exclusion of IRQ time from
> CPU utilization calculations.
Could this be fixed by subtracting (global) IRQ time from the (presumed
total) system capacity that the balancer uses for its decisions? (i.e.
without an exact per-cgroup breakdown of IRQ time)
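The suggestion above can be sketched as a capacity-adjusted utilization. This is a minimal sketch under stated assumptions: the function name is hypothetical, all inputs are CPU-seconds over one sampling window, and the global IRQ time is assumed to come from a node-wide source such as the irq and softirq fields of /proc/stat, so no per-cgroup IRQ breakdown is needed.

```python
def capacity_adjusted_utilization(task_busy: float, irq_time: float,
                                  capacity: float) -> float:
    """Fraction of the capacity left after IRQ/softirq handling that tasks use.

    task_busy: CPU-seconds charged to tasks (e.g. from cpuacct/cpu.stat).
    irq_time:  global IRQ + softirq CPU-seconds over the same window.
    capacity:  window_seconds * nr_cpus.
    """
    usable = capacity - irq_time
    if usable <= 0:
        return 1.0  # interrupts consumed all capacity: treat as saturated
    return min(task_busy / usable, 1.0)
```

For example, with 4 CPUs over a 10 s window (capacity 40), 12 s of task time and 16 s of IRQ time give 12 / 24 = 0.5, whereas ignoring IRQ time gives 12 / 40 = 0.3.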
Thanks,
Michal