Message-ID: <CALOAHbCSo3qcbPwGQBxc0dY=aTHLd6pw-Lpva0tS+gkU+x0K8Q@mail.gmail.com>
Date: Wed, 4 Dec 2024 10:17:55 +0800
From: Yafang Shao <laoar.shao@...il.com>
To: Michal Koutný <mkoutny@...e.com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com, rostedt@...dmis.org,
bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com, hannes@...xchg.org,
surenb@...gle.com, cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 4/4] sched: Fix cgroup irq time for CONFIG_IRQ_TIME_ACCOUNTING
On Tue, Dec 3, 2024 at 6:01 PM Michal Koutný <mkoutny@...e.com> wrote:
>
> On Fri, Nov 08, 2024 at 09:29:04PM GMT, Yafang Shao <laoar.shao@...il.com> wrote:
> > The system metric in cpuacct.stat is crucial in indicating whether a
> > container is under heavy system pressure, including IRQ/softirq activity.
> > Hence, IRQ/softirq time should be included in the cpuacct system usage,
> > which also applies to cgroup2’s rstat.
>
> (snipped from cover letter thread)
>
> On Mon, Nov 18, 2024 at 08:12:03PM GMT, Yafang Shao <laoar.shao@...il.com> wrote:
> > The key issue here is determining how to reliably get the IRQ time. I
> > don't believe there is a dependable way to achieve this.
> >
> > For example, consider a server with 16 CPUs. My cgroup contains 4
> > threads that can freely migrate across CPUs, while other tasks are
> > also running on the system simultaneously. In this scenario, how can
> > we accurately determine the IRQ time to subtract?
>
> I understand there's some IRQ noise, which is a property of the CPU
> (the noise is a function of the CPU).
>
> Then there are the cgroup workloads: on a single CPU, the per-cgroup
> impact depends on how much that given cgroup runs on that CPU (the more
> it runs, the more it is exposed). The whole cgroup's impact is the sum
> of these, i.e. it is roughly a scalar product between the per-CPU IRQ
> noise and the cgroup's per-CPU CPU consumption.
>
> (In your usage, there's some correlation between IRQ noise and CPU
> consumption.)
>
> > That is precisely what the user wants. If my tasks are frequently
> > interrupted by IRQs, it indicates that my service may be experiencing
> > poor quality. In response, I would likely reduce the traffic sent to
> > it. If the issue persists and IRQ interruptions remain high, I would
> > then consider migrating the service to other servers.
>
> If I look at
> 52b1364ba0b10 ("sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure")
> (where it's clearer than after
> ddae0ca2a8fe1 ("sched: Move psi_account_irqtime() out of update_rq_clock_task() hotpath")
> )
>
> the IRQ pressure accounting takes the task (i.e. its cgroup) and the
> IRQ time into account (~multiplication), and then it's summed a) over
> time (psi_account_irqtime()), b) over CPUs (collect_percpu_times()), so
> it's the scalar product (squinting) I mentioned above.
>
> Therefore I think irq.pressure provides exactly the information that's
> relevant for your scheduling decisions, and that information cannot be
> fit into cpuacct.stat.
>
> Or what is irq.pressure missing or distorting for your scenario?
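
Just to restate the scalar product you describe in code form, a minimal
userspace sketch could look like the following; the arrays hold
hypothetical sampled values (e.g. per-CPU irq/softirq fractions and the
cgroup's per-CPU usage over a window), and this is illustration only,
not kernel code:

#include <stdio.h>

#define NR_CPUS 16

int main(void)
{
	/* fraction of each CPU spent in irq/softirq over a sample window */
	double irq_noise[NR_CPUS] = { 0 };	/* hypothetical samples */
	/* fraction of each CPU consumed by the cgroup over the same window */
	double cgroup_util[NR_CPUS] = { 0 };	/* hypothetical samples */
	double impact = 0.0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		impact += irq_noise[cpu] * cgroup_util[cpu];

	printf("estimated per-cgroup IRQ impact: %f\n", impact);
	return 0;
}

The hard part in practice is getting trustworthy per-CPU numbers for
both arrays, which is the reliability problem I mentioned earlier in
the thread.
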
irq.pressure is a metric that operates independently of CPU
utilization. It can be used to monitor whether latency spikes in a
cgroup are caused by high IRQ pressure; that is exactly how we use it
in our production environment.

However, this metric alone doesn't give clear guidance on whether to
add or remove CPUs for a workload. To address this, we have tried to
combine irq.pressure with CPU utilization into a unified metric, but
doing so has proven challenging.
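
For illustration only, a naive combination might look like the sketch
below; the helper name, the weighting, and the sample numbers are
hypothetical assumptions for discussion, not something we actually
deploy:

#include <stdio.h>

/*
 * cpu_util: CPUs' worth of work the cgroup consumed over the window
 *           (e.g. derived from cpu.stat usage_usec deltas).
 * irq_full_avg10: the "full avg10" percentage read from the cgroup's
 *                 irq.pressure file.
 * Both the helper and the weighting are hypothetical, for discussion only.
 */
static double effective_cpu_demand(double cpu_util, double irq_full_avg10)
{
	/* naively treat time stalled in IRQ/softirq as extra CPU demand */
	return cpu_util * (1.0 + irq_full_avg10 / 100.0);
}

int main(void)
{
	/* made-up sample: 3.2 CPUs consumed, 12.5% "full" IRQ stall */
	printf("effective demand: %.2f CPUs\n",
	       effective_cpu_demand(3.2, 12.5));
	return 0;
}

One difficulty with any such formula is that irq.pressure is a
stall-time percentage while CPU utilization is run time, so the two do
not combine into a single well-defined number without additional
assumptions.
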
--
Regards
Yafang