linux-kernel - Re: [RFC PATCH v2 7/7] sched/fair: alternative way of accounting throttle time

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250424022638.GB437160@bytedance>
Date: Thu, 24 Apr 2025 10:26:38 +0800
From: Aaron Lu <ziqianlu@...edance.com>
To: Florian Bezdeka <florian.bezdeka@...mens.com>
Cc: Valentin Schneider <vschneid@...hat.com>,
	Ben Segall <bsegall@...gle.com>,
	K Prateek Nayak <kprateek.nayak@....com>,
	Peter Zijlstra <peterz@...radead.org>,
	Josh Don <joshdon@...gle.com>, Ingo Molnar <mingo@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Xi Wang <xii@...gle.com>, linux-kernel@...r.kernel.org,
	Juri Lelli <juri.lelli@...hat.com>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>,
	Chengming Zhou <chengming.zhou@...ux.dev>,
	Chuyi Zhou <zhouchuyi@...edance.com>,
	Jan Kiszka <jan.kiszka@...mens.com>
Subject: Re: [RFC PATCH v2 7/7] sched/fair: alternative way of accounting
 throttle time

On Wed, Apr 23, 2025 at 02:15:55PM +0200, Florian Bezdeka wrote:
> On Wed, 2025-04-23 at 19:26 +0800, Aaron Lu wrote:
> > On Tue, Apr 22, 2025 at 05:03:19PM +0200, Florian Bezdeka wrote:
> > ... ...
> > 
> > > Right, I should have mentioned that crucial detail. Sorry.
> > > 
> > > I ported your series to 6.14.2 because we did/do not trust anything
> > > newer yet for testing. The problematic workload was not available in
> > > our lab at that time, so we had to be very carefully about deployed
> > > kernel versions.
> > > 
> > > I'm attaching the backported patches now, so you can compare / review
> > > if you like. Spoiler: The only differences are line numbers ;-)
> > 
> > I didn't notice any problem regarding backport after a quick look.
> > 
> > May I know what kind of workload triggered this warning? I haven't been
> > able to trigger it, I'll have to stare harder at the code.
> 
> There are a couple of containers running. Nothing special as far as I
> can tell. Network, IO, at least one container heavily using the epoll
> interface.

Thanks for the info, I'll run with PREEMPT_RT enabled and see if I can
find anything.

> 
> The system is still operating fine though...
>

So that means only the h_nr_throttle accounting is incorrect. The throttle
time accounting will be affected but looks like the functionality is OK.

> Once again: PREEMPT_RT enabled, so maybe handling an IRQ over the
> accounting code could happen? Looking at the warning again it looks
> like unthrottle_cfs_rq() is called from IRQ context. Is that expected?

Yes it is.

The period timer handler will distribute runtime to individual
cfs_rqs of this task_group and those cfs_rqs are per-cpu. The timer
handler did this asynchronously, i.e. it sends IPI to corresponding CPU
to let them deal with unthrottling their cfs_rq by their own, to reduce
the time this timer handler runs. See commit 8ad075c2eb1f("sched: Async
unthrottling for cfs bandwidth").

I think this creates an interesting result in PREEMPT_RT: the CPU that
runs the hrtimer handler unthrottles its cfs_rq in ktimerd context while
all others unthrottle their cfs_rqs in hardirq context. I don't see any
problem with this, it just seems inconsistent.

Thanks,
Aaron