Message-ID: <20250424022638.GB437160@bytedance>
Date: Thu, 24 Apr 2025 10:26:38 +0800
From: Aaron Lu <ziqianlu@...edance.com>
To: Florian Bezdeka <florian.bezdeka@...mens.com>
Cc: Valentin Schneider <vschneid@...hat.com>,
Ben Segall <bsegall@...gle.com>,
K Prateek Nayak <kprateek.nayak@....com>,
Peter Zijlstra <peterz@...radead.org>,
Josh Don <joshdon@...gle.com>, Ingo Molnar <mingo@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Xi Wang <xii@...gle.com>, linux-kernel@...r.kernel.org,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>,
Chengming Zhou <chengming.zhou@...ux.dev>,
Chuyi Zhou <zhouchuyi@...edance.com>,
Jan Kiszka <jan.kiszka@...mens.com>
Subject: Re: [RFC PATCH v2 7/7] sched/fair: alternative way of accounting throttle time

On Wed, Apr 23, 2025 at 02:15:55PM +0200, Florian Bezdeka wrote:
> On Wed, 2025-04-23 at 19:26 +0800, Aaron Lu wrote:
> > On Tue, Apr 22, 2025 at 05:03:19PM +0200, Florian Bezdeka wrote:
> > ... ...
> >
> > > Right, I should have mentioned that crucial detail. Sorry.
> > >
> > > I ported your series to 6.14.2 because we did/do not trust anything
> > > newer yet for testing. The problematic workload was not available in
> > > our lab at that time, so we had to be very careful about deployed
> > > kernel versions.
> > >
> > > I'm attaching the backported patches now, so you can compare / review
> > > if you like. Spoiler: The only differences are line numbers ;-)
> >
> > I didn't notice any problem regarding backport after a quick look.
> >
> > May I know what kind of workload triggered this warning? I haven't been
> > able to trigger it, I'll have to stare harder at the code.
>
> There are a couple of containers running. Nothing special as far as I
> can tell. Network, IO, at least one container heavily using the epoll
> interface.

Thanks for the info. I'll run with PREEMPT_RT enabled and see if I can
find anything.

>
> The system is still operating fine though...
>

So that means only the h_nr_throttle accounting is incorrect. The throttle
time accounting will be affected, but it looks like the functionality
itself is OK.

> Once again: PREEMPT_RT enabled, so maybe handling an IRQ over the
> accounting code could happen? Looking at the warning again it looks
> like unthrottle_cfs_rq() is called from IRQ context. Is that expected?

Yes, it is.

The period timer handler distributes runtime to the individual cfs_rqs
of this task_group, and those cfs_rqs are per-CPU. The timer handler
does this asynchronously, i.e. it sends an IPI to each corresponding CPU
and lets that CPU unthrottle its own cfs_rq, which reduces the time the
timer handler runs. See commit 8ad075c2eb1f ("sched: Async unthrottling
for cfs bandwidth").

I think this has an interesting consequence under PREEMPT_RT: the CPU
that runs the hrtimer handler unthrottles its own cfs_rq in ktimerd
context, while all other CPUs unthrottle their cfs_rqs in hardirq
context. I don't see any problem with this; it just seems inconsistent.
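
To make the flow described above easier to follow, here is a rough
user-space model of it. This is only a sketch and not the kernel
implementation: struct cfs_rq_model, period_timer(), ipi_handler() and
the ctx enum are hypothetical names invented for illustration, and the
"IPI" is simulated by setting a pending flag that the owning CPU
consumes later. It only records which handler performed each unthrottle.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Which handler performed the unthrottle (model only). */
enum ctx { CTX_NONE, CTX_TIMER_HANDLER, CTX_IPI_HANDLER };

/* Hypothetical stand-in for one per-CPU cfs_rq of a task_group. */
struct cfs_rq_model {
	bool throttled;
	bool unthrottle_pending;	/* set by the timer, consumed by the IPI */
	enum ctx unthrottled_in;
};

static struct cfs_rq_model rqs[NR_CPUS];

static void unthrottle(int cpu, enum ctx c)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	rqs[cpu].throttled = false;
	rqs[cpu].unthrottle_pending = false;
	rqs[cpu].unthrottled_in = c;
}

/* Models the IPI handler running on the CPU that owns the cfs_rq. */
static void ipi_handler(int cpu)
{
	if (rqs[cpu].unthrottle_pending)
		unthrottle(cpu, CTX_IPI_HANDLER);
}

/*
 * Models the period timer running on timer_cpu: the local cfs_rq is
 * unthrottled directly, remote ones are only marked pending and an
 * "IPI" is counted. Returns the number of IPIs that would be sent.
 */
static int period_timer(int timer_cpu)
{
	int ipis = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!rqs[cpu].throttled)
			continue;
		if (cpu == timer_cpu) {
			unthrottle(cpu, CTX_TIMER_HANDLER);
		} else {
			rqs[cpu].unthrottle_pending = true;
			ipis++;
		}
	}
	return ipis;
}
```

In this model, running period_timer() on one CPU with all cfs_rqs
throttled leaves the remote ones throttled until ipi_handler() runs on
their CPUs, and the recorded context differs between the local and the
remote case. That difference is the inconsistency mentioned above.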

Thanks,
Aaron