Message-ID: <Y4CHEOEbBqAm70oQ@hirez.programming.kicks-ass.net>
Date: Fri, 25 Nov 2022 10:12:48 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Josh Don <joshdon@...gle.com>
Cc: Chengming Zhou <zhouchengming@...edance.com>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org, Tejun Heo <tj@...nel.org>,
Michal Koutný <mkoutny@...e.com>,
Christian Brauner <brauner@...nel.org>,
Zefan Li <lizefan.x@...edance.com>,
Thomas Gleixner <tglx@...utronix.de>,
Frederic Weisbecker <fweisbec@...il.com>,
anna-maria@...utronix.de
Subject: Re: [PATCH v3] sched: async unthrottling for cfs bandwidth
On Fri, Nov 25, 2022 at 09:59:23AM +0100, Peter Zijlstra wrote:
> On Fri, Nov 25, 2022 at 09:57:09AM +0100, Peter Zijlstra wrote:
> > On Tue, Nov 22, 2022 at 11:35:48AM +0100, Peter Zijlstra wrote:
> > > On Mon, Nov 21, 2022 at 11:37:14AM -0800, Josh Don wrote:
> > > > Yep, this tradeoff feels "best", but there are some edge cases where
> > > > this could potentially disrupt fairness. For example, if we have
> > > > non-trivial W, a lot of cpus to iterate through for dispatching remote
> > > > unthrottle, and quota is small. Doesn't help that the timer is pinned
> > > > so that this will continually hit the same cpu.
> > >
> > > We could -- if we wanted to -- manually rotate the timer around the
> > > relevant CPUs. Doing that sanely would require a bit of hrtimer surgery
> > > though I'm afraid.
> >
> > Here; something like so should enable us to cycle the bandwidth timer.
> > Just need to figure out a way to find another CPU or something.
>
> Some more preparation...
And then I think something like so.. That migrates the timer to the CPU
of the first throttled entry -- possibly not the best heuristic, but it's
the simplest.
NOTE: none of this has seen a compiler up close.
---
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5595,13 +5595,21 @@ static bool distribute_cfs_runtime(struc
  */
 static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun, unsigned long flags)
 {
-	int throttled;
+	struct cfs_rq *first_cfs_rq;
+	int throttled = 0;
+	int cpu;
 
 	/* no need to continue the timer with no bandwidth constraint */
 	if (cfs_b->quota == RUNTIME_INF)
 		goto out_deactivate;
 
-	throttled = !list_empty(&cfs_b->throttled_cfs_rq);
+	first_cfs_rq = list_first_entry_or_null(&cfs_b->throttled_cfs_rq,
+						struct cfs_rq, throttled_list);
+	if (first_cfs_rq) {
+		throttled = 1;
+		cpu = cpu_of(rq_of(first_cfs_rq));
+	}
+
 	cfs_b->nr_periods += overrun;
 
 	/* Refill extra burst quota even if cfs_b->idle */
@@ -5641,7 +5649,7 @@ static int do_sched_cfs_period_timer(str
 	 */
 	cfs_b->idle = 0;
 
-	return HRTIMER_RESTART;
+	return HRTIMER_RESTART_MIGRATE + cpu;
 
 out_deactivate:
 	return HRTIMER_NORESTART;
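
For reference, the return value in the last hunk packs the target CPU on top
of the HRTIMER_RESTART_MIGRATE base introduced by the preparation patches
earlier in the thread (not included here). Below is a minimal, stand-alone C
sketch of just that encode/decode convention; the enum values and helper
names are placeholders for illustration, not the real hrtimer API.

/* Illustrative sketch only: the real HRTIMER_NORESTART/HRTIMER_RESTART
 * values and the HRTIMER_RESTART_MIGRATE base come from the hrtimer
 * preparation patches in this thread; the values below are placeholders. */
#include <stdio.h>

enum {
	SKETCH_NORESTART       = 0,	/* do not rearm the timer */
	SKETCH_RESTART         = 1,	/* rearm on the current CPU */
	SKETCH_RESTART_MIGRATE = 2,	/* rearm; target CPU encoded on top */
};

/* What a timer callback returns to ask for a restart on @cpu. */
static int sketch_restart_migrate(int cpu)
{
	return SKETCH_RESTART_MIGRATE + cpu;
}

/* How the expiry path could decode that return value. */
static void sketch_handle_restart(int ret)
{
	if (ret >= SKETCH_RESTART_MIGRATE)
		printf("rearm timer on CPU %d\n", ret - SKETCH_RESTART_MIGRATE);
	else if (ret == SKETCH_RESTART)
		printf("rearm timer on the current CPU\n");
	else
		printf("timer not rearmed\n");
}

int main(void)
{
	sketch_handle_restart(sketch_restart_migrate(3));	/* CPU 3 */
	sketch_handle_restart(SKETCH_RESTART);
	sketch_handle_restart(SKETCH_NORESTART);
	return 0;
}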