[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YHU/a9HvGLYpOLKZ@hirez.programming.kicks-ass.net>
Date: Tue, 13 Apr 2021 08:51:23 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Valentin Schneider <valentin.schneider@....com>
Cc: tglx@...utronix.de, mingo@...nel.org, bigeasy@...utronix.de,
swood@...hat.com, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
bristot@...hat.com, vincent.donnefort@....com, qais.yousef@....com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs
hotplug-rollback
On Mon, Apr 12, 2021 at 06:22:42PM +0100, Valentin Schneider wrote:
> On 12/04/21 14:03, Peter Zijlstra wrote:
> > On Thu, Mar 11, 2021 at 03:13:04PM +0000, Valentin Schneider wrote:
> >> Peter Zijlstra <peterz@...radead.org> writes:
> >> > @@ -7910,6 +7908,14 @@ int sched_cpu_deactivate(unsigned int cp
> >> > }
> >> > rq_unlock_irqrestore(rq, &rf);
> >> >
> >> > + /*
> >> > + * From this point forward, this CPU will refuse to run any task that
> >> > + * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
> >> > + * push those tasks away until this gets cleared, see
> >> > + * sched_cpu_dying().
> >> > + */
> >> > + balance_push_set(cpu, true);
> >> > +
> >>
> >> AIUI with cpu_dying_mask being flipped before even entering
> >> sched_cpu_deactivate(), we don't need this to be before the
> >> synchronize_rcu() anymore; is there more than that to why you're punting it
> >> back this side of it?
> >
> > I think it does does need to be like this, we need to clearly separate
> > the active=true and balance_push_set(). If we were to somehow observe
> > both balance_push_set() and active==false, we'd be in trouble.
> >
>
> I'm afraid I don't follow; we're replacing a read of rq->balance_push with
> cpu_dying(), and those are still written on the same side of the
> synchronize_rcu(). What am I missing?
Yeah, I'm not sure anymnore either; I tried to work out why I'd done
that but upon closer examination everything fell flat.
Let me try again today :-)
> Oooh, I can't read, only the boot CPU gets its callback uninstalled in
> sched_init()! So secondaries keep push_callback installed up until
> sched_cpu_activate(), but as you said it's not effective unless a rollback
> happens.
>
> Now, doesn't that mean we should *not* uninstall the callback in
> sched_cpu_dying()? AFAIK it's possible for the initial secondary CPU
> boot to go fine, but the next offline+online cycle fails while going up -
> that would need to rollback with push_callback installed.
Quite; I removed that shortly after sending this; when I tried to write
a comment and found it.
Powered by blists - more mailing lists