Message-ID: <20210602152629.GF31179@willie-the-truck>
Date: Wed, 2 Jun 2021 16:26:29 +0100
From: Will Deacon <will@...nel.org>
To: Valentin Schneider <valentin.schneider@....com>
Cc: linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
Morten Rasmussen <morten.rasmussen@....com>,
Qais Yousef <qais.yousef@....com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Quentin Perret <qperret@...gle.com>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
kernel-team@...roid.com
Subject: Re: [PATCH 2/2] sched: Plug race between SCA, hotplug and
migration_cpu_stop()
On Tue, Jun 01, 2021 at 05:59:56PM +0100, Valentin Schneider wrote:
> On 26/05/21 21:57, Valentin Schneider wrote:
> > + dest_cpu = arg->dest_cpu;
> > + if (task_on_rq_queued(p)) {
> > + /*
> > + * A hotplug operation could have happened between
> > + * set_cpus_allowed_ptr() and here, making dest_cpu no
> > + * longer allowed.
> > + */
> > + if (!is_cpu_allowed(p, dest_cpu))
> > + dest_cpu = select_fallback_rq(cpu_of(rq), p);
> > + /*
> > +	 * dest_cpu can be a victim of hotplug between is_cpu_allowed()
> > + * and here. However, per the synchronize_rcu() in
> > + * sched_cpu_deactivate(), it can't have gone lower than
> > + * CPUHP_AP_ACTIVE, so it's safe to punt it over and let
> > + * balance_push() route it elsewhere.
> > + */
> > + update_rq_clock(rq);
> > + rq = move_queued_task(rq, &rf, p, dest_cpu);
>
> So, while digesting this I started having doubts about pcpu kthreads,
> since they're allowed on online (not merely active) CPUs. The bogus
> scenario here would be picking a !active && online CPU and seeing it
> go !online before the move_queued_task().
>
> Now, to transition from online -> !online, we have to go through
> take_cpu_down(), which is issued via a stop_machine() call. This means
> the transition can't happen until all online CPUs are running the
> stopper task and have reached MULTI_STOP_RUN.
>
> Since migration_cpu_stop() is itself a stopper callback, it should thus
> be "atomic" vs takedown_cpu(), meaning the above should be fine.
I'd be more inclined to agree with your reasoning if migration_cpu_stop()
couldn't itself call stop_one_cpu_nowait() to queue more work for the
stopper thread. What guarantees that takedown_cpu() can't queue its stopper
work in the middle of that?
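Concretely, this is the tail of migration_cpu_stop() I have in mind
(kernel/sched/core.c, abridged from the current code; the surrounding
checks are elided): when the task has already moved on, the callback
chases it by re-queueing itself on the task's new CPU:

    	/* Task moved before we caught it: chase it to its new CPU. */
    	task_rq_unlock(rq, p, &rf);
    	stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
    			    &pending->arg, &pending->stop_work);
    	return 0;

Between that task_rq_unlock() and the re-queued work running on the
remote CPU, the stopper threads are back to processing their queues as
normal, so I don't see what stops takedown_cpu()'s multi_cpu_stop()
work from slotting in first.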
Will