Message-ID: <20210526160317.GB19691@willie-the-truck>
Date: Wed, 26 May 2021 17:03:18 +0100
From: Will Deacon <will@...nel.org>
To: Valentin Schneider <valentin.schneider@....com>
Cc: linux-arm-kernel@...ts.infradead.org, linux-arch@...r.kernel.org,
linux-kernel@...r.kernel.org,
Catalin Marinas <catalin.marinas@....com>,
Marc Zyngier <maz@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Morten Rasmussen <morten.rasmussen@....com>,
Qais Yousef <qais.yousef@....com>,
Suren Baghdasaryan <surenb@...gle.com>,
Quentin Perret <qperret@...gle.com>, Tejun Heo <tj@...nel.org>,
Johannes Weiner <hannes@...xchg.org>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
kernel-team@...roid.com
Subject: Re: [PATCH v7 01/22] sched: Favour predetermined active CPU as
migration destination
On Wed, May 26, 2021 at 12:14:20PM +0100, Valentin Schneider wrote:
> On 25/05/21 16:14, Will Deacon wrote:
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 5226cc26a095..1702a60d178d 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1869,6 +1869,7 @@ static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
> > struct migration_arg {
> > struct task_struct *task;
> > int dest_cpu;
> > + const struct cpumask *dest_mask;
> > struct set_affinity_pending *pending;
> > };
> >
> > @@ -1917,6 +1918,7 @@ static int migration_cpu_stop(void *data)
> > struct set_affinity_pending *pending = arg->pending;
> > struct task_struct *p = arg->task;
> > int dest_cpu = arg->dest_cpu;
> > + const struct cpumask *dest_mask = arg->dest_mask;
> > struct rq *rq = this_rq();
> > bool complete = false;
> > struct rq_flags rf;
> > @@ -1956,12 +1958,8 @@ static int migration_cpu_stop(void *data)
> > complete = true;
> > }
> >
> > - if (dest_cpu < 0) {
> > - if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask))
> > - goto out;
> > -
> > - dest_cpu = cpumask_any_distribute(&p->cpus_mask);
> > - }
> > + if (dest_mask && (cpumask_test_cpu(task_cpu(p), dest_mask)))
> > + goto out;
> >
>
> IIRC the reason we deferred the pick to migration_cpu_stop() was those
> insane races involving multiple SCA calls, like the following:
>
> p->cpus_mask = [0, 1]; p on CPU0
>
>     CPUx                  CPUy                  CPU0
>
>     SCA(p, [2])
>       __do_set_cpus_allowed();
>       queue migration_cpu_stop()
>                           SCA(p, [3])
>                             __do_set_cpus_allowed();
>                                                 migration_cpu_stop()
>
> The stopper needs to use the latest cpumask set by the second SCA despite
> having an arg->pending set up by the first SCA. Doesn't this break here?
Yes, well spotted. I was so caught up with the hotplug race that I didn't
even consider a straightforward SCA race. Hurumph.
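
For reference, the deferred pick that this hunk removes existed to handle
exactly that case: re-evaluating against the live p->cpus_mask under the rq
lock instead of trusting the dest_mask snapshot taken when the stopper was
queued. A rough, untested sketch of that path, reconstructed from the
removed lines in the hunk above:

	/*
	 * A later SCA may have installed a new p->cpus_mask since this
	 * stopper was queued, so don't trust the arg->dest_mask snapshot;
	 * re-check the task's placement against the current mask instead.
	 */
	if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask))
		goto out;	/* current CPU is still allowed */

	dest_cpu = cpumask_any_distribute(&p->cpus_mask);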
> I'm not sure I've paged back in all of the subtleties lying in ambush
> here, but what about the below?
I can't break it, but I'm also not very familiar with this code. Please can
you post it as a proper patch so that I can drop this from my series?
Thanks,
Will