[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171120163040.GA25993@zipoli.concurrent-rt.com>
Date: Mon, 20 Nov 2017 11:30:40 -0500
From: joe.korty@...current-rt.com
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] 4.4.86-rt99: fix sync breakage between nr_cpus_allowed
and cpus_allowed
Hi Steve,
A quick perusal of 4.11.12-rt16 shows that it has an
entirely new version of migrate_disable which to me appears
correct.
In that new implementation, migrate_enable() recalculates
p->nr_cpus_allowed when it switches the task back to
using p->cpus_mask. This brings the two back into sync
if anything had happened to get them out of sync while
migration was disabled (as would happen on an affinity
change during that disable period).
4.9.47-rt37 has the old implementation and it appears to
have same bug as 4.4-rt though I have yet to test 4.9-rt.
The fix in these older versions could take one of two
forms: either we recalculate p->nr_cpus_allowed when
migrate_enable goes back to using p->cpus_allowed,
as the 4.11-rt version does, or the one place where we
allow p->nr_cpus_allowed to diverge from p->cpus_allowed
be fixed. The patch I submitted earlier takes this second
approach.
Regards,
Joe
On Fri, Nov 17, 2017 at 05:48:51PM -0500, Steven Rostedt wrote:
> On Wed, 15 Nov 2017 14:25:29 -0500
> joe.korty@...current-rt.com wrote:
>
> > 4.4.86-rt99's patch
> >
> > 0037-Intrduce-migrate_disable-cpu_light.patch
> >
> > introduces a place where a task's cpus_allowed mask is
> > updated without a corresponding update to nr_cpus_allowed.
> >
> > This path is executed when task affinity is changed while
> > migrate_disabled() is true. As there is no code present
> > to set nr_cpus_allowed when the migrate_disable state is
> > dropped, the scheduler at that point on may make incorrect
> > scheduling decisions for this task.
> >
> > My testing consists of temporarily adding a
> >
> > if (tsk_nr_cpus_allowed(p) == cpumask_weight(tsk_cpus_allowed(p))
> > printk_ratelimited(...)
>
> Have you tested v4.9-rt or 4.13-rt if it has the same bug? If it is a
> bug in 4.13-rt then it needs to go there first, and then backported to
> the stable releases (which I'm actually working on now).
>
> -- Steve
>
> >
> > stmt to schedule() and running a simple affinity rotation
> > program I wrote, one that rotates the threads of stress(1).
> > While rotating, I got the expected kernel error messages.
> > With this patch applied the messages disappeared.
> >
> > Signed-off-by: Joe Korty <joe.korty@...current-rt.com>
> >
> > Index: b/kernel/sched/core.c
> > ===================================================================
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1220,6 +1220,7 @@ void do_set_cpus_allowed(struct task_str
> > lockdep_assert_held(&p->pi_lock);
> >
> > if (__migrate_disabled(p)) {
> > + p->nr_cpus_allowed = cpumask_weight(new_mask);
> > cpumask_copy(&p->cpus_allowed, new_mask);
> > return;
> > }
Powered by blists - more mailing lists