lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171120163040.GA25993@zipoli.concurrent-rt.com>
Date:   Mon, 20 Nov 2017 11:30:40 -0500
From:   joe.korty@...current-rt.com
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] 4.4.86-rt99: fix sync breakage between nr_cpus_allowed
 and cpus_allowed

Hi Steve,
A quick perusal of 4.11.12-rt16 shows that it has an
entirely new version of migrate_disable which to me appears
correct.

In that new implementation, migrate_enable() recalculates
p->nr_cpus_allowed when it switches the task back to
using p->cpus_mask.  This brings the two back into sync
if anything had happened to get them out of sync while
migration was disabled (as would happen on an affinity
change during that disable period).

4.9.47-rt37 has the old implementation and it appears to
have same bug as 4.4-rt though I have yet to test 4.9-rt.

The fix in  these older versions could take one of two
forms: either we recalculate p->nr_cpus_allowed when
migrate_enable goes back to using p->cpus_allowed,
as the 4.11-rt version does, or the one place where we
allow p->nr_cpus_allowed to diverge from p->cpus_allowed
be fixed.  The patch I submitted earlier takes this second
approach.

Regards,
Joe



On Fri, Nov 17, 2017 at 05:48:51PM -0500, Steven Rostedt wrote:
> On Wed, 15 Nov 2017 14:25:29 -0500
> joe.korty@...current-rt.com wrote:
> 
> > 4.4.86-rt99's patch
> > 
> >   0037-Intrduce-migrate_disable-cpu_light.patch
> > 
> > introduces a place where a task's cpus_allowed mask is
> > updated without a corresponding update to nr_cpus_allowed.
> > 
> > This path is executed when task affinity is changed while
> > migrate_disabled() is true.  As there is no code present
> > to set nr_cpus_allowed when the migrate_disable state is
> > dropped, the scheduler at that point on may make incorrect
> > scheduling decisions for this task.
> > 
> > My testing consists of temporarily adding a
> > 
> >  if (tsk_nr_cpus_allowed(p) == cpumask_weight(tsk_cpus_allowed(p))
> >  	printk_ratelimited(...)
> 
> Have you tested v4.9-rt or 4.13-rt if it has the same bug? If it is a
> bug in 4.13-rt then it needs to go there first, and then backported to
> the stable releases (which I'm actually working on now).
> 
> -- Steve
> 
> > 
> > stmt to schedule() and running a simple affinity rotation
> > program I wrote, one that rotates the threads of stress(1).
> > While rotating, I got the expected kernel error messages.
> > With this patch applied the messages disappeared.
> > 
> > Signed-off-by: Joe Korty <joe.korty@...current-rt.com>
> > 
> > Index: b/kernel/sched/core.c
> > ===================================================================
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1220,6 +1220,7 @@ void do_set_cpus_allowed(struct task_str
> >  	lockdep_assert_held(&p->pi_lock);
> >  
> >  	if (__migrate_disabled(p)) {
> > +		p->nr_cpus_allowed = cpumask_weight(new_mask);
> >  		cpumask_copy(&p->cpus_allowed, new_mask);
> >  		return;
> >  	}

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ