Message-ID: <20140516101505.GO13658@twins.programming.kicks-ass.net>
Date:	Fri, 16 May 2014 12:15:05 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Lai Jiangshan <laijs@...fujitsu.com>
Cc:	jjherne@...ux.vnet.ibm.com, Sasha Levin <sasha.levin@...cle.com>,
	Tejun Heo <tj@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
	Dave Jones <davej@...hat.com>, Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: workqueue: WARN at at kernel/workqueue.c:2176

On Fri, May 16, 2014 at 11:35:30AM +0200, Peter Zijlstra wrote:
> On Fri, May 16, 2014 at 11:50:42AM +0800, Lai Jiangshan wrote:
> > After debugging, I found the hotplug-in cpu is active but !online in this case.
> > the problem was introduced by 5fbd036b.
> > Some code assumes that any cpu in cpu_active_mask is also online, but 5fbd036b breaks
> > this assumption, so the corresponding code with this assumption should be changed too.
> 
> Good find, and yes it does that.
> 
> > The following patch is just a workaround. After it is applied, the above WARNING
> > is gone, but I can't hit the wq problem that you found.
> 
> Seeing how the entirety of hotplug is basically duct tape and twigs, the
> below isn't that bad.


I turned that into the patch below; are you okay with it?

---
Subject: sched: Fix hotplug vs set_cpus_allowed_ptr()
From: Lai Jiangshan <laijs@...fujitsu.com>
Date: Fri, 16 May 2014 11:50:42 +0800

Lai found that:

  WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
  ...
  migration_cpu_stop+0x1d/0x22

was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
always a sub-set of cpu_online_mask.

This isn't true since 5fbd036b552f ("sched: Cleanup cpu_active
madness").

So set active and online at the same time to avoid this particular
problem.

Fixes: 5fbd036b552f ("sched: Cleanup cpu_active madness")
Signed-off-by: Lai Jiangshan <laijs@...fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@...radead.org>
Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
---
 kernel/cpu.c        |    6 ++++--
 kernel/sched/core.c |    1 -
 2 files changed, 4 insertions(+), 3 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -726,10 +726,12 @@ void set_cpu_present(unsigned int cpu, b
 
 void set_cpu_online(unsigned int cpu, bool online)
 {
-	if (online)
+	if (online) {
 		cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
-	else
+		cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
+	} else {
 		cpumask_clear_cpu(cpu, to_cpumask(cpu_online_bits));
+	}
 }
 
 void set_cpu_active(unsigned int cpu, bool active)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5126,7 +5126,6 @@ static int sched_cpu_active(struct notif
 				      unsigned long action, void *hcpu)
 {
 	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_STARTING:
 	case CPU_DOWN_FAILED:
 		set_cpu_active((long)hcpu, true);
 		return NOTIFY_OK;
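
For readers who want to see the invariant in isolation, here is a minimal
userspace sketch, not kernel code. It assumes a toy model in which each mask is
a single 64-bit word (so cpu must be below 64), and every name in it
(struct cpu_masks, active_subset_of_online(), bring_up_split(),
bring_up_together()) is hypothetical and exists only for illustration. It
models the "active but !online" window described above, and shows that setting
both bits in one step leaves no such window in this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for cpu_online_mask and cpu_active_mask: one bit per CPU. */
struct cpu_masks {
	uint64_t online;
	uint64_t active;
};

/* The assumption under discussion: every active CPU is also online. */
static bool active_subset_of_online(const struct cpu_masks *m)
{
	return (m->active & ~m->online) == 0;
}

/*
 * Models the split ordering: the CPU is marked active first and online
 * only later. Returns whether the invariant held in between.
 */
static bool bring_up_split(struct cpu_masks *m, unsigned int cpu)
{
	uint64_t bit = UINT64_C(1) << cpu;
	bool held_in_window;

	m->active |= bit;
	held_in_window = active_subset_of_online(m);
	m->online |= bit;
	return held_in_window;
}

/* Models the patched ordering: online and active are set together. */
static bool bring_up_together(struct cpu_masks *m, unsigned int cpu)
{
	uint64_t bit = UINT64_C(1) << cpu;

	m->online |= bit;
	m->active |= bit;
	return active_subset_of_online(m);
}

/* Small self-check that exercises both orderings for CPU 1. */
static void model_self_check(void)
{
	struct cpu_masks a = { .online = 1, .active = 1 };
	struct cpu_masks b = { .online = 1, .active = 1 };

	assert(!bring_up_split(&a, 1));    /* invariant broken in the window */
	assert(bring_up_together(&b, 1));  /* invariant holds throughout */
}
```

Calling model_self_check() from a test harness exercises both paths. The sketch
says nothing about locking, notifier ordering, or the CPU-down path, which the
real code has to deal with.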
