Date:	Thu, 29 May 2014 12:23:37 -0400
From:	"Jason J. Herne" <jjherne@...ux.vnet.ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
CC:	Lai Jiangshan <laijs@...fujitsu.com>,
	Sasha Levin <sasha.levin@...cle.com>,
	Tejun Heo <tj@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
	Dave Jones <davej@...hat.com>, Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: workqueue: WARN at at kernel/workqueue.c:2176

On 05/27/2014 10:26 AM, Peter Zijlstra wrote:
> On Tue, May 27, 2014 at 10:18:31AM -0400, Jason J. Herne wrote:
>> On 05/16/2014 12:29 PM, Peter Zijlstra wrote:
>>> On Sat, May 17, 2014 at 12:18:06AM +0800, Lai Jiangshan wrote:
>>>> so the scheduler/set_cpus_allowed_ptr()/cpu_active_mask should be the first
>>>> place to fix.
>>>
>>> I'm not arguing about that, not to mention that this is userspace
>>> exposed and nobody protects that.
>>>
>>> But I was expecting kernel stuff that calls it on hotplug to be
>>> serialized thusly, but apparently not so.
>>>
>>
>> Was a final patch posted for this issue? The discussion made it sound like
>> there were still a few things to figure out before we could resolve this
>> bug. I can recreate this as needed and I'm happy to test any patches.
>
>
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=sched/urgent&id=6acbfb96976fc3350e30d964acb1dbbdf876d55e
>
> which should make its way to Linus soonish I suppose.
>

I applied the patch on top of c7208164e66f63e3ec1759b98087849286410741
and I am still hitting the problem.
Should I have applied it to a different branch or commit to pick up any
other needed changes?

Patch applied:

diff --git a/kernel/cpu.c b/kernel/cpu.c
index a9e710e..247979a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -726,10 +726,12 @@ void set_cpu_present(unsigned int cpu, bool present)
 void set_cpu_online(unsigned int cpu, bool online)
 {
-	if (online)
+	if (online) {
 		cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
-	else
+		cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
+	} else {
 		cpumask_clear_cpu(cpu, to_cpumask(cpu_online_bits));
+	}
 }
 void set_cpu_active(unsigned int cpu, bool active)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 44e00ab..86f3890 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5076,7 +5076,6 @@ static int sched_cpu_active(struct notifier_block *nfb,
 				      unsigned long action, void *hcpu)
 {
 	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_STARTING:
 	case CPU_DOWN_FAILED:
 		set_cpu_active((long)hcpu, true);
 		return NOTIFY_OK;
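
To spell out what the kernel/cpu.c hunk changes, here is a minimal
user-space sketch. It is only a model under stated assumptions: the
model_* helpers and the plain unsigned long bitmaps are hypothetical
stand-ins for cpu_online_bits and cpu_active_bits, and none of this is
kernel code. With the patch, marking a CPU online also marks it active in
the same call; marking it offline still leaves the active bit alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's cpu_online_bits/cpu_active_bits. */
static unsigned long online_bits;
static unsigned long active_bits;

static void model_reset(void)
{
	online_bits = 0;
	active_bits = 0;
}

static bool model_cpu_online(unsigned int cpu)
{
	return (online_bits >> cpu) & 1UL;
}

static bool model_cpu_active(unsigned int cpu)
{
	return (active_bits >> cpu) & 1UL;
}

/* Behaviour before the patch: only the online bit is touched. */
static void set_cpu_online_old(unsigned int cpu, bool online)
{
	if (online)
		online_bits |= 1UL << cpu;
	else
		online_bits &= ~(1UL << cpu);
}

/* Behaviour with the patch: going online also sets the active bit. */
static void set_cpu_online_new(unsigned int cpu, bool online)
{
	if (online) {
		online_bits |= 1UL << cpu;
		active_bits |= 1UL << cpu;
	} else {
		online_bits &= ~(1UL << cpu);
	}
}

/* Walks through both variants for one CPU and checks the resulting bits. */
static void model_demo(unsigned int cpu)
{
	model_reset();
	set_cpu_online_old(cpu, true);
	assert(model_cpu_online(cpu) && !model_cpu_active(cpu));

	model_reset();
	set_cpu_online_new(cpu, true);
	assert(model_cpu_online(cpu) && model_cpu_active(cpu));
}
```

In this model, the old variant leaves a window in which a CPU is online
but not yet active, and the new variant closes that window. The
kernel/sched/core.c hunk then drops the CPU_STARTING case, since the
active bit no longer needs to be set from that notifier.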

Here is the output from the recreation using this patch:

[ 3634.146233] ------------[ cut here ]------------
[ 3634.146238] WARNING: at kernel/workqueue.c:2176
[ 3634.146239] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi qeth_l2 tape_3590 tape tape_class vhost_net tun vhost macvtap macvlan eadm_sch qeth ccwgroup zfcp scsi_transport_fc scsi_tgt qdio dasd_eckd_mod dasd_mod dm_multipath [last unloaded: kvm]
[ 3634.146260] CPU: 6 PID: 28009 Comm: kworker/7:0 Not tainted 3.15.0-rc7 #1
[ 3634.146263] Workqueue: \xffffff80           (null)
[ 3634.146264] task: 000000025def32e0 ti: 000000026dca0000 task.ti: 000000026dca0000
[ 3634.146266] Krnl PSW : 0404c00180000000 000000000015ad1a (process_one_work+0x2e6/0x4c0)
[ 3634.146272]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 0000000000000000 0000000000bc649a 00000002764b0980 0000000000b94f40
[ 3634.146275]            0000000000b94f40 0000000000000000 0000000000000000 0000000000bc6496
[ 3634.146277]            0000000000000000 000000008b65b600 000000008b657000 000000008b657018
[ 3634.146278]            00000002764b0980 0000000000b94f40 000000026dca3dd0 000000026dca3d70
[ 3634.146287] Krnl Code: 000000000015ad0e: 95001000		cli	0(%r1),0
            000000000015ad12: a774fece		brc	7,15aaae
           #000000000015ad16: a7f40001		brc	15,15ad18
           >000000000015ad1a: 92011000		mvi	0(%r1),1
            000000000015ad1e: a7f4fec8		brc	15,15aaae
            000000000015ad22: e31003180004	lg	%r1,792
            000000000015ad28: 58301024		l	%r3,36(%r1)
            000000000015ad2c: a73a0001		ahi	%r3,1
[ 3634.146299] Call Trace:
[ 3634.146301] ([<000000000015ace8>] process_one_work+0x2b4/0x4c0)
[ 3634.146303]  [<000000000015c100>] worker_thread+0x178/0x39c
[ 3634.146305]  [<0000000000164ba6>] kthread+0x10e/0x128
[ 3634.146310]  [<000000000072d026>] kernel_thread_starter+0x6/0xc
[ 3634.146312]  [<000000000072d020>] kernel_thread_starter+0x0/0xc
[ 3634.146313] Last Breaking-Event-Address:
[ 3634.146315]  [<000000000015ad16>] process_one_work+0x2e2/0x4c0
[ 3634.146316] ---[ end trace 03f51c9126c24171 ]---

I don't think this output provides anything new. Please let me know if I 
can gather any more data.

-- 
-- Jason J. Herne (jjherne@...ux.vnet.ibm.com)
