[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <53875F09.3090607@linux.vnet.ibm.com>
Date: Thu, 29 May 2014 12:23:37 -0400
From: "Jason J. Herne" <jjherne@...ux.vnet.ibm.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Lai Jiangshan <laijs@...fujitsu.com>,
Sasha Levin <sasha.levin@...cle.com>,
Tejun Heo <tj@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
Dave Jones <davej@...hat.com>, Ingo Molnar <mingo@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Steven Rostedt <rostedt@...dmis.org>
Subject: Re: workqueue: WARN at at kernel/workqueue.c:2176
On 05/27/2014 10:26 AM, Peter Zijlstra wrote:
> On Tue, May 27, 2014 at 10:18:31AM -0400, Jason J. Herne wrote:
>> On 05/16/2014 12:29 PM, Peter Zijlstra wrote:
>>> On Sat, May 17, 2014 at 12:18:06AM +0800, Lai Jiangshan wrote:
>>>> so the scheduler/set_cpus_allowed_ptr()/cpu_active_mask should be the first
>>>> place to fix.
>>>
>>> I'm not arguing about that, not to mention that this is userspace
>>> exposed and nobody protects that.
>>>
>>> But I was expecting kernel stuff that calls it on hotplug to be
>>> serialized thusly, but apparently not so.
>>>
>>
>> Was a final patch posted for this issue? The discussion made it sound like
>> there were still a few things to figure out before we could resolve this
>> bug. I can recreate this as needed and I'm happy to test any patches.
>
>
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=sched/urgent&id=6acbfb96976fc3350e30d964acb1dbbdf876d55e
>
> which should make its way to Linus soonish I suppose.
>
I applied the patch on top of c7208164e66f63e3ec1759b98087849286410741
and I am still hitting the problem.
Should I have applied to a different branch/commit to pick up any other
needed changes?
Patch applied:
diff --git a/kernel/cpu.c b/kernel/cpu.c
index a9e710e..247979a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -726,10 +726,12 @@ void set_cpu_present(unsigned int cpu, bool present)
void set_cpu_online(unsigned int cpu, bool online)
{
- if (online)
+ if (online) {
cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
- else
+ cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
+ } else {
cpumask_clear_cpu(cpu, to_cpumask(cpu_online_bits));
+ }
}
void set_cpu_active(unsigned int cpu, bool active)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 44e00ab..86f3890 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5076,7 +5076,6 @@ static int sched_cpu_active(struct notifier_block
*nfb,
unsigned long action, void *hcpu)
{
switch (action & ~CPU_TASKS_FROZEN) {
- case CPU_STARTING:
case CPU_DOWN_FAILED:
set_cpu_active((long)hcpu, true);
return NOTIFY_OK;
Here is the output from the recreation using this patch:
[ 3634.146233] ------------[ cut here ]------------
[ 3634.146238] WARNING: at kernel/workqueue.c:2176
[ 3634.146239] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables
ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi qeth_l2 tape_3590 tape tape_class vhost_net tun
vhost macvtap macvlan eadm_sch qeth ccwgroup zfcp scsi_transport_fc
scsi_tgt qdio dasd_eckd_mod dasd_mod dm_multipath [last unloaded: kvm]
[ 3634.146260] CPU: 6 PID: 28009 Comm: kworker/7:0 Not tainted 3.15.0-rc7 #1
[ 3634.146263] Workqueue: \xffffff80 (null)
[ 3634.146264] task: 000000025def32e0 ti: 000000026dca0000 task.ti:
000000026dca0000
[ 3634.146266] Krnl PSW : 0404c00180000000 000000000015ad1a
(process_one_work+0x2e6/0x4c0)
[ 3634.146272] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0
PM:0 EA:3
Krnl GPRS: 0000000000000000 0000000000bc649a 00000002764b0980
0000000000b94f40
[ 3634.146275] 0000000000b94f40 0000000000000000
0000000000000000 0000000000bc6496
[ 3634.146277] 0000000000000000 000000008b65b600
000000008b657000 000000008b657018
[ 3634.146278] 00000002764b0980 0000000000b94f40
000000026dca3dd0 000000026dca3d70
[ 3634.146287] Krnl Code: 000000000015ad0e: 95001000 cli 0(%r1),0
000000000015ad12: a774fece brc 7,15aaae
#000000000015ad16: a7f40001 brc 15,15ad18
>000000000015ad1a: 92011000 mvi 0(%r1),1
000000000015ad1e: a7f4fec8 brc 15,15aaae
000000000015ad22: e31003180004 lg %r1,792
000000000015ad28: 58301024 l %r3,36(%r1)
000000000015ad2c: a73a0001 ahi %r3,1
[ 3634.146299] Call Trace:
[ 3634.146301] ([<000000000015ace8>] process_one_work+0x2b4/0x4c0)
[ 3634.146303] [<000000000015c100>] worker_thread+0x178/0x39c
[ 3634.146305] [<0000000000164ba6>] kthread+0x10e/0x128
[ 3634.146310] [<000000000072d026>] kernel_thread_starter+0x6/0xc
[ 3634.146312] [<000000000072d020>] kernel_thread_starter+0x0/0xc
[ 3634.146313] Last Breaking-Event-Address:
[ 3634.146315] [<000000000015ad16>] process_one_work+0x2e2/0x4c0
[ 3634.146316] ---[ end trace 03f51c9126c24171 ]---
I don't think this output provides anything new. Please let me know if I
can gather any more data.
--
-- Jason J. Herne (jjherne@...ux.vnet.ibm.com)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists