Message-Id: <1431083800-23009-1-git-send-email-songxiumiao@inspur.com>
Date:	Fri,  8 May 2015 19:16:40 +0800
From:	Song Xiumiao <songxiumiao@...pur.com>
To:	tj@...nel.org, linux-kernel@...r.kernel.org
Cc:	yanxiaofeng@...pur.com, fandd@...pur.com, liuchangsheng@...pur.com,
	songxiumiao <songxiumiao@...pur.com>,
	Gong Zhaogang <gongzhaogang@...pur.com>
Subject: [PATCH] Hotplug: fix system crash when memory is not in node0 and a CPU is logically hot-added

From: songxiumiao <songxiumiao@...pur.com>

By analysing the call trace of the bug, we found that create_worker()
tries to allocate memory from node0. Because node0 is offline, the
allocation fails and the kernel oopses in __alloc_pages_nodemask().
Add a node_online() check so that a pool is only bound to an online
node and its memory is allocated from a node that is online.

The bug information follows:
[root@...alhost ~]# echo 1 > /sys/devices/system/cpu/cpu90/online
[  225.611209] smpboot: Booting Node 2 Processor 90 APIC 0x40
[18446744029.482996] kvm: enabling virtualization on CPU90
[  225.725503] TSC synchronization [CPU#43 -> CPU#90]:
[  225.730952] Measured 672516581900 cycles TSC warp between CPUs, turning off TSC clock.
[  225.739800] tsc: Marking TSC unstable due to check_tsc_sync_source failed
[  225.755126] BUG: unable to handle kernel paging request at 0000000000001b08
[  225.762931] IP: [<ffffffff81182597>] __alloc_pages_nodemask+0xb7/0x940
[  225.770247] PGD 449bb0067 PUD 46110e067 PMD 0
[  225.775248] Oops: 0000 [#1] SMP
[  225.778875] Modules linked in: xt_CHECKSUM ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntracd
[  225.868198] CPU: 43 PID: 5400 Comm: bash Not tainted 4.0.0-rc4-bug-fixed-remove #16
[  225.876754] Hardware name: Insyde Brickland/Type2 - Board Product Name1, BIOS Brickland.05.04.15.0024 02/28/2015
[  225.888122] task: ffff88045a3d8da0 ti: ffff880446120000 task.ti: ffff880446120000
[  225.896484] RIP: 0010:[<ffffffff81182597>]  [<ffffffff81182597>] __alloc_pages_nodemask+0xb7/0x940
[  225.906509] RSP: 0018:ffff880446123918  EFLAGS: 00010246
[  225.912443] RAX: 0000000000001b00 RBX: 0000000000000010 RCX: 0000000000000000
[  225.920416] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000002052d0
[  225.928388] RBP: ffff880446123a08 R08: ffff880460eca0c0 R09: 0000000060eca101
[  225.936361] R10: ffff88046d007300 R11: ffffffff8108dd31 R12: 000000000001002a
[  225.944334] R13: 00000000002052d0 R14: 0000000000000001 R15: 00000000000040d0
[  225.952306] FS:  00007f9386450740(0000) GS:ffff88046db60000(0000) knlGS:0000000000000000
[  225.961346] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  225.967765] CR2: 0000000000001b08 CR3: 00000004612a3000 CR4: 00000000001407e0
[  225.975735] Stack:
[  225.977981]  00000000002052d0 0000000000000000 0000000000000003 ffff88045a3d8da0
[  225.986291]  ffff880446123988 ffffffff811c7f81 ffff88045a3d8da0 0000000000000000
[  225.994597]  000080d000000002 ffff88046d005500 000000000003000f 002052d0002052d0
[  226.002904] Call Trace:
[  226.005645]  [<ffffffff811c7f81>] ? alloc_pages_current+0x91/0x100
[  226.012557]  [<ffffffff811d27c3>] ? deactivate_slab+0x383/0x400
[  226.019173]  [<ffffffff811d3957>] new_slab+0xa7/0x460
[  226.024826]  [<ffffffff81678c75>] __slab_alloc+0x310/0x470
[  226.030960]  [<ffffffff8130caf6>] ? get_from_free_list+0x46/0x60
[  226.037679]  [<ffffffff8108dd31>] ? alloc_worker+0x21/0x50
[  226.043812]  [<ffffffff811d46c1>] kmem_cache_alloc_node_trace+0x91/0x250
[  226.051299]  [<ffffffff8108dd31>] alloc_worker+0x21/0x50
[  226.057236]  [<ffffffff8108ff23>] create_worker+0x53/0x1e0
[  226.063357]  [<ffffffff81092092>] alloc_unbound_pwq+0x2a2/0x510
[  226.069974]  [<ffffffff810924b4>] wq_update_unbound_numa+0x1b4/0x220
[  226.077076]  [<ffffffff81092828>] workqueue_cpu_up_callback+0x308/0x3d0
[  226.084468]  [<ffffffff8109784e>] notifier_call_chain+0x4e/0x80
[  226.091084]  [<ffffffff8109796e>] __raw_notifier_call_chain+0xe/0x10
[  226.098189]  [<ffffffff810774f3>] cpu_notify+0x23/0x50
[  226.103929]  [<ffffffff81077878>] _cpu_up+0x188/0x1a0
[  226.109574]  [<ffffffff81077919>] cpu_up+0x89/0xb0
[  226.114923]  [<ffffffff8166fba0>] cpu_subsys_online+0x40/0x90
[  226.121350]  [<ffffffff814386dd>] device_online+0x6d/0xa0
[  226.127382]  [<ffffffff814387a5>] online_store+0x95/0xa0
[  226.133322]  [<ffffffff814358a8>] dev_attr_store+0x18/0x30
[  226.139457]  [<ffffffff8126d76d>] sysfs_kf_write+0x3d/0x50
[  226.145586]  [<ffffffff8126cc1a>] kernfs_fop_write+0x12a/0x180
[  226.152109]  [<ffffffff811f1bb7>] vfs_write+0xb7/0x1f0
[  226.157853]  [<ffffffff810232bc>] ? do_audit_syscall_entry+0x6c/0x70
[  226.164954]  [<ffffffff811f2835>] SyS_write+0x55/0xd0
[  226.170595]  [<ffffffff81681f09>] system_call_fastpath+0x12/0x17
[  226.177306] Code: 30 97 00 89 45 bc 83 e1 0f b8 22 01 32 01 01 c9 d3 f8 83 e0 03 89 9d 6c ff ff ff 83 e3 10 89 45 c0 0f 85 6d 01 00 00 48 8b 45 88 <48> 83 78 08 00 0f 84 51 01 00 00 b8 01
[  226.199175] RIP  [<ffffffff81182597>] __alloc_pages_nodemask+0xb7/0x940
[  226.206576]  RSP <ffff880446123918>
[  226.210471] CR2: 0000000000001b08
[  226.227939] ---[ end trace 30d753e1e1124696 ]---
[  226.412591] Kernel panic - not syncing: Fatal exception
[  226.430948] Kernel Offset: disabled
[  226.434845] drm_kms_helper: panic occurred, switching back to text console
[  226.618325] ---[ end Kernel panic - not syncing: Fatal exception
[  226.625047] ------------[ cut here ]------------
[  226.630213] WARNING: CPU: 43 PID: 5400 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5d/0x60()
[  226.640999] Modules linked in: xt_CHECKSUM ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntracd
[  226.730275] CPU: 43 PID: 5400 Comm: bash Tainted: G      D         4.0.0-rc4-bug-fixed-remove #16
[  226.740189] Hardware name: Insyde Brickland/Type2 - Board Product Name1, BIOS Brickland.05.04.15.0024 02/28/2015
[  226.751558]  0000000000000000 00000000aa535e80 ffff88046db63d58 ffffffff8167aa08
[  226.759865]  0000000000000000 0000000000000000 ffff88046db63d98 ffffffff810772da
[  226.768173]  ffff88046db63d98 0000000000000000 ffff88046d615380 000000000000002b
[  226.776480] Call Trace:
[  226.779212]  <IRQ>  [<ffffffff8167aa08>] dump_stack+0x45/0x57
[  226.785657]  [<ffffffff810772da>] warn_slowpath_common+0x8a/0xc0
[  226.792367]  [<ffffffff8107740a>] warn_slowpath_null+0x1a/0x20
[  226.798886]  [<ffffffff8104a64d>] native_smp_send_reschedule+0x5d/0x60
[  226.806182]  [<ffffffff810b4fe5>] trigger_load_balance+0x145/0x1b0
[  226.813093]  [<ffffffff810a348c>] scheduler_tick+0x9c/0xe0
[  226.819228]  [<ffffffff810e0a21>] update_process_times+0x51/0x60
[  226.825946]  [<ffffffff810f0925>] tick_sched_handle.isra.18+0x25/0x60
[  226.833143]  [<ffffffff810f09a4>] tick_sched_timer+0x44/0x80
[  226.839467]  [<ffffffff810e1737>] __run_hrtimer+0x77/0x1d0
[  226.845590]  [<ffffffff810f0960>] ? tick_sched_handle.isra.18+0x60/0x60
[  226.852980]  [<ffffffff810e1b13>] hrtimer_interrupt+0x103/0x230
[  226.859596]  [<ffffffff8104d3d9>] local_apic_timer_interrupt+0x39/0x60
[  226.866883]  [<ffffffff81684d85>] smp_apic_timer_interrupt+0x45/0x60
[  226.873982]  [<ffffffff81682ded>] apic_timer_interrupt+0x6d/0x80
[  226.880690]  <EOI>  [<ffffffff81675abe>] ? panic+0x1c3/0x204
[  226.887036]  [<ffffffff81675ab7>] ? panic+0x1bc/0x204
[  226.892682]  [<ffffffff81018949>] oops_end+0x109/0x120
[  226.898422]  [<ffffffff81675285>] no_context+0x2ee/0x366
[  226.904359]  [<ffffffff81675370>] __bad_area_nosemaphore+0x73/0x1cc
[  226.911361]  [<ffffffff816756ae>] bad_area+0x44/0x4c
[  226.916910]  [<ffffffff81062b1a>] __do_page_fault+0x2ea/0x420
[  226.923331]  [<ffffffff81062c81>] do_page_fault+0x31/0x70
[  226.929364]  [<ffffffff81683f08>] page_fault+0x28/0x30
[  226.935106]  [<ffffffff8108dd31>] ? alloc_worker+0x21/0x50
[  226.941235]  [<ffffffff81182597>] ? __alloc_pages_nodemask+0xb7/0x940
[  226.948430]  [<ffffffff81182705>] ? __alloc_pages_nodemask+0x225/0x940
[  226.955725]  [<ffffffff811c7f81>] ? alloc_pages_current+0x91/0x100
[  226.962624]  [<ffffffff811d27c3>] ? deactivate_slab+0x383/0x400
[  226.969239]  [<ffffffff811d3957>] new_slab+0xa7/0x460
[  226.974885]  [<ffffffff81678c75>] __slab_alloc+0x310/0x470
[  226.981015]  [<ffffffff8130caf6>] ? get_from_free_list+0x46/0x60
[  226.987727]  [<ffffffff8108dd31>] ? alloc_worker+0x21/0x50
[  226.993851]  [<ffffffff811d46c1>] kmem_cache_alloc_node_trace+0x91/0x250
[  227.001340]  [<ffffffff8108dd31>] alloc_worker+0x21/0x50
[  227.007275]  [<ffffffff8108ff23>] create_worker+0x53/0x1e0
[  227.013404]  [<ffffffff81092092>] alloc_unbound_pwq+0x2a2/0x510
[  227.020019]  [<ffffffff810924b4>] wq_update_unbound_numa+0x1b4/0x220
[  227.027112]  [<ffffffff81092828>] workqueue_cpu_up_callback+0x308/0x3d0
[  227.034502]  [<ffffffff8109784e>] notifier_call_chain+0x4e/0x80
[  227.041117]  [<ffffffff8109796e>] __raw_notifier_call_chain+0xe/0x10
[  227.048219]  [<ffffffff810774f3>] cpu_notify+0x23/0x50
[  227.053961]  [<ffffffff81077878>] _cpu_up+0x188/0x1a0
[  227.059597]  [<ffffffff81077919>] cpu_up+0x89/0xb0
[  227.064950]  [<ffffffff8166fba0>] cpu_subsys_online+0x40/0x90
[  227.071372]  [<ffffffff814386dd>] device_online+0x6d/0xa0
[  227.077395]  [<ffffffff814387a5>] online_store+0x95/0xa0
[  227.083332]  [<ffffffff814358a8>] dev_attr_store+0x18/0x30
[  227.089460]  [<ffffffff8126d76d>] sysfs_kf_write+0x3d/0x50
[  227.095589]  [<ffffffff8126cc1a>] kernfs_fop_write+0x12a/0x180
[  227.102108]  [<ffffffff811f1bb7>] vfs_write+0xb7/0x1f0
[  227.107850]  [<ffffffff810232bc>] ? do_audit_syscall_entry+0x6c/0x70
[  227.114950]  [<ffffffff811f2835>] SyS_write+0x55/0xd0
[  227.120595]  [<ffffffff81681f09>] system_call_fastpath+0x12/0x17
[  227.127306] ---[ end trace 30d753e1e1124697 ]---

Signed-off-by: Song Xiumiao <songxiumiao@...pur.com>
Signed-off-by: Gong Zhaogang <gongzhaogang@...pur.com>
Tested-by: Liu Changsheng <liuchangsheng@...pur.com>
Reviewed-by: xiaofeng.yan <xiaofeng.yan@...pur.com>
Reviewed-by: Fan Dongdong <fandd@...pur.com>
---
 kernel/workqueue.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
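
The node selection changed by the diff below can be modelled in plain
userspace C. This is only a sketch under stated assumptions, not the
kernel code: struct model_node, mask_subset(), pick_pool_node() and
MODEL_NO_NODE are hypothetical names standing in for the per-node
possible cpumask, cpumask_subset(), the loop in get_unbound_pool() and
NUMA_NO_NODE, and CPUs are represented as bits in a 64-bit word.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's NUMA_NO_NODE. */
#define MODEL_NO_NODE (-1)

/* Hypothetical per-node state: which CPUs may belong to the node,
 * and whether the node is currently online. */
struct model_node {
	uint64_t possible_cpus;	/* bit i set: CPU i may belong to this node */
	bool online;
};

/* True if every bit set in a is also set in b. */
static bool mask_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/*
 * Return the first node whose possible CPUs cover pool_cpus AND which
 * is online. Without the online test, an offline node could be chosen
 * and later node-local allocations would target it.
 */
int pick_pool_node(const struct model_node *nodes, int nr_nodes,
		   uint64_t pool_cpus)
{
	for (int node = 0; node < nr_nodes; node++) {
		if (mask_subset(pool_cpus, nodes[node].possible_cpus) &&
		    nodes[node].online)
			return node;
	}
	return MODEL_NO_NODE;
}
```

In this model, a pool whose CPUs all belong to an offline node is left
without a node (MODEL_NO_NODE) instead of being bound to that node.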

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 586ad91..cae6277 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3253,7 +3253,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
 	if (wq_numa_enabled) {
 		for_each_node(node) {
 			if (cpumask_subset(pool->attrs->cpumask,
-					   wq_numa_possible_cpumask[node])) {
+					   wq_numa_possible_cpumask[node]) &&
+					   node_online(node)) {
 				pool->node = node;
 				break;
 			}
-- 
1.9.1

