lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <daa0ad53-4561-98d6-7a9f-c36c3a0659a8@I-love.SAKURA.ne.jp>
Date:   Tue, 7 Feb 2017 20:24:13 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:     Mel Gorman <mgorman@...hsingularity.net>,
        Dmitry Vyukov <dvyukov@...gle.com>
Cc:     Vlastimil Babka <vbabka@...e.cz>, Tejun Heo <tj@...nel.org>,
        Christoph Lameter <cl@...ux.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        syzkaller <syzkaller@...glegroups.com>,
        Michal Hocko <mhocko@...e.cz>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: mm: deadlock between get_online_cpus/pcpu_alloc

On 2017/02/07 7:05, Mel Gorman wrote:
> On Mon, Feb 06, 2017 at 08:13:35PM +0100, Dmitry Vyukov wrote:
>> On Mon, Jan 30, 2017 at 4:48 PM, Dmitry Vyukov <dvyukov@...gle.com> wrote:
>>> On Sun, Jan 29, 2017 at 6:22 PM, Vlastimil Babka <vbabka@...e.cz> wrote:
>>>> On 29.1.2017 13:44, Dmitry Vyukov wrote:
>>>>> Hello,
>>>>>
>>>>> I've got the following deadlock report while running syzkaller fuzzer
>>>>> on f37208bc3c9c2f811460ef264909dfbc7f605a60:
>>>>>
>>>>> [ INFO: possible circular locking dependency detected ]
>>>>> 4.10.0-rc5-next-20170125 #1 Not tainted
>>>>> -------------------------------------------------------
>>>>> syz-executor3/14255 is trying to acquire lock:
>>>>>  (cpu_hotplug.dep_map){++++++}, at: [<ffffffff814271c7>]
>>>>> get_online_cpus+0x37/0x90 kernel/cpu.c:239
>>>>>
>>>>> but task is already holding lock:
>>>>>  (pcpu_alloc_mutex){+.+.+.}, at: [<ffffffff81937fee>]
>>>>> pcpu_alloc+0xbfe/0x1290 mm/percpu.c:897
>>>>>
>>>>> which lock already depends on the new lock.
>>>>
>>>> I suspect the dependency comes from recent changes in drain_all_pages(). They
>>>> were later redone (for other reasons, but nice to have another validation) in
>>>> the mmots patch [1], which AFAICS is not yet in mmotm and thus linux-next. Could
>>>> you try if it helps?
>>>
>>> It happened only once on linux-next, so I can't verify the fix. But I
>>> will watch out for other occurrences.
>>
>> Unfortunately it does not seem to help.
> 
> I'm a little stuck on how to best handle this. get_online_cpus() can
> halt forever if the hotplug operation is holding the mutex when calling
> pcpu_alloc. One option would be to add a try_get_online_cpus() helper which
> trylocks the mutex. However, given that drain is so unlikely to actually
> make that make a difference when racing against parallel allocations,
> I think this should be acceptable.
> 
> Any objections?

Why below change on top of current linux.git (8b1b41ee74f9) insufficient?
I think it can eliminate IPIs a lot without introducing lockdep warnings.

----------
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f3e0c69..ae6e7aa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2354,6 +2354,7 @@ void drain_local_pages(struct zone *zone)
  */
 void drain_all_pages(struct zone *zone)
 {
+	static DEFINE_MUTEX(lock);
 	int cpu;
 
 	/*
@@ -2362,6 +2363,7 @@ void drain_all_pages(struct zone *zone)
 	 */
 	static cpumask_t cpus_with_pcps;
 
+	mutex_lock(&lock);
 	/*
 	 * We don't care about racing with CPU hotplug event
 	 * as offline notification will cause the notified
@@ -2394,6 +2396,7 @@ void drain_all_pages(struct zone *zone)
 	}
 	on_each_cpu_mask(&cpus_with_pcps, (smp_call_func_t) drain_local_pages,
 								zone, 1);
+	mutex_unlock(&lock);
 }
 
 #ifdef CONFIG_HIBERNATION
----------

By the way, drain_all_pages() is a sleepable context, isn't it?
I don't get get soft lockup using current linux.git (8b1b41ee74f9).
But I trivially get soft lockup if I try above change after reverting
"mm, page_alloc: use static global work_struct for draining per-cpu pages" and
"mm, page_alloc: drain per-cpu pages from workqueue context" on linux-next-20170207 .
List corruption also came in with linux-next-20170207 .

----------
[   32.672890] ip6_tables: (C) 2000-2006 Netfilter Core Team
[   33.109860] Ebtables v2.0 registered
[   33.410293] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[   33.935512] IPv6: ADDRCONF(NETDEV_UP): eno16777728: link is not ready
[   33.937478] e1000: eno16777728 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[   33.939777] IPv6: ADDRCONF(NETDEV_CHANGE): eno16777728: link becomes ready
[   34.194000] Netfilter messages via NETLINK v0.30.
[   34.258828] ip_set: protocol 6
[   38.518375] nf_conntrack: default automatic helper assignment has been turned off for security reasons and CT-based  firewall rule not found. Use the iptables CT target to attach helpers instead.
[   43.911609] cp (5167) used greatest stack depth: 10488 bytes left
[   48.174125] cp (5860) used greatest stack depth: 10224 bytes left
[  101.075280] ------------[ cut here ]------------
[  101.076743] WARNING: CPU: 0 PID: 9766 at lib/list_debug.c:25 __list_add_valid+0x46/0xa0
[  101.079095] list_add corruption. next->prev should be prev (ffffea00013dfce0), but was ffff88006d3e1d40. (next=ffff88006d3e1d40).
[  101.081863] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ppdev glue_helper vmw_balloon pcspkr sg parport_pc parport i2c_piix4 shpchp vmw_vsock_vmci_transport vsock vmw_vmci ip_tables xfs libcrc32c sd_mod sr_mod cdrom ata_generic pata_acpi crc32c_intel vmwgfx serio_raw drm_kms_helper syscopyarea sysfillrect
[  101.098346]  sysimgblt fb_sys_fops ttm mptspi scsi_transport_spi ata_piix mptscsih ahci drm libahci mptbase libata e1000 i2c_core
[  101.101279] CPU: 0 PID: 9766 Comm: oom-write Tainted: G        W       4.10.0-rc7-next-20170207+ #500
[  101.103519] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[  101.106031] Call Trace:
[  101.107069]  dump_stack+0x85/0xc9
[  101.108239]  __warn+0xd1/0xf0
[  101.109566]  warn_slowpath_fmt+0x5f/0x80
[  101.110843]  __list_add_valid+0x46/0xa0
[  101.112187]  free_hot_cold_page+0x205/0x460
[  101.113593]  free_hot_cold_page_list+0x3c/0x1c0
[  101.115046]  shrink_page_list+0x4dd/0xd10
[  101.116388]  shrink_inactive_list+0x1c5/0x660
[  101.117796]  shrink_node_memcg+0x535/0x7f0
[  101.119158]  ? mem_cgroup_iter+0x1e0/0x720
[  101.120873]  shrink_node+0xe1/0x310
[  101.122250]  do_try_to_free_pages+0xe1/0x300
[  101.123611]  try_to_free_pages+0x131/0x3f0
[  101.124998]  __alloc_pages_slowpath+0x479/0xe32
[  101.126421]  __alloc_pages_nodemask+0x382/0x3d0
[  101.128053]  alloc_pages_vma+0xae/0x2f0
[  101.129353]  do_anonymous_page+0x111/0x5d0
[  101.130725]  __handle_mm_fault+0xbc9/0xeb0
[  101.132127]  ? sched_clock+0x9/0x10
[  101.133389]  ? sched_clock_cpu+0x11/0xc0
[  101.134764]  handle_mm_fault+0x16b/0x390
[  101.136147]  ? handle_mm_fault+0x49/0x390
[  101.137545]  __do_page_fault+0x24a/0x530
[  101.138999]  do_page_fault+0x30/0x80
[  101.140375]  page_fault+0x28/0x30
[  101.141685] RIP: 0033:0x4006a0
[  101.142880] RSP: 002b:00007ffe3a4acc10 EFLAGS: 00010206
[  101.144638] RAX: 0000000037f8e000 RBX: 0000000080000000 RCX: 00007fc5a1fda650
[  101.146653] RDX: 0000000000000000 RSI: 00007ffe3a4aca30 RDI: 00007ffe3a4aca30
[  101.148768] RBP: 00007fc4a2111010 R08: 00007ffe3a4acb40 R09: 00007ffe3a4ac980
[  101.150893] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000007
[  101.152937] R13: 00007fc4a2111010 R14: 0000000000000000 R15: 0000000000000000
[  101.155240] ---[ end trace 5d8b63572ab78be3 ]---
[  128.945470] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [oom-write:9766]
[  128.947878] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ppdev glue_helper vmw_balloon pcspkr sg parport_pc parport i2c_piix4 shpchp vmw_vsock_vmci_transport vsock vmw_vmci ip_tables xfs libcrc32c sd_mod sr_mod cdrom ata_generic pata_acpi crc32c_intel vmwgfx serio_raw drm_kms_helper syscopyarea sysfillrect
[  128.966961]  sysimgblt fb_sys_fops ttm mptspi scsi_transport_spi ata_piix mptscsih ahci drm libahci mptbase libata e1000 i2c_core
[  128.970135] irq event stamp: 2491700
[  128.971666] hardirqs last  enabled at (2491699): [<ffffffff817e5d70>] restore_regs_and_iret+0x0/0x1d
[  128.974560] hardirqs last disabled at (2491700): [<ffffffff817e7198>] apic_timer_interrupt+0x98/0xb0
[  128.977148] softirqs last  enabled at (2445236): [<ffffffff817eab39>] __do_softirq+0x349/0x52d
[  128.979639] softirqs last disabled at (2445215): [<ffffffff810a99e5>] irq_exit+0xf5/0x110
[  128.982019] CPU: 2 PID: 9766 Comm: oom-write Tainted: G        W       4.10.0-rc7-next-20170207+ #500
[  128.984598] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[  128.987490] task: ffff880067ed8040 task.stack: ffffc9001117c000
[  128.989464] RIP: 0010:smp_call_function_many+0x25c/0x320
[  128.991305] RSP: 0000:ffffc9001117fad0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
[  128.993609] RAX: 0000000000000000 RBX: ffff88006d7dd640 RCX: 0000000000000001
[  128.995823] RDX: 0000000000000001 RSI: ffff88006d3e3398 RDI: ffff88006c528dc8
[  128.998058] RBP: ffffc9001117fb18 R08: 0000000000000009 R09: 0000000000000000
[  129.000284] R10: 0000000000000001 R11: ffff88005486d4a8 R12: 0000000000000000
[  129.002521] R13: ffffffff811fe6e0 R14: 0000000000000000 R15: 0000000000000080
[  129.004788] FS:  00007fc5a24ec740(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000
[  129.007235] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  129.009211] CR2: 00007fc4da0c5010 CR3: 0000000044314000 CR4: 00000000001406e0
[  129.011641] Call Trace:
[  129.012993]  ? page_alloc_cpu_dead+0x30/0x30
[  129.014701]  on_each_cpu_mask+0x30/0xb0
[  129.016318]  drain_all_pages+0x113/0x170
[  129.017949]  __alloc_pages_slowpath+0x520/0xe32
[  129.019710]  __alloc_pages_nodemask+0x382/0x3d0
[  129.021472]  alloc_pages_vma+0xae/0x2f0
[  129.023080]  do_anonymous_page+0x111/0x5d0
[  129.024760]  __handle_mm_fault+0xbc9/0xeb0
[  129.026449]  ? sched_clock+0x9/0x10
[  129.028042]  ? sched_clock_cpu+0x11/0xc0
[  129.029729]  handle_mm_fault+0x16b/0x390
[  129.031418]  ? handle_mm_fault+0x49/0x390
[  129.033077]  __do_page_fault+0x24a/0x530
[  129.034721]  do_page_fault+0x30/0x80
[  129.036279]  page_fault+0x28/0x30
[  129.038160] RIP: 0033:0x4006a0
[  129.039600] RSP: 002b:00007ffe3a4acc10 EFLAGS: 00010206
[  129.041488] RAX: 0000000037fb4000 RBX: 0000000080000000 RCX: 00007fc5a1fda650
[  129.043758] RDX: 0000000000000000 RSI: 00007ffe3a4aca30 RDI: 00007ffe3a4aca30
[  129.046037] RBP: 00007fc4a2111010 R08: 00007ffe3a4acb40 R09: 00007ffe3a4ac980
[  129.048324] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000007
[  129.050587] R13: 00007fc4a2111010 R14: 0000000000000000 R15: 0000000000000000
[  129.052829] Code: 7b 3e 2b 00 3b 05 69 b6 d5 00 41 89 c4 0f 8d 3f fe ff ff 48 63 d0 48 8b 33 48 03 34 d5 60 c4 ab 81 8b 56 18 83 e2 01 74 0a f3 90 <8b> 4e 18 83 e1 01 75 f6 83 f8 ff 48 8b 7b 08 74 2a 48 63 35 30 
[  156.945366] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [oom-write:9766]
[  156.947797] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ppdev glue_helper vmw_balloon pcspkr sg parport_pc parport i2c_piix4 shpchp vmw_vsock_vmci_transport vsock vmw_vmci ip_tables xfs libcrc32c sd_mod sr_mod cdrom ata_generic pata_acpi crc32c_intel vmwgfx serio_raw drm_kms_helper syscopyarea sysfillrect
[  156.968021]  sysimgblt fb_sys_fops ttm mptspi scsi_transport_spi ata_piix mptscsih ahci drm libahci mptbase libata e1000 i2c_core
[  156.971269] irq event stamp: 2547400
[  156.972865] hardirqs last  enabled at (2547399): [<ffffffff817e5d70>] restore_regs_and_iret+0x0/0x1d
[  156.975579] hardirqs last disabled at (2547400): [<ffffffff817e7198>] apic_timer_interrupt+0x98/0xb0
[  156.978287] softirqs last  enabled at (2445236): [<ffffffff817eab39>] __do_softirq+0x349/0x52d
[  156.980863] softirqs last disabled at (2445215): [<ffffffff810a99e5>] irq_exit+0xf5/0x110
[  156.983393] CPU: 2 PID: 9766 Comm: oom-write Tainted: G        W    L  4.10.0-rc7-next-20170207+ #500
[  156.986111] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[  156.989117] task: ffff880067ed8040 task.stack: ffffc9001117c000
[  156.991179] RIP: 0010:smp_call_function_many+0x25c/0x320
[  156.993113] RSP: 0000:ffffc9001117fad0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
[  156.995501] RAX: 0000000000000000 RBX: ffff88006d7dd640 RCX: 0000000000000001
[  156.997803] RDX: 0000000000000001 RSI: ffff88006d3e3398 RDI: ffff88006c528dc8
[  157.000111] RBP: ffffc9001117fb18 R08: 0000000000000009 R09: 0000000000000000
[  157.002402] R10: 0000000000000001 R11: ffff88005486d4a8 R12: 0000000000000000
[  157.004700] R13: ffffffff811fe6e0 R14: 0000000000000000 R15: 0000000000000080
[  157.006980] FS:  00007fc5a24ec740(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000
[  157.009477] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  157.011494] CR2: 00007fc4da0c5010 CR3: 0000000044314000 CR4: 00000000001406e0
[  157.013844] Call Trace:
[  157.015231]  ? page_alloc_cpu_dead+0x30/0x30
[  157.016975]  on_each_cpu_mask+0x30/0xb0
[  157.018723]  drain_all_pages+0x113/0x170
[  157.020410]  __alloc_pages_slowpath+0x520/0xe32
[  157.022219]  __alloc_pages_nodemask+0x382/0x3d0
[  157.024036]  alloc_pages_vma+0xae/0x2f0
[  157.025704]  do_anonymous_page+0x111/0x5d0
[  157.027430]  __handle_mm_fault+0xbc9/0xeb0
[  157.029143]  ? sched_clock+0x9/0x10
[  157.030739]  ? sched_clock_cpu+0x11/0xc0
[  157.032416]  handle_mm_fault+0x16b/0x390
[  157.034100]  ? handle_mm_fault+0x49/0x390
[  157.035843]  __do_page_fault+0x24a/0x530
[  157.037537]  do_page_fault+0x30/0x80
[  157.039160]  page_fault+0x28/0x30
[  157.040792] RIP: 0033:0x4006a0
[  157.042290] RSP: 002b:00007ffe3a4acc10 EFLAGS: 00010206
[  157.044231] RAX: 0000000037fb4000 RBX: 0000000080000000 RCX: 00007fc5a1fda650
[  157.046635] RDX: 0000000000000000 RSI: 00007ffe3a4aca30 RDI: 00007ffe3a4aca30
[  157.048941] RBP: 00007fc4a2111010 R08: 00007ffe3a4acb40 R09: 00007ffe3a4ac980
[  157.051324] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000007
[  157.053577] R13: 00007fc4a2111010 R14: 0000000000000000 R15: 0000000000000000
[  157.055822] Code: 7b 3e 2b 00 3b 05 69 b6 d5 00 41 89 c4 0f 8d 3f fe ff ff 48 63 d0 48 8b 33 48 03 34 d5 60 c4 ab 81 8b 56 18 83 e2 01 74 0a f3 90 <8b> 4e 18 83 e1 01 75 f6 83 f8 ff 48 8b 7b 08 74 2a 48 63 35 30 
[  171.241423] BUG: spinlock lockup suspected on CPU#3, swapper/3/0
[  171.243575]  lock: 0xffff88007ffddd00, .magic: dead4ead, .owner: kworker/0:3/9777, .owner_cpu: 0
[  171.246136] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W    L  4.10.0-rc7-next-20170207+ #500
[  171.248706] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[  171.251617] Call Trace:
[  171.252854]  <IRQ>
[  171.253989]  dump_stack+0x85/0xc9
[  171.255376]  spin_dump+0x90/0x95
[  171.256741]  do_raw_spin_lock+0x9a/0x130
[  171.258246]  _raw_spin_lock_irqsave+0x75/0x90
[  171.259822]  ? free_pcppages_bulk+0x37/0x910
[  171.261359]  free_pcppages_bulk+0x37/0x910
[  171.262843]  ? sched_clock_cpu+0x11/0xc0
[  171.264289]  ? sched_clock_tick+0x2d/0x80
[  171.265747]  drain_pages_zone+0x82/0x90
[  171.267154]  ? page_alloc_cpu_dead+0x30/0x30
[  171.268644]  drain_pages+0x3f/0x60
[  171.269955]  drain_local_pages+0x25/0x30
[  171.271366]  flush_smp_call_function_queue+0x7b/0x170
[  171.273014]  generic_smp_call_function_single_interrupt+0x13/0x30
[  171.274878]  smp_call_function_interrupt+0x27/0x40
[  171.276460]  call_function_interrupt+0x9d/0xb0
[  171.277963] RIP: 0010:native_safe_halt+0x6/0x10
[  171.279467] RSP: 0018:ffffc900003a3e70 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff03
[  171.281610] RAX: ffff88006c5c0040 RBX: 0000000000000000 RCX: 0000000000000000
[  171.283662] RDX: ffff88006c5c0040 RSI: 0000000000000001 RDI: ffff88006c5c0040
[  171.285727] RBP: ffffc900003a3e70 R08: 0000000000000000 R09: 0000000000000000
[  171.287778] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000003
[  171.289831] R13: ffff88006c5c0040 R14: ffff88006c5c0040 R15: 0000000000000000
[  171.291882]  </IRQ>
[  171.292934]  default_idle+0x23/0x1d0
[  171.294287]  arch_cpu_idle+0xf/0x20
[  171.295618]  default_idle_call+0x23/0x40
[  171.297040]  do_idle+0x162/0x230
[  171.298308]  cpu_startup_entry+0x71/0x80
[  171.299732]  start_secondary+0x17f/0x1f0
[  171.301146]  start_cpu+0x14/0x14
[  171.302432] NMI backtrace for cpu 3
[  171.303764] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W    L  4.10.0-rc7-next-20170207+ #500
[  171.306218] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[  171.309065] Call Trace:
[  171.310500]  <IRQ>
[  171.311569]  dump_stack+0x85/0xc9
[  171.312946]  nmi_cpu_backtrace+0xc0/0xe0
[  171.314396]  ? irq_force_complete_move+0x170/0x170
[  171.316012]  nmi_trigger_cpumask_backtrace+0x12a/0x188
[  171.317679]  arch_trigger_cpumask_backtrace+0x19/0x20
[  171.319294]  do_raw_spin_lock+0xa8/0x130
[  171.320686]  _raw_spin_lock_irqsave+0x75/0x90
[  171.322148]  ? free_pcppages_bulk+0x37/0x910
[  171.323585]  free_pcppages_bulk+0x37/0x910
[  171.325037]  ? sched_clock_cpu+0x11/0xc0
[  171.326615]  ? sched_clock_tick+0x2d/0x80
[  171.327974]  drain_pages_zone+0x82/0x90
[  171.329294]  ? page_alloc_cpu_dead+0x30/0x30
[  171.330707]  drain_pages+0x3f/0x60
[  171.331938]  drain_local_pages+0x25/0x30
[  171.333283]  flush_smp_call_function_queue+0x7b/0x170
[  171.334855]  generic_smp_call_function_single_interrupt+0x13/0x30
[  171.336643]  smp_call_function_interrupt+0x27/0x40
[  171.338456]  call_function_interrupt+0x9d/0xb0
[  171.339911] RIP: 0010:native_safe_halt+0x6/0x10
[  171.341522] RSP: 0018:ffffc900003a3e70 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff03
[  171.343860] RAX: ffff88006c5c0040 RBX: 0000000000000000 RCX: 0000000000000000
[  171.346001] RDX: ffff88006c5c0040 RSI: 0000000000000001 RDI: ffff88006c5c0040
[  171.348134] RBP: ffffc900003a3e70 R08: 0000000000000000 R09: 0000000000000000
[  171.350270] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000003
[  171.352384] R13: ffff88006c5c0040 R14: ffff88006c5c0040 R15: 0000000000000000
[  171.354503]  </IRQ>
[  171.355616]  default_idle+0x23/0x1d0
[  171.357055]  arch_cpu_idle+0xf/0x20
[  171.361577]  default_idle_call+0x23/0x40
[  171.364329]  do_idle+0x162/0x230
[  171.365799]  cpu_startup_entry+0x71/0x80
[  171.367172]  start_secondary+0x17f/0x1f0
[  171.368530]  start_cpu+0x14/0x14
[  171.369748] Sending NMI from CPU 3 to CPUs 0-2:
[  171.371367] NMI backtrace for cpu 2
[  171.371368] CPU: 2 PID: 9766 Comm: oom-write Tainted: G        W    L  4.10.0-rc7-next-20170207+ #500
[  171.371368] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[  171.371369] task: ffff880067ed8040 task.stack: ffffc9001117c000
[  171.371369] RIP: 0010:smp_call_function_many+0x25c/0x320
[  171.371370] RSP: 0000:ffffc9001117fad0 EFLAGS: 00000202
[  171.371371] RAX: 0000000000000000 RBX: ffff88006d7dd640 RCX: 0000000000000001
[  171.371372] RDX: 0000000000000001 RSI: ffff88006d3e3398 RDI: ffff88006c528dc8
[  171.371372] RBP: ffffc9001117fb18 R08: 0000000000000009 R09: 0000000000000000
[  171.371372] R10: 0000000000000001 R11: ffff88005486d4a8 R12: 0000000000000000
[  171.371373] R13: ffffffff811fe6e0 R14: 0000000000000000 R15: 0000000000000080
[  171.371373] FS:  00007fc5a24ec740(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000
[  171.371374] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  171.371374] CR2: 00007fc4da0c5010 CR3: 0000000044314000 CR4: 00000000001406e0
[  171.371375] Call Trace:
[  171.371375]  ? page_alloc_cpu_dead+0x30/0x30
[  171.371375]  on_each_cpu_mask+0x30/0xb0
[  171.371376]  drain_all_pages+0x113/0x170
[  171.371376]  __alloc_pages_slowpath+0x520/0xe32
[  171.371377]  __alloc_pages_nodemask+0x382/0x3d0
[  171.371377]  alloc_pages_vma+0xae/0x2f0
[  171.371377]  do_anonymous_page+0x111/0x5d0
[  171.371378]  __handle_mm_fault+0xbc9/0xeb0
[  171.371378]  ? sched_clock+0x9/0x10
[  171.371379]  ? sched_clock_cpu+0x11/0xc0
[  171.371379]  handle_mm_fault+0x16b/0x390
[  171.371379]  ? handle_mm_fault+0x49/0x390
[  171.371380]  __do_page_fault+0x24a/0x530
[  171.371380]  do_page_fault+0x30/0x80
[  171.371381]  page_fault+0x28/0x30
[  171.371381] RIP: 0033:0x4006a0
[  171.371381] RSP: 002b:00007ffe3a4acc10 EFLAGS: 00010206
[  171.371382] RAX: 0000000037fb4000 RBX: 0000000080000000 RCX: 00007fc5a1fda650
[  171.371383] RDX: 0000000000000000 RSI: 00007ffe3a4aca30 RDI: 00007ffe3a4aca30
[  171.371383] RBP: 00007fc4a2111010 R08: 00007ffe3a4acb40 R09: 00007ffe3a4ac980
[  171.371384] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000007
[  171.371384] R13: 00007fc4a2111010 R14: 0000000000000000 R15: 0000000000000000
[  171.371385] Code: 7b 3e 2b 00 3b 05 69 b6 d5 00 41 89 c4 0f 8d 3f fe ff ff 48 63 d0 48 8b 33 48 03 34 d5 60 c4 ab 81 8b 56 18 83 e2 01 74 0a f3 90 <8b> 4e 18 83 e1 01 75 f6 83 f8 ff 48 8b 7b 08 74 2a 48 63 35 30 
[  171.372317] NMI backtrace for cpu 0
[  171.372317] CPU: 0 PID: 9777 Comm: kworker/0:3 Tainted: G        W    L  4.10.0-rc7-next-20170207+ #500
[  171.372318] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[  171.372318] Workqueue: events vmw_fb_dirty_flush [vmwgfx]
[  171.372319] task: ffff880064082540 task.stack: ffffc90013b54000
[  171.372320] RIP: 0010:free_pcppages_bulk+0xbb/0x910
[  171.372320] RSP: 0000:ffff88006d203eb0 EFLAGS: 00000002
[  171.372321] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000010
[  171.372322] RDX: 0000000000000000 RSI: 00000000357cc759 RDI: ffff88006d3e1d50
[  171.372322] RBP: ffff88006d203f38 R08: ffff88006d3e1d20 R09: 0000000000000002
[  171.372323] R10: 0000000000000000 R11: 000000000004f7f0 R12: ffff88007ffdd8f8
[  171.372323] R13: ffffea00013dfc20 R14: ffffea00013dfc00 R15: ffff88007ffdd740
[  171.372324] FS:  0000000000000000(0000) GS:ffff88006d200000(0000) knlGS:0000000000000000
[  171.372324] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  171.372325] CR2: 00007fc4da09f010 CR3: 0000000044314000 CR4: 00000000001406f0
[  171.372325] Call Trace:
[  171.372325]  <IRQ>
[  171.372326]  ? trace_hardirqs_off+0xd/0x10
[  171.372326]  drain_pages_zone+0x82/0x90
[  171.372327]  ? page_alloc_cpu_dead+0x30/0x30
[  171.372327]  drain_pages+0x3f/0x60
[  171.372327]  drain_local_pages+0x25/0x30
[  171.372328]  flush_smp_call_function_queue+0x7b/0x170
[  171.372328]  generic_smp_call_function_single_interrupt+0x13/0x30
[  171.372329]  smp_call_function_interrupt+0x27/0x40
[  171.372329]  call_function_interrupt+0x9d/0xb0
[  171.372330] RIP: 0010:memcpy_orig+0x19/0x110
[  171.372330] RSP: 0000:ffffc90013b57d98 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff03
[  171.372331] RAX: ffffc90003f1c400 RBX: ffff880063b54a88 RCX: 0000000000000004
[  171.372331] RDX: 00000000000010e0 RSI: ffffc900037d3900 RDI: ffffc90003f1c6e0
[  171.372332] RBP: ffffc90013b57e08 R08: 0000000000000000 R09: 0000000000000000
[  171.372332] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000001400
[  171.372333] R13: 000000000000011e R14: ffff880063b55168 R15: ffffc900037d3620
[  171.372333]  </IRQ>
[  171.372333]  ? vmw_fb_dirty_flush+0x1ef/0x2b0 [vmwgfx]
[  171.372334]  process_one_work+0x22b/0x760
[  171.372334]  ? process_one_work+0x194/0x760
[  171.372335]  worker_thread+0x137/0x4b0
[  171.372335]  kthread+0x10f/0x150
[  171.372335]  ? process_one_work+0x760/0x760
[  171.372336]  ? kthread_create_on_node+0x70/0x70
[  171.372336]  ? do_syscall_64+0x6c/0x200
[  171.372337]  ret_from_fork+0x31/0x40
[  171.372337] Code: 00 00 4d 89 cf 8b 45 98 8b 75 9c 31 d2 4c 8b 85 78 ff ff ff 83 c0 01 83 c6 01 83 f8 03 0f 44 c2 48 63 c8 48 83 c1 01 48 c1 e1 04 <4c> 01 c1 48 8b 39 48 39 f9 74 de 89 45 98 83 fe 03 89 f0 0f 44 
[  171.372357] NMI backtrace for cpu 1
[  171.372357] CPU: 1 PID: 47 Comm: khugepaged Tainted: G        W    L  4.10.0-rc7-next-20170207+ #500
[  171.372358] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[  171.372358] task: ffff88006bd22540 task.stack: ffffc90000890000
[  171.372359] RIP: 0010:delay_tsc+0x48/0x70
[  171.372359] RSP: 0018:ffffc90000893520 EFLAGS: 00000006
[  171.372360] RAX: 000000004e0ee6be RBX: ffff88007ffddd00 RCX: 000000774e0ee6a6
[  171.372360] RDX: 0000000000000077 RSI: 0000000000000001 RDI: 0000000000000001
[  171.372361] RBP: ffffc90000893520 R08: 0000000000000000 R09: 0000000000000000
[  171.372361] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000a6822110
[  171.372362] R13: 0000000083093e7f R14: 0000000000000001 R15: ffffea000137a020
[  171.372362] FS:  0000000000000000(0000) GS:ffff88006d400000(0000) knlGS:0000000000000000
[  171.372363] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  171.372363] CR2: 00007ff5aeb82000 CR3: 0000000068093000 CR4: 00000000001406e0
[  171.372363] Call Trace:
[  171.372364]  __delay+0xf/0x20
[  171.372364]  do_raw_spin_lock+0x86/0x130
[  171.372365]  _raw_spin_lock_irqsave+0x75/0x90
[  171.372365]  ? free_pcppages_bulk+0x37/0x910
[  171.372365]  free_pcppages_bulk+0x37/0x910
[  171.372366]  ? __kernel_map_pages+0x87/0x120
[  171.372366]  free_hot_cold_page+0x373/0x460
[  171.372367]  free_hot_cold_page_list+0x3c/0x1c0
[  171.372367]  shrink_page_list+0x4dd/0xd10
[  171.372368]  shrink_inactive_list+0x1c5/0x660
[  171.372368]  shrink_node_memcg+0x535/0x7f0
[  171.372368]  ? mem_cgroup_iter+0x14d/0x720
[  171.372369]  shrink_node+0xe1/0x310
[  171.372369]  do_try_to_free_pages+0xe1/0x300
[  171.372370]  try_to_free_pages+0x131/0x3f0
[  171.372370]  __alloc_pages_slowpath+0x479/0xe32
[  171.372370]  __alloc_pages_nodemask+0x382/0x3d0
[  171.372371]  khugepaged_alloc_page+0x6d/0xd0
[  171.372371]  collapse_huge_page+0x81/0x1240
[  171.372372]  ? sched_clock_cpu+0x11/0xc0
[  171.372372]  ? khugepaged_scan_mm_slot+0xc26/0x1000
[  171.372373]  khugepaged_scan_mm_slot+0xc49/0x1000
[  171.372373]  ? sched_clock_cpu+0x11/0xc0
[  171.372373]  ? finish_wait+0x75/0x90
[  171.372374]  khugepaged+0x327/0x5e0
[  171.372374]  ? remove_wait_queue+0x60/0x60
[  171.372375]  kthread+0x10f/0x150
[  171.372375]  ? khugepaged_scan_mm_slot+0x1000/0x1000
[  171.372375]  ? kthread_create_on_node+0x70/0x70
[  171.372376]  ret_from_fork+0x31/0x40
[  171.372376] Code: 89 d1 48 c1 e1 20 48 09 c1 eb 1b 65 ff 0d c9 0a c1 7e f3 90 65 ff 05 c0 0a c1 7e 65 8b 05 51 d7 c0 7e 39 c6 75 20 0f ae e8 0f 31 <48> c1 e2 20 48 09 c2 48 89 d0 48 29 c8 48 39 f8 72 ce 65 ff 0d 
----------

I also got soft lockup without any change using linux-next-20170202.
Something is wrong with calling multiple CPUs? List corruption?

----------
[   80.556598] ip6_tables: (C) 2000-2006 Netfilter Core Team
[   84.020374] IPv6: ADDRCONF(NETDEV_UP): eno16777728: link is not ready
[   84.024423] e1000: eno16777728 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[   84.030731] IPv6: ADDRCONF(NETDEV_CHANGE): eno16777728: link becomes ready
[   84.212613] Ebtables v2.0 registered
[   86.841246] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[   90.594709] Netfilter messages via NETLINK v0.30.
[   90.756119] ip_set: protocol 6
[  161.309259] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [ip6tables-resto:4210]
[  162.329982] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ppdev glue_helper vmw_balloon pcspkr sg parport_pc parport i2c_piix4 shpchp vmw_vsock_vmci_transport vsock vmw_vmci ip_tables xfs libcrc32c sd_mod sr_mod cdrom ata_generic pata_acpi crc32c_intel serio_raw vmwgfx drm_kms_helper syscopyarea sysfillrect
[  162.486639]  sysimgblt fb_sys_fops ttm e1000 mptspi ahci scsi_transport_spi drm libahci ata_piix mptscsih i2c_core mptbase libata
[  162.486651] irq event stamp: 306010
[  162.486656] hardirqs last  enabled at (306009): [<ffffffff817e4970>] restore_regs_and_iret+0x0/0x1d
[  162.486658] hardirqs last disabled at (306010): [<ffffffff817e5d98>] apic_timer_interrupt+0x98/0xb0
[  162.486661] softirqs last  enabled at (306008): [<ffffffff817e9739>] __do_softirq+0x349/0x52d
[  162.486664] softirqs last disabled at (306001): [<ffffffff810a98c5>] irq_exit+0xf5/0x110
[  162.486666] CPU: 3 PID: 4210 Comm: ip6tables-resto Not tainted 4.10.0-rc6-next-20170202 #498
[  162.486667] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[  162.486668] task: ffff880062eb4a40 task.stack: ffffc900054e8000
[  162.486671] RIP: 0010:smp_call_function_many+0x25c/0x320
[  162.486672] RSP: 0018:ffffc900054ebc98 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
[  162.486674] RAX: 0000000000000000 RBX: ffff88006d9dd680 RCX: 0000000000000001
[  162.486675] RDX: 0000000000000001 RSI: ffff88006d3e3600 RDI: ffff88006c52ad68
[  162.486675] RBP: ffffc900054ebce0 R08: 0000000000000007 R09: 0000000000000000
[  162.486676] R10: 0000000000000001 R11: ffff880067ecd768 R12: 0000000000000000
[  162.486677] R13: ffffffff81080790 R14: ffffc900054ebd18 R15: 0000000000000080
[  162.486678] FS:  00007f37a4b86740(0000) GS:ffff88006d800000(0000) knlGS:0000000000000000
[  162.486679] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  162.486680] CR2: 00000000008ed017 CR3: 0000000053f70000 CR4: 00000000001406e0
[  162.486719] Call Trace:
[  162.486725]  ? x86_configure_nx+0x50/0x50
[  162.486727]  on_each_cpu+0x3b/0xa0
[  162.486730]  flush_tlb_kernel_range+0x79/0x80
[  162.486734]  remove_vm_area+0xb1/0xc0
[  162.486737]  __vunmap+0x2e/0x110
[  162.486739]  vfree+0x2e/0x70
[  162.486744]  do_ip6t_get_ctl+0x2de/0x370 [ip6_tables]
[  162.486751]  nf_getsockopt+0x49/0x70
[  162.486755]  ipv6_getsockopt+0xd3/0x130
[  162.486758]  rawv6_getsockopt+0x2c/0x70
[  162.486761]  sock_common_getsockopt+0x14/0x20
[  162.486763]  SyS_getsockopt+0x77/0xe0
[  162.486767]  do_syscall_64+0x6c/0x200
[  162.486770]  entry_SYSCALL64_slow_path+0x25/0x25
[  162.486771] RIP: 0033:0x7f37a3d9151a
[  162.486772] RSP: 002b:00007ffeecccd9e8 EFLAGS: 00000202 ORIG_RAX: 0000000000000037
[  162.486774] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f37a3d9151a
[  162.486774] RDX: 0000000000000041 RSI: 0000000000000029 RDI: 0000000000000003
[  162.486775] RBP: 00000000008e90c0 R08: 00007ffeecccda30 R09: feff7164736b6865
[  162.486776] R10: 00000000008e90c0 R11: 0000000000000202 R12: 00007ffeecccdf60
[  162.486777] R13: 00000000008e9010 R14: 00007ffeecccda40 R15: 0000000000000000
[  162.486783] Code: bb 39 2b 00 3b 05 a9 40 c7 00 41 89 c4 0f 8d 3f fe ff ff 48 63 d0 48 8b 33 48 03 34 d5 60 c4 ab 81 8b 56 18 83 e2 01 74 0a f3 90 <8b> 4e 18 83 e1 01 75 f6 83 f8 ff 48 8b 7b 08 74 2a 48 63 35 70 
----------

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ