[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.10.1507141508590.16182@chino.kir.corp.google.com>
Date: Tue, 14 Jul 2015 15:35:15 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Marcin Ślusarz <marcin.slusarz@...il.com>
cc: Tejun Heo <tj@...nel.org>,
Paul McKenney <paulmck@...ux.vnet.ibm.com>,
LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org
Subject: Re: cpu_hotplug vs oom_notify_list: possible circular locking
dependency detected
On Sun, 12 Jul 2015, Marcin Ślusarz wrote:
> [28954.363492] ======================================================
> [28954.363492] [ INFO: possible circular locking dependency detected ]
> [28954.363494] 4.1.2 #56 Not tainted
> [28954.363494] -------------------------------------------------------
> [28954.363495] pm-suspend/16647 is trying to acquire lock:
> [28954.363502] (s_active#22){++++.+}, at: [<ffffffff812ce269>] kernfs_remove_by_name_ns+0x49/0xb0
> [28954.363502]
> but task is already holding lock:
> [28954.363505] (cpu_hotplug.lock#2){+.+.+.}, at: [<ffffffff810b6042>] cpu_hotplug_begin+0x72/0xc0
> [28954.363506]
> which lock already depends on the new lock.
>
> [28954.363506]
> the existing dependency chain (in reverse order) is:
> [28954.363508]
> -> #4 (cpu_hotplug.lock#2){+.+.+.}:
> [28954.363511] [<ffffffff811103db>] lock_acquire+0xbb/0x290
> [28954.363514] [<ffffffff8179d194>] down_read+0x34/0x50
> [28954.363517] [<ffffffff810dc419>] __blocking_notifier_call_chain+0x39/0x70
> [28954.363518] [<ffffffff810dc466>] blocking_notifier_call_chain+0x16/0x20
> [28954.363521] [<ffffffff811ddaff>] __out_of_memory+0x3f/0x660
> [28954.363522] [<ffffffff811de2bb>] out_of_memory+0x5b/0x80
> [28954.363524] [<ffffffff811e463b>] __alloc_pages_nodemask+0xa7b/0xc20
> [28954.363527] [<ffffffff8122a683>] alloc_pages_current+0xf3/0x1a0
> [28954.363528] [<ffffffff811d9067>] __page_cache_alloc+0x117/0x140
> [28954.363530] [<ffffffff811dbebf>] filemap_fault+0x19f/0x3d0
> [28954.363531] [<ffffffff812088cd>] __do_fault+0x3d/0xc0
> [28954.363533] [<ffffffff8120c2e0>] handle_mm_fault+0xcd0/0x11a0
> [28954.363535] [<ffffffff810a371c>] __do_page_fault+0x18c/0x510
> [28954.363536] [<ffffffff810a3aac>] do_page_fault+0xc/0x10
> [28954.363538] [<ffffffff817a1a62>] page_fault+0x22/0x30
> [28954.363539]
> -> #3 ((oom_notify_list).rwsem){++++..}:
> [28954.363541] [<ffffffff811103db>] lock_acquire+0xbb/0x290
> [28954.363542] [<ffffffff8179d194>] down_read+0x34/0x50
> [28954.363544] [<ffffffff810dc419>] __blocking_notifier_call_chain+0x39/0x70
> [28954.363546] [<ffffffff810dc466>] blocking_notifier_call_chain+0x16/0x20
> [28954.363547] [<ffffffff811ddaff>] __out_of_memory+0x3f/0x660
> [28954.363549] [<ffffffff811de2bb>] out_of_memory+0x5b/0x80
> [28954.363550] [<ffffffff811e463b>] __alloc_pages_nodemask+0xa7b/0xc20
> [28954.363552] [<ffffffff8122a683>] alloc_pages_current+0xf3/0x1a0
> [28954.363553] [<ffffffff811d9067>] __page_cache_alloc+0x117/0x140
> [28954.363555] [<ffffffff811dbebf>] filemap_fault+0x19f/0x3d0
> [28954.363556] [<ffffffff812088cd>] __do_fault+0x3d/0xc0
> [28954.363557] [<ffffffff8120c2e0>] handle_mm_fault+0xcd0/0x11a0
> [28954.363558] [<ffffffff810a371c>] __do_page_fault+0x18c/0x510
> [28954.363559] [<ffffffff810a3aac>] do_page_fault+0xc/0x10
> [28954.363560] [<ffffffff817a1a62>] page_fault+0x22/0x30
> [28954.363562]
> -> #2 (oom_sem){++++..}:
> [28954.363563] [<ffffffff811103db>] lock_acquire+0xbb/0x290
> [28954.363565] [<ffffffff8179d194>] down_read+0x34/0x50
> [28954.363566] [<ffffffff811de294>] out_of_memory+0x34/0x80
> [28954.363568] [<ffffffff811e463b>] __alloc_pages_nodemask+0xa7b/0xc20
> [28954.363570] [<ffffffff8122a683>] alloc_pages_current+0xf3/0x1a0
> [28954.363571] [<ffffffff811d9067>] __page_cache_alloc+0x117/0x140
> [28954.363572] [<ffffffff811dbebf>] filemap_fault+0x19f/0x3d0
> [28954.363573] [<ffffffff812088cd>] __do_fault+0x3d/0xc0
> [28954.363574] [<ffffffff8120c2e0>] handle_mm_fault+0xcd0/0x11a0
> [28954.363575] [<ffffffff810a371c>] __do_page_fault+0x18c/0x510
> [28954.363576] [<ffffffff810a3aac>] do_page_fault+0xc/0x10
> [28954.363578] [<ffffffff817a1a62>] page_fault+0x22/0x30
> [28954.363579]
> -> #1 (&mm->mmap_sem){++++++}:
> [28954.363581] [<ffffffff811103db>] lock_acquire+0xbb/0x290
> [28954.363582] [<ffffffff812087cf>] might_fault+0x6f/0xa0
> [28954.363583] [<ffffffff812cf4ec>] kernfs_fop_write+0x7c/0x1a0
> [28954.363585] [<ffffffff81245388>] __vfs_write+0x28/0xf0
> [28954.363587] [<ffffffff81245ac9>] vfs_write+0xa9/0x1b0
> [28954.363588] [<ffffffff812468e9>] SyS_write+0x49/0xb0
> [28954.363589] [<ffffffff8179fd9b>] system_call_fastpath+0x16/0x73
> [28954.363591]
> -> #0 (s_active#22){++++.+}:
> [28954.363593] [<ffffffff8110f6f6>] __lock_acquire+0x1d86/0x2010
> [28954.363594] [<ffffffff811103db>] lock_acquire+0xbb/0x290
> [28954.363596] [<ffffffff812cd080>] __kernfs_remove+0x210/0x2f0
> [28954.363598] [<ffffffff812ce269>] kernfs_remove_by_name_ns+0x49/0xb0
> [28954.363600] [<ffffffff812d0a99>] sysfs_unmerge_group+0x49/0x60
> [28954.363602] [<ffffffff81537e89>] dpm_sysfs_remove+0x39/0x60
> [28954.363603] [<ffffffff8152b778>] device_del+0x58/0x280
> [28954.363605] [<ffffffff8152b9b6>] device_unregister+0x16/0x30
> [28954.363606] [<ffffffff81535b7d>] cpu_cache_sysfs_exit+0x5d/0xc0
> [28954.363608] [<ffffffff81536300>] cacheinfo_cpu_callback+0x40/0xa0
> [28954.363609] [<ffffffff810dc1f6>] notifier_call_chain+0x66/0x90
> [28954.363611] [<ffffffff810dc22e>] __raw_notifier_call_chain+0xe/0x10
> [28954.363612] [<ffffffff810b5ef3>] cpu_notify+0x23/0x50
> [28954.363613] [<ffffffff810b5fbe>] cpu_notify_nofail+0xe/0x20
> [28954.363615] [<ffffffff81792dd9>] _cpu_down+0x1d9/0x2e0
> [28954.363616] [<ffffffff810b65d8>] disable_nonboot_cpus+0xd8/0x530
> [28954.363617] [<ffffffff81117d62>] suspend_devices_and_enter+0x422/0xd60
> [28954.363619] [<ffffffff81118add>] pm_suspend+0x43d/0x530
> [28954.363620] [<ffffffff81116787>] state_store+0xa7/0xb0
> [28954.363622] [<ffffffff81419bdf>] kobj_attr_store+0xf/0x20
> [28954.363623] [<ffffffff812cfca9>] sysfs_kf_write+0x49/0x60
> [28954.363624] [<ffffffff812cf5b0>] kernfs_fop_write+0x140/0x1a0
> [28954.363626] [<ffffffff81245388>] __vfs_write+0x28/0xf0
> [28954.363627] [<ffffffff81245ac9>] vfs_write+0xa9/0x1b0
> [28954.363628] [<ffffffff812468e9>] SyS_write+0x49/0xb0
> [28954.363630] [<ffffffff8179fd9b>] system_call_fastpath+0x16/0x73
> [28954.363630]
> other info that might help us debug this:
>
> [28954.363632] Chain exists of:
> s_active#22 --> (oom_notify_list).rwsem --> cpu_hotplug.lock#2
>
> [28954.363633] Possible unsafe locking scenario:
>
> [28954.363633] CPU0 CPU1
> [28954.363633] ---- ----
> [28954.363635] lock(cpu_hotplug.lock#2);
> [28954.363635] lock((oom_notify_list).rwsem);
> [28954.363636] lock(cpu_hotplug.lock#2);
> [28954.363637] lock(s_active#22);
> [28954.363638]
> *** DEADLOCK ***
>
> [28954.363639] 9 locks held by pm-suspend/16647:
> [28954.363641] #0: (sb_writers#6){.+.+.+}, at: [<ffffffff81245b83>] vfs_write+0x163/0x1b0
> [28954.363643] #1: (&of->mutex){+.+.+.}, at: [<ffffffff812cf4d6>] kernfs_fop_write+0x66/0x1a0
> [28954.363646] #2: (s_active#186){.+.+.+}, at: [<ffffffff812cf4de>] kernfs_fop_write+0x6e/0x1a0
> [28954.363649] #3: (autosleep_lock){+.+...}, at: [<ffffffff8111fcb7>] pm_autosleep_lock+0x17/0x20
> [28954.363651] #4: (pm_mutex){+.+...}, at: [<ffffffff8111881c>] pm_suspend+0x17c/0x530
> [28954.363654] #5: (acpi_scan_lock){+.+.+.}, at: [<ffffffff81495716>] acpi_scan_lock_acquire+0x17/0x19
> [28954.363656] #6: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810b6529>] disable_nonboot_cpus+0x29/0x530
> [28954.363658] #7: (cpu_hotplug.lock){++++++}, at: [<ffffffff810b5fd5>] cpu_hotplug_begin+0x5/0xc0
> [28954.363661] #8: (cpu_hotplug.lock#2){+.+.+.}, at: [<ffffffff810b6042>] cpu_hotplug_begin+0x72/0xc0
> [28954.363661]
> stack backtrace:
> [28954.363663] CPU: 3 PID: 16647 Comm: pm-suspend Not tainted 4.1.2 #56
> [28954.363663] Hardware name: Dell Inc. Inspiron 7720/04M3YM, BIOS A07 08/16/2012
> [28954.363666] ffffffff826415a0 ffff88008952b838 ffffffff81796918 0000000080000001
> [28954.363667] ffffffff8263e150 ffff88008952b888 ffffffff8110bf8d ffff880040334cd0
> [28954.363669] ffff88008952b8f8 0000000000000008 ffff880040334ca8 0000000000000008
> [28954.363669] Call Trace:
> [28954.363671] [<ffffffff81796918>] dump_stack+0x4f/0x7b
> [28954.363673] [<ffffffff8110bf8d>] print_circular_bug+0x1cd/0x230
> [28954.363674] [<ffffffff8110f6f6>] __lock_acquire+0x1d86/0x2010
> [28954.363677] [<ffffffff811103db>] lock_acquire+0xbb/0x290
> [28954.363678] [<ffffffff812ce269>] ? kernfs_remove_by_name_ns+0x49/0xb0
> [28954.363680] [<ffffffff812cd080>] __kernfs_remove+0x210/0x2f0
> [28954.363682] [<ffffffff812ce269>] ? kernfs_remove_by_name_ns+0x49/0xb0
> [28954.363683] [<ffffffff812cc5c7>] ? kernfs_name_hash+0x17/0xa0
> [28954.363685] [<ffffffff812cd519>] ? kernfs_find_ns+0x89/0x160
> [28954.363687] [<ffffffff812ce269>] kernfs_remove_by_name_ns+0x49/0xb0
> [28954.363688] [<ffffffff812d0a99>] sysfs_unmerge_group+0x49/0x60
> [28954.363689] [<ffffffff81537e89>] dpm_sysfs_remove+0x39/0x60
> [28954.363691] [<ffffffff8152b778>] device_del+0x58/0x280
> [28954.363692] [<ffffffff8152b9b6>] device_unregister+0x16/0x30
> [28954.363693] [<ffffffff81535b7d>] cpu_cache_sysfs_exit+0x5d/0xc0
> [28954.363695] [<ffffffff81536300>] cacheinfo_cpu_callback+0x40/0xa0
> [28954.363696] [<ffffffff810dc1f6>] notifier_call_chain+0x66/0x90
> [28954.363698] [<ffffffff810dc22e>] __raw_notifier_call_chain+0xe/0x10
> [28954.363699] [<ffffffff810b5ef3>] cpu_notify+0x23/0x50
> [28954.363699] [<ffffffff810b5fbe>] cpu_notify_nofail+0xe/0x20
> [28954.363700] [<ffffffff81792dd9>] _cpu_down+0x1d9/0x2e0
> [28954.363702] [<ffffffff8110aa48>] ? __lock_is_held+0x58/0x80
> [28954.363703] [<ffffffff810b65d8>] disable_nonboot_cpus+0xd8/0x530
> [28954.363704] [<ffffffff81117d62>] suspend_devices_and_enter+0x422/0xd60
> [28954.363705] [<ffffffff81795714>] ? printk+0x46/0x48
> [28954.363707] [<ffffffff81118add>] pm_suspend+0x43d/0x530
> [28954.363708] [<ffffffff81116787>] state_store+0xa7/0xb0
> [28954.363710] [<ffffffff81419bdf>] kobj_attr_store+0xf/0x20
> [28954.363711] [<ffffffff812cfca9>] sysfs_kf_write+0x49/0x60
> [28954.363712] [<ffffffff812cf5b0>] kernfs_fop_write+0x140/0x1a0
> [28954.363713] [<ffffffff81245388>] __vfs_write+0x28/0xf0
> [28954.363714] [<ffffffff81245b83>] ? vfs_write+0x163/0x1b0
> [28954.363716] [<ffffffff813cbdb8>] ? apparmor_file_permission+0x18/0x20
> [28954.363719] [<ffffffff813bdca3>] ? security_file_permission+0x23/0xa0
> [28954.363720] [<ffffffff81245ac9>] vfs_write+0xa9/0x1b0
> [28954.363721] [<ffffffff812468e9>] SyS_write+0x49/0xb0
> [28954.363723] [<ffffffff8179fd9b>] system_call_fastpath+0x16/0x73
>
Thanks for the report.
We have pm-suspend/16647 doing cpu_hotplug_begin() in _cpu_down(), which
iterates the cpu down notifier list, one of which is
cacheinfo_cpu_callback(), which does cpu_cache_sysfs_exit() and that does
__kernfs_remove() which requires s_active. That chain makes sense.
Meanwhile, an oom notifier is taking the same mutex as
cpu_hotplug_begin(), presumably get_online_cpus(). I'm wondering if
that's the rcu oom notifier.
Adding Paul and Tejun.
Powered by blists - more mailing lists