linux-kernel - Re: possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170830154315.sa57wasw64rvnuhe@dhcp22.suse.cz>
Date:   Wed, 30 Aug 2017 17:43:15 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Artem Savkov <asavkov@...hat.com>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Johannes Weiner <hannes@...xchg.org>, linux-mm@...ck.org
Subject: Re: possible circular locking dependency
 mmap_sem/cpu_hotplug_lock.rw_sem

The previous patch is insufficient. drain_all_stock can still race with
the memory offline callback and the underlying memcg disappear. So we
need to be more careful and pin the css on the memcg. This patch
instead...
---
>From 70a5acf9bbe76d183e81a1a6b57dd5b9edc677c6 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@...e.com>
Date: Wed, 30 Aug 2017 16:09:01 +0200
Subject: [PATCH] mm, memcg: remove hotplug locking from try_charge

The following lockde splat has been noticed during LTP testing

[21002.630252] ======================================================
[21002.637148] WARNING: possible circular locking dependency detected
[21002.644045] 4.13.0-rc3-next-20170807 #12 Not tainted
[21002.649583] ------------------------------------------------------
[21002.656492] a.out/4771 is trying to acquire lock:
[21002.661742]  (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff812b4668>] drain_all_stock.part.35+0x18/0x140
[21002.672629]
[21002.672629] but task is already holding lock:
[21002.679137]  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
[21002.688371]
[21002.688371] which lock already depends on the new lock.
[21002.688371]
[21002.697505]
[21002.697505] the existing dependency chain (in reverse order) is:
[21002.705856]
[21002.705856] -> #3 (&mm->mmap_sem){++++++}:
[21002.712080]        lock_acquire+0xc9/0x230
[21002.716661]        __might_fault+0x70/0xa0
[21002.721241]        _copy_to_user+0x23/0x70
[21002.725814]        filldir+0xa7/0x110
[21002.729988]        xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs]
[21002.736840]        xfs_readdir+0x1fa/0x2c0 [xfs]
[21002.742042]        xfs_file_readdir+0x30/0x40 [xfs]
[21002.747485]        iterate_dir+0x17a/0x1a0
[21002.752057]        SyS_getdents+0xb0/0x160
[21002.756638]        entry_SYSCALL_64_fastpath+0x1f/0xbe
[21002.762371]
[21002.762371] -> #2 (&type->i_mutex_dir_key#3){++++++}:
[21002.769661]        lock_acquire+0xc9/0x230
[21002.774239]        down_read+0x51/0xb0
[21002.778429]        lookup_slow+0xde/0x210
[21002.782903]        walk_component+0x160/0x250
[21002.787765]        link_path_walk+0x1a6/0x610
[21002.792625]        path_openat+0xe4/0xd50
[21002.797100]        do_filp_open+0x91/0x100
[21002.801673]        file_open_name+0xf5/0x130
[21002.806429]        filp_open+0x33/0x50
[21002.810620]        kernel_read_file_from_path+0x39/0x80
[21002.816459]        _request_firmware+0x39f/0x880
[21002.821610]        request_firmware_direct+0x37/0x50
[21002.827151]        request_microcode_fw+0x64/0xe0
[21002.832401]        reload_store+0xf7/0x180
[21002.836974]        dev_attr_store+0x18/0x30
[21002.841641]        sysfs_kf_write+0x44/0x60
[21002.846318]        kernfs_fop_write+0x113/0x1a0
[21002.851374]        __vfs_write+0x37/0x170
[21002.855849]        vfs_write+0xc7/0x1c0
[21002.860128]        SyS_write+0x58/0xc0
[21002.864313]        do_syscall_64+0x6c/0x1f0
[21002.868973]        return_from_SYSCALL_64+0x0/0x7a
[21002.874317]
[21002.874317] -> #1 (microcode_mutex){+.+.+.}:
[21002.880748]        lock_acquire+0xc9/0x230
[21002.885322]        __mutex_lock+0x88/0x960
[21002.889894]        mutex_lock_nested+0x1b/0x20
[21002.894854]        microcode_init+0xbb/0x208
[21002.899617]        do_one_initcall+0x51/0x1a9
[21002.904481]        kernel_init_freeable+0x208/0x2a7
[21002.909922]        kernel_init+0xe/0x104
[21002.914298]        ret_from_fork+0x2a/0x40
[21002.918867]
[21002.918867] -> #0 (cpu_hotplug_lock.rw_sem){++++++}:
[21002.926058]        __lock_acquire+0x153c/0x1550
[21002.931112]        lock_acquire+0xc9/0x230
[21002.935688]        cpus_read_lock+0x4b/0x90
[21002.940353]        drain_all_stock.part.35+0x18/0x140
[21002.945987]        try_charge+0x3ab/0x6e0
[21002.950460]        mem_cgroup_try_charge+0x7f/0x2c0
[21002.955902]        shmem_getpage_gfp+0x25f/0x1050
[21002.961149]        shmem_fault+0x96/0x200
[21002.965621]        __do_fault+0x1e/0xa0
[21002.969905]        __handle_mm_fault+0x9c3/0xe00
[21002.975056]        handle_mm_fault+0x16e/0x380
[21002.980013]        __do_page_fault+0x24a/0x530
[21002.984968]        do_page_fault+0x30/0x80
[21002.989537]        page_fault+0x28/0x30
[21002.993812]
[21002.993812] other info that might help us debug this:
[21002.993812]
[21003.002744] Chain exists of:
[21003.002744]   cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem
[21003.002744]
[21003.016238]  Possible unsafe locking scenario:
[21003.016238]
[21003.022843]        CPU0                    CPU1
[21003.027896]        ----                    ----
[21003.032948]   lock(&mm->mmap_sem);
[21003.036741]                                lock(&type->i_mutex_dir_key#3);
[21003.044419]                                lock(&mm->mmap_sem);
[21003.051025]   lock(cpu_hotplug_lock.rw_sem);
[21003.055788]
[21003.055788]  *** DEADLOCK ***
[21003.055788]
[21003.062393] 2 locks held by a.out/4771:
[21003.066675]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
[21003.076391]  #1:  (percpu_charge_mutex){+.+...}, at: [<ffffffff812b4c97>] try_charge+0x397/0x6e0

The problem is very similar to the one fixed by a459eeb7b852 ("mm,
page_alloc: do not depend on cpu hotplug locks inside the allocator").
We are taking hotplug locks while we can be sitting on top of basically
arbitrary locks. This just calls for problems.

We can get rid of {get,put}_online_cpus, fortunately. We do not have to
be worried about races with memory hotplug because drain_local_stock,
which is called from both the WQ draining and the memory hotplug
contexts, is always operating on the local cpu stock with IRQs disabled.

The only thing to be careful about is that the target memcg doesn't
vanish while we are still in drain_all_stock so take a reference on
it.

Signed-off-by: Michal Hocko <mhocko@...e.com>
---
 mm/memcontrol.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b9cf3cf4a3d0..5c70f47abb3d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1765,6 +1765,10 @@ static void drain_local_stock(struct work_struct *dummy)
 	struct memcg_stock_pcp *stock;
 	unsigned long flags;
 
+	/*
+	 * The only protection from memory hotplug vs. drain_stock races is
+	 * that we always operate on local CPU stock here with IRQ disabled
+	 */
 	local_irq_save(flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
@@ -1807,26 +1811,27 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 	if (!mutex_trylock(&percpu_charge_mutex))
 		return;
 	/* Notify other cpus that system-wide "drain" is running */
-	get_online_cpus();
 	curcpu = get_cpu();
 	for_each_online_cpu(cpu) {
 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
 		struct mem_cgroup *memcg;
 
 		memcg = stock->cached;
-		if (!memcg || !stock->nr_pages)
+		if (!memcg || !stock->nr_pages || !css_tryget(&memcg->css))
 			continue;
-		if (!mem_cgroup_is_descendant(memcg, root_memcg))
+		if (!mem_cgroup_is_descendant(memcg, root_memcg)) {
+			css_put(&memcg->css);
 			continue;
+		}
 		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
 			if (cpu == curcpu)
 				drain_local_stock(&stock->work);
 			else
 				schedule_work_on(cpu, &stock->work);
 		}
+		css_put(&memcg->css);
 	}
 	put_cpu();
-	put_online_cpus();
 	mutex_unlock(&percpu_charge_mutex);
 }
 
-- 
2.13.2

-- 
Michal Hocko
SUSE Labs