Message-ID: <20170207153459.GV5065@dhcp22.suse.cz>
Date: Tue, 7 Feb 2017 16:34:59 +0100
From: Michal Hocko <mhocko@...nel.org>
To: Mel Gorman <mgorman@...hsingularity.net>
Cc: Vlastimil Babka <vbabka@...e.cz>,
Dmitry Vyukov <dvyukov@...gle.com>, Tejun Heo <tj@...nel.org>,
Christoph Lameter <cl@...ux.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
syzkaller <syzkaller@...glegroups.com>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue 07-02-17 15:19:11, Michal Hocko wrote:
> On Tue 07-02-17 13:58:46, Mel Gorman wrote:
> > On Tue, Feb 07, 2017 at 01:37:08PM +0100, Michal Hocko wrote:
> [...]
> > > Anyway, shouldn't it be sufficient to disable preemption
> > > on drain_local_pages_wq?
> >
> > That would be sufficient for a hot-removed CPU moving the drain request
> > to another CPU and avoiding any scheduling events.
> >
> > > The CPU hotplug callback will not preempt us
> > > and so we cannot work on the same cpus, right?
> > >
> >
> > I don't see a specific guarantee that it cannot be preempted and it
> > would depend on the exact cpu hotplug implementation which is subject
> > to quite a lot of change.
>
> But we do not care about the whole cpu hotplug code. The only part we
> really do care about is the race inside drain_pages_zone and that will
> run in an atomic context on the specific CPU.
>
> You are absolutely right that using the mutex is safe as well but the
> hotplug path is already littered with locks and adding one more to the
> picture doesn't sound great to me. So I would really like to not use a
> lock if that is possible and safe (with a big fat comment of course).
And with the full changelog. I hope I haven't missed anything this time.
---
From 8c6af3116520251cc4ec2213f0a4ed2544bb4365 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@...e.com>
Date: Tue, 7 Feb 2017 16:08:35 +0100
Subject: [PATCH] mm, page_alloc: do not depend on cpu hotplug locks inside the
allocator
Dmitry has reported the following lockdep splat:
[<ffffffff81571db1>] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[<ffffffff8436697e>] __mutex_lock_common kernel/locking/mutex.c:521 [inline]
[<ffffffff8436697e>] mutex_lock_nested+0x24e/0xff0 kernel/locking/mutex.c:621
[<ffffffff818f07ea>] pcpu_alloc+0xbda/0x1280 mm/percpu.c:896
[<ffffffff818f0ee4>] __alloc_percpu+0x24/0x30 mm/percpu.c:1075
[<ffffffff816543e3>] smpcfd_prepare_cpu+0x73/0xd0 kernel/smp.c:44
[<ffffffff814240b4>] cpuhp_invoke_callback+0x254/0x1480 kernel/cpu.c:136
[<ffffffff81425821>] cpuhp_up_callbacks+0x81/0x2a0 kernel/cpu.c:493
[<ffffffff81427bf3>] _cpu_up+0x1e3/0x2a0 kernel/cpu.c:1057
[<ffffffff81427d23>] do_cpu_up+0x73/0xa0 kernel/cpu.c:1087
[<ffffffff81427d68>] cpu_up+0x18/0x20 kernel/cpu.c:1095
[<ffffffff854ede84>] smp_init+0xe9/0xee kernel/smp.c:564
[<ffffffff85482f81>] kernel_init_freeable+0x439/0x690 init/main.c:1010
[<ffffffff84357083>] kernel_init+0x13/0x180 init/main.c:941
[<ffffffff84377baa>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
cpu_hotplug_begin
cpu_hotplug.lock
pcpu_alloc
pcpu_alloc_mutex
[<ffffffff81423012>] get_online_cpus+0x62/0x90 kernel/cpu.c:248
[<ffffffff8185fcf8>] drain_all_pages+0xf8/0x710 mm/page_alloc.c:2385
[<ffffffff81865e5d>] __alloc_pages_direct_reclaim mm/page_alloc.c:3440 [inline]
[<ffffffff81865e5d>] __alloc_pages_slowpath+0x8fd/0x2370 mm/page_alloc.c:3778
[<ffffffff818681c5>] __alloc_pages_nodemask+0x8f5/0xc60 mm/page_alloc.c:3980
[<ffffffff818ed0c1>] __alloc_pages include/linux/gfp.h:426 [inline]
[<ffffffff818ed0c1>] __alloc_pages_node include/linux/gfp.h:439 [inline]
[<ffffffff818ed0c1>] alloc_pages_node include/linux/gfp.h:453 [inline]
[<ffffffff818ed0c1>] pcpu_alloc_pages mm/percpu-vm.c:93 [inline]
[<ffffffff818ed0c1>] pcpu_populate_chunk+0x1e1/0x900 mm/percpu-vm.c:282
[<ffffffff818f0a11>] pcpu_alloc+0xe01/0x1280 mm/percpu.c:998
[<ffffffff818f0eb7>] __alloc_percpu_gfp+0x27/0x30 mm/percpu.c:1062
[<ffffffff817d25b2>] bpf_array_alloc_percpu kernel/bpf/arraymap.c:34 [inline]
[<ffffffff817d25b2>] array_map_alloc+0x532/0x710 kernel/bpf/arraymap.c:99
[<ffffffff817ba034>] find_and_alloc_map kernel/bpf/syscall.c:34 [inline]
[<ffffffff817ba034>] map_create kernel/bpf/syscall.c:188 [inline]
[<ffffffff817ba034>] SYSC_bpf kernel/bpf/syscall.c:870 [inline]
[<ffffffff817ba034>] SyS_bpf+0xd64/0x2500 kernel/bpf/syscall.c:827
[<ffffffff84377941>] entry_SYSCALL_64_fastpath+0x1f/0xc2
pcpu_alloc
pcpu_alloc_mutex
drain_all_pages
get_online_cpus
cpu_hotplug.lock
[<ffffffff81427876>] cpu_hotplug_begin+0x206/0x2e0 kernel/cpu.c:304
[<ffffffff81427ada>] _cpu_up+0xca/0x2a0 kernel/cpu.c:1011
[<ffffffff81427d23>] do_cpu_up+0x73/0xa0 kernel/cpu.c:1087
[<ffffffff81427d68>] cpu_up+0x18/0x20 kernel/cpu.c:1095
[<ffffffff854ede84>] smp_init+0xe9/0xee kernel/smp.c:564
[<ffffffff85482f81>] kernel_init_freeable+0x439/0x690 init/main.c:1010
[<ffffffff84357083>] kernel_init+0x13/0x180 init/main.c:941
[<ffffffff84377baa>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
cpu_hotplug_begin
cpu_hotplug.lock
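The inversion in the splat above can be modelled outside the kernel. What
follows is only a minimal userspace sketch of the idea behind lock-order
checking, not lockdep itself: it records "lock B was taken while holding
lock A" edges and refuses an edge that would close a cycle. The names
lock_class, record_dependency, reachable and reset_graph are hypothetical
and exist only for this illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the two locks in the splat. */
enum lock_class { CPU_HOTPLUG_LOCK, PCPU_ALLOC_MUTEX, NR_LOCK_CLASSES };

/* edge[a][b] is true once some path took b while already holding a. */
static bool edge[NR_LOCK_CLASSES][NR_LOCK_CLASSES];

/* Forget all recorded dependencies. */
static void reset_graph(void)
{
	memset(edge, 0, sizeof(edge));
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(enum lock_class from, enum lock_class to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCK_CLASSES; next++)
		if (edge[from][next] && !seen[next] &&
		    reachable((enum lock_class)next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired was taken while held was held".
 * Returns false, without recording, if 'held' is already reachable from
 * 'acquired', because the new edge would then close a cycle (a possible
 * ABBA deadlock).
 */
static bool record_dependency(enum lock_class held, enum lock_class acquired)
{
	bool seen[NR_LOCK_CLASSES] = { false };

	if (reachable(acquired, held, seen))
		return false;
	edge[held][acquired] = true;
	return true;
}
```

In this model the cpu_up path corresponds to
record_dependency(CPU_HOTPLUG_LOCK, PCPU_ALLOC_MUTEX), which is accepted.
The bpf path, where pcpu_alloc ends up in drain_all_pages and
get_online_cpus, corresponds to
record_dependency(PCPU_ALLOC_MUTEX, CPU_HOTPLUG_LOCK), which is then
rejected. Dropping get_online_cpus() from drain_all_pages removes that
second edge.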
Pulling cpu hotplug locks inside the page allocator is just too
dangerous. Let's remove the dependency by dropping get_online_cpus()
from drain_all_pages. This is not so simple though, because now we have
no protection against cpu hotplug, which means 2 things:
- the work item might be executed on a different cpu by a worker
from an unbound pool, so it is no longer pinned to the cpu
- we have to make sure that we do not race with page_alloc_cpu_dead
calling drain_pages_zone
Disabling preemption in drain_local_pages_wq solves the first
problem: drain_local_pages determines its local CPU from the WQ
context, which is stable from that point on, and page_alloc_cpu_dead
is pinned to the CPU already. The latter condition is achieved
by disabling IRQs in drain_pages_zone.
Reported-by: Dmitry Vyukov <dvyukov@...gle.com>
Signed-off-by: Michal Hocko <mhocko@...e.com>
---
mm/page_alloc.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c3358d4f7932..b6411816787a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2343,7 +2343,16 @@ void drain_local_pages(struct zone *zone)
static void drain_local_pages_wq(struct work_struct *work)
{
+ /*
+ * drain_all_pages doesn't use proper cpu hotplug protection so
+ * we can race with cpu offline when the WQ can move this from
+ * a cpu pinned worker to an unbound one. We can operate on a different
+ * cpu which is all right but we also have to make sure to not move to
+ * a different one.
+ */
+ preempt_disable();
drain_local_pages(NULL);
+ preempt_enable();
}
/*
@@ -2379,12 +2388,6 @@ void drain_all_pages(struct zone *zone)
}
/*
- * As this can be called from reclaim context, do not reenter reclaim.
- * An allocation failure can be handled, it's simply slower
- */
- get_online_cpus();
-
- /*
* We don't care about racing with CPU hotplug event
* as offline notification will cause the notified
* cpu to drain that CPU pcps and on_each_cpu_mask
@@ -2423,7 +2426,6 @@ void drain_all_pages(struct zone *zone)
for_each_cpu(cpu, &cpus_with_pcps)
flush_work(per_cpu_ptr(&pcpu_drain, cpu));
- put_online_cpus();
mutex_unlock(&pcpu_drain_mutex);
}
--
2.11.0
--
Michal Hocko
SUSE Labs