linux-kernel - Re: mm: deadlock between get_online_cpus/pcpu

Message-ID: <20170207104249.gpephtef2ajoqw62@techsingularity.net>
Date:   Tue, 7 Feb 2017 10:42:49 +0000
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Vlastimil Babka <vbabka@...e.cz>
Cc:     Michal Hocko <mhocko@...nel.org>,
        Dmitry Vyukov <dvyukov@...gle.com>, Tejun Heo <tj@...nel.org>,
        Christoph Lameter <cl@...ux.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        syzkaller <syzkaller@...glegroups.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: mm: deadlock between get_online_cpus/pcpu_alloc

On Tue, Feb 07, 2017 at 10:23:31AM +0100, Vlastimil Babka wrote:
> > cpu offlining. I have to check the code but my impression was that WQ
> > code will ignore the cpu requested by the work item when the cpu is
> > going offline. If the offline happens while the worker function already
> > executes then it has to wait as we run with preemption disabled so we
> > should be safe here. Or am I missing something obvious?
> 
> Tejun suggested an alternative solution to avoiding get_online_cpus() in
> this thread:
> https://lkml.kernel.org/r/<20170123170329.GA7820@....duckdns.org>

But it would look like the following. The drain can be serialised against
pcpu_drain_mutex because the CPU hotplug teardown callback is allowed to
sleep.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3b93879990fd..8cd8b1bbe00c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2319,9 +2319,17 @@ static void drain_pages(unsigned int cpu)
 {
 	struct zone *zone;
 
+	/*
+	 * A per-cpu drain via a workqueue from drain_all_pages can be
+	 * rescheduled onto an unrelated CPU. That allows the hotplug
+	 * operation and the drain to potentially race on the same
+	 * CPU. Serialise hotplug versus drain using pcpu_drain_mutex
+	 */
+	mutex_lock(&pcpu_drain_mutex);
 	for_each_populated_zone(zone) {
 		drain_pages_zone(cpu, zone);
 	}
+	mutex_unlock(&pcpu_drain_mutex);
 }
 
 /*
@@ -2377,13 +2385,10 @@ void drain_all_pages(struct zone *zone)
 		mutex_lock(&pcpu_drain_mutex);
 	}
 
-	get_online_cpus();
-
 	/*
-	 * We don't care about racing with CPU hotplug event
-	 * as offline notification will cause the notified
-	 * cpu to drain that CPU pcps and on_each_cpu_mask
-	 * disables preemption as part of its processing
+	 * We don't care about racing with CPU hotplug event as offline
+	 * notification will cause the notified cpu to drain that CPU pcps
+	 * and it is serialised against here via pcpu_drain_mutex.
 	 */
 	for_each_online_cpu(cpu) {
 		struct per_cpu_pageset *pcp;
@@ -2418,7 +2423,6 @@ void drain_all_pages(struct zone *zone)
 	for_each_cpu(cpu, &cpus_with_pcps)
 		flush_work(per_cpu_ptr(&pcpu_drain, cpu));
 
-	put_online_cpus();
 	mutex_unlock(&pcpu_drain_mutex);
 }
 

-- 
Mel Gorman
SUSE Labs