[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPcyv4jXmxjVaR=sGfqjy2QP_Yq4ALfTQb9_QMZ3tk0ntxfTFA@mail.gmail.com>
Date: Thu, 9 Mar 2017 10:10:31 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: "Rafael J. Wysocki" <rjw@...ysocki.net>
Cc: Heiko Carstens <heiko.carstens@...ibm.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux MM <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
linux-s390 <linux-s390@...r.kernel.org>,
Michal Hocko <mhocko@...e.com>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Ben Hutchings <ben@...adent.org.uk>,
Gerald Schaefer <gerald.schaefer@...ibm.com>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
Sebastian Ott <sebott@...ux.vnet.ibm.com>
Subject: Re: [PATCH 1/2] mm: add private lock to serialize memory hotplug operations
On Thu, Mar 9, 2017 at 5:39 AM, Rafael J. Wysocki <rjw@...ysocki.net> wrote:
> On Thursday, March 09, 2017 02:06:15 PM Heiko Carstens wrote:
>> Commit bfc8c90139eb ("mem-hotplug: implement get/put_online_mems")
>> introduced new functions get/put_online_mems() and
>> mem_hotplug_begin/end() in order to allow similar semantics for memory
>> hotplug like for cpu hotplug.
>>
>> The corresponding functions for cpu hotplug are get/put_online_cpus()
>> and cpu_hotplug_begin/done() for cpu hotplug.
>>
>> The commit however missed to introduce functions that would serialize
>> memory hotplug operations like they are done for cpu hotplug with
>> cpu_maps_update_begin/done().
>>
>> This basically leaves mem_hotplug.active_writer unprotected and allows
>> concurrent writers to modify it, which may lead to problems as
>> outlined by commit f931ab479dd2 ("mm: fix devm_memremap_pages crash,
>> use mem_hotplug_{begin, done}").
>>
>> That commit was extended again with commit b5d24fda9c3d ("mm,
>> devm_memremap_pages: hold device_hotplug lock over mem_hotplug_{begin,
>> done}") which serializes memory hotplug operations for some call
>> sites by using the device_hotplug lock.
>>
>> In addition with commit 3fc21924100b ("mm: validate device_hotplug is
>> held for memory hotplug") a sanity check was added to
>> mem_hotplug_begin() to verify that the device_hotplug lock is held.
>
> Admittedly, I haven't looked at all of the code paths involved in detail yet,
> but there's one concern regarding lock/unlock_device_hotplug().
>
> The actual main purpose of it is to ensure safe removal of devices in cases
> when they cannot be removed separately, like when a whole CPU package
> (including possibly an entire NUMA node with memory and all) is removed.
>
> One of the code paths doing that is acpi_scan_hot_remove() which first
> tries to offline devices slated for removal and then finally removes them.
>
> The reason why this needs to be done in two stages is because the offlining
> can fail, in which case we will fail the entire operation, while the final
> removal step is, well, final (meaning that the devices are gone after it no
> matter what).
>
> This is done under device_hotplug_lock, so that the devices that were taken
> offline in stage 1 cannot be brought back online before stage 2 is carried
> out entirely, which surely would be bad if it happened.
>
> Now, I'm not sure if removing lock/unlock_device_hotplug() from the code in
> question actually affects this mechanism, but this in case it does, it is one
> thing to double check before going ahead with this patch.
>
I *think* we're ok in this case because unplugging the CPU package
that contains a persistent memory device will trigger
devm_memremap_pages() to call arch_remove_memory(). Removing a pmem
device can't fail. It may be held off while pages are pinned for DMA
memory, but it will eventually complete.
Powered by blists - more mailing lists