[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6e138348-aa28-8660-d902-96efafe1dcb2@I-love.SAKURA.ne.jp>
Date: Tue, 29 Aug 2017 20:20:39 +0900
From: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To: Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...nel.org>
Cc: Mel Gorman <mgorman@...e.de>, Tejun Heo <tj@...nel.org>,
linux-mm@...ck.org, LKML <linux-kernel@...r.kernel.org>,
Michal Hocko <mhocko@...e.com>
Subject: Re: [PATCH] mm, memory_hotplug: do not back off draining pcp free
pages from kworker context
On 2017/08/29 7:33, Andrew Morton wrote:
> On Mon, 28 Aug 2017 11:33:41 +0200 Michal Hocko <mhocko@...nel.org> wrote:
>
>> drain_all_pages backs off when called from a kworker context since
>> 0ccce3b924212 ("mm, page_alloc: drain per-cpu pages from workqueue
>> context") because the original IPI based pcp draining has been replaced
>> by a WQ based one and the check wanted to prevent from recursion and
>> inter workers dependencies. This has made some sense at the time
>> because the system WQ has been used and one worker holding the lock
>> could be blocked while waiting for new workers to emerge which can be a
>> problem under OOM conditions.
>>
>> Since then ce612879ddc7 ("mm: move pcp and lru-pcp draining into single
>> wq") has moved draining to a dedicated (mm_percpu_wq) WQ with a rescuer
>> so we shouldn't depend on any other WQ activity to make a forward
>> progress so calling drain_all_pages from a worker context is safe as
>> long as this doesn't happen from mm_percpu_wq itself which is not the
>> case because all workers are required to _not_ depend on any MM locks.
>>
>> Why is this a problem in the first place? ACPI driven memory hot-remove
>> (acpi_device_hotplug) is executed from the worker context. We end
>> up calling __offline_pages to free all the pages and that requires
>> both lru_add_drain_all_cpuslocked and drain_all_pages to do their job
>> otherwise we can have dangling pages on pcp lists and fail the offline
>> operation (__test_page_isolated_in_pageblock would see a page with 0
>> ref. count but without PageBuddy set).
>>
>> Fix the issue by removing the worker check in drain_all_pages.
>> lru_add_drain_all_cpuslocked doesn't have this restriction so it works
>> as expected.
>>
>> Fixes: 0ccce3b924212 ("mm, page_alloc: drain per-cpu pages from workqueue context")
>> Signed-off-by: Michal Hocko <mhocko@...e.com>
>
> No cc:stable?
>
Michal, are you sure that this patch does not cause deadlock?
As shown in "[PATCH] mm: Use WQ_HIGHPRI for mm_percpu_wq." thread, currently work
items on mm_percpu_wq seem to be blocked by other work items not on mm_percpu_wq.
Powered by blists - more mailing lists