Message-ID: <9fad7246-c634-18bb-78f9-b95376c009da@suse.cz>
Date: Wed, 13 Sep 2017 13:41:20 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Michal Hocko <mhocko@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Reza Arbab <arbab@...ux.vnet.ibm.com>,
Yasuaki Ishimatsu <yasu.isimatu@...il.com>,
qiuxishi@...wei.com, Igor Mammedov <imammedo@...hat.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>, linux-mm@...ck.org,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
On 09/11/2017 10:17 AM, Michal Hocko wrote:
> On Fri 08-09-17 19:26:06, Vlastimil Babka wrote:
>> On 09/04/2017 10:21 AM, Michal Hocko wrote:
>>> From: Michal Hocko <mhocko@...e.com>
>>>
>>> Fix this by removing the max retry count and relying only on the timeout
>>> or on interruption by a signal from userspace. Also retry rather
>>> than fail when check_pages_isolated sees some !free pages, because those
>>> could be a result of the race as well.
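The retry policy described in the patch can be modelled in userspace roughly as follows. This is a simplified, hypothetical model, not the code in mm/memory_hotplug.c: `offline_loop`, `pass_fn` and `struct fake_range` are invented for illustration, and a single "pass" stands in for the whole isolate/migrate/check_pages_isolated sequence.

```c
/* Hypothetical userspace model of the retry policy described in the patch;
 * not the kernel implementation. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Outcome of one isolate/migrate/check pass over the range (hypothetical). */
enum pass_result {
    PASS_DONE,  /* every page in the range was found free */
    PASS_AGAIN, /* some pages were still busy, e.g. because of a race */
};

enum offline_result {
    OFFLINE_OK,
    OFFLINE_TIMEDOUT, /* the overall timeout expired */
    OFFLINE_INTR,     /* userspace interrupted the attempt with a signal */
};

typedef enum pass_result (*pass_fn)(void *ctx);
typedef bool (*pending_fn)(void *ctx);

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Retry until the range is clear, the timeout expires or a signal is
 * pending. There is deliberately no retry counter: a PASS_AGAIN result
 * never fails the operation on its own. */
static enum offline_result offline_loop(pass_fn pass, pending_fn signal_pending,
                                        void *ctx, double timeout_s)
{
    assert(pass != NULL && signal_pending != NULL);
    const double deadline = now_seconds() + timeout_s;

    for (;;) {
        if (signal_pending(ctx))
            return OFFLINE_INTR;
        if (now_seconds() >= deadline)
            return OFFLINE_TIMEDOUT;
        if (pass(ctx) == PASS_DONE)
            return OFFLINE_OK;
    }
}

/* Fake range for exercising the loop: it stays busy for busy_passes passes. */
struct fake_range {
    int busy_passes;  /* passes that still find busy pages */
    int calls;        /* passes performed so far */
    int signal_after; /* report a pending signal after this many passes; -1 = never */
};

static enum pass_result fake_pass(void *ctx)
{
    struct fake_range *r = ctx;
    r->calls++;
    return r->calls > r->busy_passes ? PASS_DONE : PASS_AGAIN;
}

static bool fake_pending(void *ctx)
{
    const struct fake_range *r = ctx;
    return r->signal_after >= 0 && r->calls >= r->signal_after;
}
```

In this model a range that stays busy for 1000 passes still ends in OFFLINE_OK, where a fixed retry cap would have given up early. The flip side is the case raised in the reply: a permanently pinned page keeps the loop spinning until the timeout or a signal ends it.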
>>>
>>> Signed-off-by: Michal Hocko <mhocko@...e.com>
>>
>> Even within a movable node where has_unmovable_pages() is a non-issue, you could
>> have pinned movable pages where the pinning is not temporary.
>
> Who would pin those pages? Such a page would be unreclaimable as well,
> and thus a memory leak, and I would argue that would be a bug.
I don't know who exactly, but generally it's a problem for CMA and a
reason why there was some effort from PeterZ to introduce an API for
long-term pinning.
>> So after this
>> patch, this will really keep retrying forever. I'm not saying it's wrong, just
>> pointing it out, since the changelog seems to assume that only temporary
>> failures are possible and thus unbounded retries are always correct.
>> The obvious problem, if we wanted to avoid this, is how to recognize
>> non-temporary failures...
>
> Yes, we should be able to distinguish the two, and hopefully we can teach
> the migration code to distinguish between EBUSY (likely permanent) and
> EAGAIN (temporary) failures. This sounds like something we should aim for
> long-term, I guess. Anyway, as I've said in the other email, if somebody really
> wants a guarantee of bounded retries, it is trivial to set up
> an alarm and send itself a signal to bail out.
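The alarm-based bail-out can be done by the process that requests the offline. The following is a minimal sketch, assuming that offlining is requested by writing `offline` to a memory block's sysfs `state` file (for example `/sys/devices/system/memory/memory32/state`, where the block number is hypothetical) and that the kernel abandons the attempt with EINTR once a signal is pending. `offline_with_deadline` is an illustrative helper, not an existing API. A no-op SIGALRM handler is installed so the alarm interrupts the write instead of killing the process.

```c
/* Sketch: bound a memory-offline request from userspace with alarm(2).
 * offline_with_deadline() is a hypothetical helper, not an existing API. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Does nothing: its only job is to make SIGALRM interrupt the blocked
 * write() instead of terminating the process (the default action). */
static void on_alarm(int sig)
{
    (void)sig;
}

/* Write "offline" to a memory block's sysfs state file, giving up after
 * `seconds` (must be > 0). Returns 0 on success, or -1 with errno set;
 * errno is EINTR if the alarm interrupted the attempt (assumed kernel behaviour). */
static int offline_with_deadline(const char *state_path, unsigned int seconds)
{
    static const char cmd[] = "offline";
    struct sigaction sa, old;

    assert(state_path != NULL && seconds > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* no SA_RESTART, so an interrupted write() is not restarted */
    if (sigaction(SIGALRM, &sa, &old) < 0)
        return -1;

    int fd = open(state_path, O_WRONLY);
    if (fd < 0) {
        int err = errno;
        sigaction(SIGALRM, &old, NULL); /* restore the previous disposition */
        errno = err;
        return -1;
    }

    alarm(seconds);                          /* arm the deadline */
    ssize_t n = write(fd, cmd, strlen(cmd)); /* blocks while the kernel retries */
    int saved = errno;
    alarm(0);                                /* cancel it if write() finished first */
    close(fd);
    sigaction(SIGALRM, &old, NULL);

    if (n < 0) {
        errno = saved;
        return -1;
    }
    return 0;
}
```

A caller would treat EINTR as "gave up after the deadline" and any other errno as a genuine failure. This is the pattern existing userspace (udev or management agents) would need to adopt if it relies on offlining failing on its own within a bounded time.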
Sure, I would just be careful not to break existing userspace
(udev?) when offlining is triggered via ACPI from some management interface
(or whatever the exact mechanism is).
> Do you think that the changelog should be more clear about this?
It certainly wouldn't hurt :)