[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bb71b68e-dc1b-a4d3-d842-b311535b92a8@redhat.com>
Date: Thu, 28 Feb 2019 08:38:34 +0100
From: David Hildenbrand <david@...hat.com>
To: Mike Kravetz <mike.kravetz@...cle.com>,
Oscar Salvador <osalvador@...e.de>, linux-mm@...ck.org
Cc: linux-kernel@...r.kernel.org, mhocko@...e.com
Subject: Re: [RFC PATCH] mm,memory_hotplug: Unlock 1GB-hugetlb on x86_64
On 27.02.19 23:00, Mike Kravetz wrote:
> On 2/27/19 1:51 PM, Oscar Salvador wrote:
>> On Thu, Feb 21, 2019 at 10:42:12AM +0100, Oscar Salvador wrote:
>>> [1] https://lore.kernel.org/patchwork/patch/998796/
>>>
>>> Signed-off-by: Oscar Salvador <osalvador@...e.de>
>>
>> Any further comments on this?
>> I do have a "concern" I would like to sort out before dropping the RFC:
>>
>> It is the fact that unless we have spare gigantic pages in other notes, the
>> offlining operation will loop forever (until the customer cancels the operation).
>> While I do not really like that, I do think that memory offlining should be done
>> with some sanity, and the administrator should know in advance if the system is going
>> to be able to keep up with the memory pressure, aka: make sure we got what we need in
>> order to make the offlining operation to succeed.
>> That translates to be sure that we have spare gigantic pages and other nodes
>> can take them.
>>
>> Given said that, another thing I thought about is that we could check if we have
>> spare gigantic pages at has_unmovable_pages() time.
>> Something like checking "h->free_huge_pages - h->resv_huge_pages > 0", and if it
>> turns out that we do not have gigantic pages anywhere, just return as we have
>> non-movable pages.
>
> Of course, that check would be racy. Even if there is an available gigantic
> page at has_unmovable_pages() time there is no guarantee it will be there when
> we want to allocate/use it. But, you would at least catch 'most' cases of
> looping forever.
>
>> But I would rather not convulate has_unmovable_pages() with such checks and "trust"
>> the administrator.
I think we have the exact same issue already with huge/ordinary pages if
we are low on memory. We could loop forever.
In the long run, we should properly detect such issues and abort instead
of looping forever I guess. But as we all know, error handling in the
whole offlining part is still far away from being perfect ...
--
Thanks,
David / dhildenb
Powered by blists - more mailing lists