Message-ID: <c801da70-1aa5-666a-615e-852100d6145e@oracle.com>
Date: Thu, 4 Jul 2019 08:11:44 -0700
From: Mike Kravetz <mike.kravetz@...cle.com>
To: Michal Hocko <mhocko@...nel.org>
Cc: Mel Gorman <mgorman@...hsingularity.net>,
Mel Gorman <mgorman@...e.de>, Vlastimil Babka <vbabka@...e.cz>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Johannes Weiner <hannes@...xchg.org>
Subject: Re: [Question] Should direct reclaim time be bounded?
On 7/4/19 4:09 AM, Michal Hocko wrote:
> On Wed 03-07-19 16:54:35, Mike Kravetz wrote:
>> On 7/3/19 2:43 AM, Mel Gorman wrote:
>>> Indeed. I'm getting knocked offline shortly so I didn't give this the
>>> time it deserves but it appears that part of this problem is
>>> hugetlb-specific when one node is full and can enter into this continual
>>> loop due to __GFP_RETRY_MAYFAIL requiring both nr_reclaimed and
>>> nr_scanned to be zero.
>>
>> Yes, I am not aware of any other large order allocations consistently made
>> with __GFP_RETRY_MAYFAIL. But, I did not look too closely. Michal believes
>> that hugetlb page allocations should use __GFP_RETRY_MAYFAIL.
>
> Yes. The argument is that this is controllable by an admin and failures
> should be prevented as much as possible. I didn't get to understand the
> should_continue_reclaim part of the problem but I have a strong feeling
> that __GFP_RETRY_MAYFAIL handling at that layer is not correct. What
> happens if it is simply removed and we rely only on the retry mechanism
> from the page allocator instead? Is the success rate reduced
> considerably?

It certainly will be reduced. I 'think' it will be hard to predict how
much it will be reduced, as this will depend on the state of memory usage
and fragmentation at the time of the attempt.

I can try to measure this, but it will be a few days due to the U.S. holiday.
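
For anyone following along, the stopping condition Mel describes above can
be modeled roughly as below. This is a simplified userspace sketch under
stated assumptions, not the kernel source: MODEL_GFP_RETRY_MAYFAIL and
model_should_continue() are hypothetical names, the flag value is arbitrary,
and any other checks the real should_continue_reclaim() performs are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for __GFP_RETRY_MAYFAIL; the bit value is arbitrary. */
#define MODEL_GFP_RETRY_MAYFAIL 0x1u

/*
 * Simplified model of the "should reclaim keep going?" decision.
 * Returns true when another reclaim pass would be attempted.
 */
static bool model_should_continue(unsigned int gfp_mask,
                                  unsigned long nr_reclaimed,
                                  unsigned long nr_scanned)
{
    if (gfp_mask & MODEL_GFP_RETRY_MAYFAIL) {
        /*
         * With the flag set, stop only when the last pass both
         * reclaimed nothing AND scanned nothing. A pass that scans
         * pages but frees none therefore keeps the loop going.
         */
        if (!nr_reclaimed && !nr_scanned)
            return false;
    } else {
        /* Without the flag, a pass that reclaimed nothing ends the loop. */
        if (!nr_reclaimed)
            return false;
    }
    return true;
}
```

In this model, a call such as model_should_continue(MODEL_GFP_RETRY_MAYFAIL,
0, 32) still asks for another pass, which matches the continual loop on a
full node. Dropping the flag-specific branch would make that same case stop,
which is why the success rate would be expected to go down.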
--
Mike Kravetz