Message-ID: <8889d67a-adab-91e1-c320-d8bd88d7e1e0@suse.cz>
Date: Thu, 18 May 2017 12:03:50 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Christoph Lameter <cl@...ux.com>, Michal Hocko <mhocko@...nel.org>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
cgroups@...r.kernel.org, Li Zefan <lizefan@...wei.com>,
Mel Gorman <mgorman@...hsingularity.net>,
David Rientjes <rientjes@...gle.com>,
Hugh Dickins <hughd@...gle.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Anshuman Khandual <khandual@...ux.vnet.ibm.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
linux-api@...r.kernel.org
Subject: Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with
cpuset update
On 05/17/2017 04:48 PM, Christoph Lameter wrote:
> On Wed, 17 May 2017, Michal Hocko wrote:
>
>>>> So how are you going to distinguish VM_FAULT_OOM from an empty mempolicy
>>>> case in a raceless way?
>>>
>>> You don't have to do that if you do not create an empty mempolicy in the
>>> first place. The current kernel code avoids that by first allowing access
>>> to the new set of nodes and removing the old ones from the set when done.
>>
>> which is racy, as Vlastimil pointed out. If we simply fail such an
>> allocation, the failure will propagate up the call chain until we hit the
>> OOM killer via VM_FAULT_OOM. How would you want to handle that?
>
> The race is where? If you expand the node set during the move of the
> application, then you are safe with respect to the legacy apps that did
> not use static bindings.
No, that expand/shrink by itself doesn't work against a parallel
get_page_from_freelist() going through the zonelist. Moving from node 0
to node 1, with a zonelist containing nodes 1 and 0 in that order (a
sketch of this interleaving in code follows the list):
- mempolicy nodemask is {0}
- zonelist iteration checks node 1; it's not in the nodemask, skip it
- nodemask becomes {0,1} (expand)
- nodemask becomes {1} (shrink)
- zonelist iteration checks node 0; it's no longer in the nodemask, skip it
- premature OOM, even though the nodemask was non-empty at every instant
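
Here's a minimal user-space sketch of that interleaving, compressed into
straight-line code. This is not kernel code: node_allowed() and the
hard-coded two-entry zonelist are stand-ins for the real nodemask test
and the zonelist walk inside get_page_from_freelist().

#include <stdbool.h>
#include <stdio.h>

static unsigned int mempolicy_mask;	/* bit n set => node n allowed */

static bool node_allowed(int node)
{
	return mempolicy_mask & (1u << node);
}

int main(void)
{
	const int zonelist[] = { 1, 0 };  /* preferred order: node 1, then 0 */
	bool found = false;

	mempolicy_mask = 1u << 0;	  /* initially bound to node 0 only */

	/* allocator checks zonelist[0] == node 1: not allowed yet, skip */
	if (node_allowed(zonelist[0]))
		found = true;

	/* cpuset rebind runs here: expand to {0,1}, then shrink to {1} */
	mempolicy_mask = (1u << 0) | (1u << 1);
	mempolicy_mask = 1u << 1;

	/* allocator checks zonelist[1] == node 0: no longer allowed, skip */
	if (!found && node_allowed(zonelist[1]))
		found = true;

	printf(found ? "allocated\n" : "spurious OOM\n");
	return 0;
}

This prints "spurious OOM" even though the nodemask was non-empty at
every point during the update, which is exactly why the expand/shrink
protocol alone can't protect a concurrent zonelist iteration.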