Message-ID: <alpine.DEB.2.20.1705181158250.27641@east.gentwo.org>
Date: Thu, 18 May 2017 12:07:25 -0500 (CDT)
From: Christoph Lameter <cl@...ux.com>
To: Vlastimil Babka <vbabka@...e.cz>
cc: Michal Hocko <mhocko@...nel.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
Li Zefan <lizefan@...wei.com>,
Mel Gorman <mgorman@...hsingularity.net>,
David Rientjes <rientjes@...gle.com>,
Hugh Dickins <hughd@...gle.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Anshuman Khandual <khandual@...ux.vnet.ibm.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
linux-api@...r.kernel.org
Subject: Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race
with cpuset update
On Thu, 18 May 2017, Vlastimil Babka wrote:
> > The race is where? If you expand the node set during the move of the
> > application then you are safe in terms of the legacy apps that did not
> > include static bindings.
>
> No, that expand/shrink by itself doesn't work against parallel
Parallel? I think we are clear that this is inherently racy against the
app changing policies etc.? There is a huge issue there already. The
app needs to be well behaved in some heretofore undefined way in order to
make moves clean.
> get_page_from_freelist going through a zonelist. Moving from node 0 to
> 1, with zonelist containing nodes 1 and 0 in that order:
>
> - mempolicy mask is 0
> - zonelist iteration checks node 1, it's not allowed, skip
There is an allocation from node 1? This is not allowed before the move.
So it should fail, not skip to another node.
> - mempolicy mask is 0,1 (expand)
> - mempolicy mask is 1 (shrink)
> - zonelist iteration checks node 0, it's not allowed, skip
> - OOM
Are you talking about a race here between zonelist scanning and the
moving? That has been there forever.
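For reference, the interleaving quoted above can be replayed in a small
user-space model. This is only a sketch and not kernel code: the function
first_allowed_node() and the way the mask is sampled once per zonelist check
are assumptions made for illustration. It shows that a single walk over the
zonelist [1, 0] can find no allowed node even though every individual mask
value ({0}, then {0,1}, then {1}) permits one of the two nodes.

```python
from typing import Callable, Optional, Sequence, Set


def first_allowed_node(
    zonelist: Sequence[int],
    mask_at_check: Callable[[int], Set[int]],
) -> Optional[int]:
    """Walk the zonelist once and return the first node the mask allows.

    mask_at_check(i) returns the allowed-node set as observed at the i-th
    check, which lets a caller model a mask that changes during the walk.
    Returning None stands in for "nothing allowed, declare OOM".
    """
    for check, node in enumerate(zonelist):
        if node in mask_at_check(check):
            return node
    return None


# Hypothetical zonelist from the example: node 1 first, then node 0.
ZONELIST = [1, 0]

# Racy walk: the mask is {0} when node 1 is checked, then it is expanded
# to {0, 1} and shrunk to {1} before node 0 is checked.
masks_seen = [{0}, {1}]
racy = first_allowed_node(ZONELIST, lambda check: masks_seen[check])

# Stable walks: the mask does not change for the duration of the walk.
before = first_allowed_node(ZONELIST, lambda check: {0})
after = first_allowed_node(ZONELIST, lambda check: {1})

print(racy, before, after)  # None 0 1
```

With a mask that stays fixed for the whole walk, the allocation lands on
node 0 before the move and on node 1 after it. Only the walk that observes
two different masks comes up empty.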
And frankly there are gazillions of these races. The best thing to do is
to get the cpuset moving logic out of the kernel and into user space.
Understand that this is a heuristic and maybe come up with a list of
restrictions that make an app safe. A safe app that can be moved must, e.g.:
1. Not allocate new memory while it is being moved
2. Not change memory policies after its initialization and while it is being
moved.
3. Not save memory policy state in some variable (because the logic to
translate the memory policies for the new context cannot find it).
...
Again, cpuset process migration is a huge mess that you do not want to
have in the kernel and AFAICT this is a corner case with difficult
semantics. Better have that in user space...