lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 18 May 2017 12:07:25 -0500 (CDT)
From:   Christoph Lameter <cl@...ux.com>
To:     Vlastimil Babka <vbabka@...e.cz>
cc:     Michal Hocko <mhocko@...nel.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
        Li Zefan <lizefan@...wei.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        David Rientjes <rientjes@...gle.com>,
        Hugh Dickins <hughd@...gle.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Anshuman Khandual <khandual@...ux.vnet.ibm.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        linux-api@...r.kernel.org
Subject: Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race
 with cpuset update

On Thu, 18 May 2017, Vlastimil Babka wrote:

> > The race is where? If you expand the node set during the move of the
> > application then you are safe in terms of the legacy apps that did not
> > include static bindings.
>
> No, that expand/shrink by itself doesn't work against parallel

Parallel? I think we are clear that ithis is inherently racy against the
app changing policies etc etc? There is a huge issue there already. The
app needs to be well behaved in some heretofore undefined way in order to
make moves clean.

> get_page_from_freelist going through a zonelist. Moving from node 0 to
> 1, with zonelist containing nodes 1 and 0 in that order:
>
> - mempolicy mask is 0
> - zonelist iteration checks node 1, it's not allowed, skip

There is an allocation from node 1? This is not allowed before the move.
So it should fail. Not skipping to another node.

> - mempolicy mask is 0,1 (expand)
> - mempolicy mask is 1 (shrink)
> - zonelist iteration checks node 0, it's not allowed, skip
> - OOM

Are you talking about a race here between zonelist scanning and the
moving? That has been there forever.

And frankly there are gazillions of these races. The best thing to do is
to get the cpuset moving logic out of the kernel and into user space.

Understand that this is a heuristic and maybe come up with a list of
restrictions that make an app safe. An safe app that can be moved must f.e

1. Not allocate new memory while its being moved
2. Not change memory policies after its initialization and while its being
moved.
3. Not save memory policy state in some variable (because the logic to
translate the memory policies for the new context cannot find it).

...

Again cpuset process migration  is a huge mess that you do not want to
have in the kernel and AFAICT this is a corner case with difficult
semantics. Better have that in user space...


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ