Message-ID: <20180911152725.GA28828@tower.DHCP.thefacebook.com>
Date:   Tue, 11 Sep 2018 08:27:30 -0700
From:   Roman Gushchin <guro@...com>
To:     Michal Hocko <mhocko@...nel.org>
CC:     <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
        <kernel-team@...com>, Johannes Weiner <hannes@...xchg.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>
Subject: Re: [PATCH RFC] mm: don't raise MEMCG_OOM event due to failed
 high-order allocation

On Tue, Sep 11, 2018 at 02:11:41PM +0200, Michal Hocko wrote:
> On Mon 10-09-18 14:56:22, Roman Gushchin wrote:
> > The memcg OOM killer is never invoked due to a failed high-order
> > allocation, however the MEMCG_OOM event can be easily raised.
> > 
> > Under some memory pressure it can happen easily because of a
> > concurrent allocation. Let's look at try_charge(). Even if we were
> > able to reclaim enough memory, this check can fail due to a race
> > with another allocation:
> > 
> >     if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
> >         goto retry;
> > 
> > For regular pages the following condition will save us from triggering
> > the OOM:
> > 
> >    if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
> >        goto retry;
> > 
> > But for high-order allocation this condition will intentionally fail.
> > The reason behind is that we'll likely fall to regular pages anyway,
> > so it's ok and even preferred to return ENOMEM.
> > 
> > In this case the idea of raising the MEMCG_OOM event looks dubious.
> 
> Why is this a problem though? IIRC this event was deliberately placed
> outside of the oom path because we wanted to count allocation failures
> and this is also documented that way
> 
>           oom
>                 The number of time the cgroup's memory usage was
>                 reached the limit and allocation was about to fail.
> 
>                 Depending on context result could be invocation of OOM
>                 killer and retrying allocation or failing a
> 
> One could argue that we do not apply the same logic to GFP_NOWAIT
> requests but in general I would like to see a good reason to change
> the behavior and if it is really the right thing to do then we need to
> update the documentation as well.

Right, the current behavior matches the documentation, because the description
of the event is broad enough. My point is that the current behavior is not
useful in my corner case.

Let me explain my case in detail: I've got a report about sporadic memcg OOM
kills on some hosts with plenty of pagecache and low memory pressure.
You'll probably agree that raising an OOM signal in this case looks strange.

It's natural for cgroup memory usage to be around the memory.max border, and
I've explained in the commit message how an attempt to charge a high-order
allocation can fail in this case, even if there is no real memory pressure
in the cgroup.
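
To make the failure mode concrete, here is a minimal user-space sketch of the
two checks quoted from try_charge() above. It is not the kernel code: the
struct model_cgroup type, the model_margin() and model_after_reclaim() helpers
and the charge_result enum are hypothetical names made up for illustration,
and the value 3 for PAGE_ALLOC_COSTLY_ORDER is assumed. It only models the
decision taken after reclaim has run.

```c
/* Assumed value; in the kernel this constant comes from the mm headers. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/* Hypothetical outcome of the post-reclaim checks in this model. */
enum charge_result {
	CHARGE_RETRY,        /* one of the two "goto retry" checks passed */
	CHARGE_ENOMEM_EVENT  /* both checks failed: ENOMEM, and the event
	                      * is counted under the behavior discussed here */
};

/* Hypothetical stand-in for a memory cgroup: limit and usage in pages. */
struct model_cgroup {
	unsigned long max;
	unsigned long usage;
};

/* Room left below the limit, playing the role of mem_cgroup_margin(). */
static unsigned long model_margin(const struct model_cgroup *cg)
{
	return cg->usage < cg->max ? cg->max - cg->usage : 0;
}

/* Decide what happens after reclaim, following the two quoted checks. */
static enum charge_result model_after_reclaim(const struct model_cgroup *cg,
					      unsigned long nr_pages,
					      unsigned long nr_reclaimed)
{
	/* A concurrent charge may already have consumed the reclaimed room. */
	if (model_margin(cg) >= nr_pages)
		return CHARGE_RETRY;

	/* Only non-costly sizes get a retry based on reclaim progress. */
	if (nr_reclaimed && nr_pages <= (1UL << PAGE_ALLOC_COSTLY_ORDER))
		return CHARGE_RETRY;

	return CHARGE_ENOMEM_EVENT;
}
```

With usage sitting exactly at the limit (for example max == usage == 1000)
and 32 pages reclaimed but immediately taken by another allocation, a 1-page
or 8-page charge still retries, while a 512-page charge falls through to the
last return even though reclaim was making progress.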

Thanks!
