linux-kernel - Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20161123063410.GB2864@dhcp22.suse.cz>
Date:   Wed, 23 Nov 2016 07:34:10 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Vlastimil Babka <vbabka@...e.cz>, Marc MERLIN <marc@...lins.org>,
        linux-mm <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Tejun Heo <tj@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of
 RAM that should be free

On Tue 22-11-16 11:38:47, Linus Torvalds wrote:
> On Tue, Nov 22, 2016 at 8:14 AM, Vlastimil Babka <vbabka@...e.cz> wrote:
> >
> > Thanks a lot for the testing. So what do we do now about 4.8? (4.7 is
> > already EOL AFAICS).
> >
> > - send the patch [1] as 4.8-only stable.
> 
> I think that's the right thing to do. It's pretty small, and the
> argument that it changes the oom logic too much is pretty bogus, I
> think. The oom logic in 4.8 is simply broken. Let's get it fixed.
> Changing it is the point.

The point I've tried to make is that it is not should_reclaim_retry
which is broken. It's an overly optimistic reliance on the compaction
to do it's work which led to all those issues. My previous fix
31e49bfda184 ("mm, oom: protect !costly allocations some more for
!CONFIG_COMPACTION") tried to cope with that by checking the order-0
watermark which has proven to help most users. Now it didn't cover
everybody obviously. Rather than fiddling with fine tuning of these
heuristics I think it would be safer to simply admit that high order
OOM detection doesn't work in 4.8 kernel and so do not declare the OOM
killer for those requests at all. The risk of such a change is not big
because there usually are order-0 requests happening all the time so if
we are really OOM we would trigger the OOM eventually.

So I am proposing this for 4.8 stable tree instead
---
commit b2ccdcb731b666aa28f86483656c39c5e53828c7
Author: Michal Hocko <mhocko@...e.com>
Date:   Wed Nov 23 07:26:30 2016 +0100

    mm, oom: stop pre-mature high-order OOM killer invocations
    
    31e49bfda184 ("mm, oom: protect !costly allocations some more for
    !CONFIG_COMPACTION") was an attempt to reduce chances of pre-mature OOM
    killer invocation for high order requests. It seemed to work for most
    users just fine but it is far from bullet proof and obviously not
    sufficient for Marc who has reported pre-mature OOM killer invocations
    with 4.8 based kernels. 4.9 will all the compaction improvements seems
    to be behaving much better but that would be too intrusive to backport
    to 4.8 stable kernels. Instead this patch simply never declares OOM for
    !costly high order requests. We rely on order-0 requests to do that in
    case we are really out of memory. Order-0 requests are much more common
    and so a risk of a livelock without any way forward is highly unlikely.
    
    Reported-by: Marc MERLIN <marc@...lins.org>
    Signed-off-by: Michal Hocko <mhocko@...e.com>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a2214c64ed3c..7401e996009a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3161,6 +3161,16 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
 	if (!order || order > PAGE_ALLOC_COSTLY_ORDER)
 		return false;
 
+#ifdef CONFIG_COMPACTION
+	/*
+	 * This is a gross workaround to compensate a lack of reliable compaction
+	 * operation. We cannot simply go OOM with the current state of the compaction
+	 * code because this can lead to pre mature OOM declaration.
+	 */
+	if (order <= PAGE_ALLOC_COSTLY_ORDER)
+		return true;
+#endif
+
 	/*
 	 * There are setups with compaction disabled which would prefer to loop
 	 * inside the allocator rather than hit the oom killer prematurely.
-- 
Michal Hocko
SUSE Labs