linux-kernel - Re: [PATCH 0/3] OOM detection rework v4

Message-ID: <20160226093317.GC8940@dhcp22.suse.cz>
Date:	Fri, 26 Feb 2016 10:33:18 +0100
From:	Michal Hocko <mhocko@...nel.org>
To:	Hugh Dickins <hughd@...gle.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Mel Gorman <mgorman@...e.de>,
	David Rientjes <rientjes@...gle.com>,
	Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
	Hillf Danton <hillf.zj@...baba-inc.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	linux-mm@...ck.org, LKML <linux-kernel@...r.kernel.org>,
	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Subject: Re: [PATCH 0/3] OOM detection rework v4

On Thu 25-02-16 22:32:54, Hugh Dickins wrote:
> On Thu, 25 Feb 2016, Michal Hocko wrote:
[...]
> > From d09de26cee148b4d8c486943b4e8f3bd7ad6f4be Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@...e.com>
> > Date: Thu, 4 Feb 2016 14:56:59 +0100
> > Subject: [PATCH] mm, oom: protect !costly allocations some more
> > 
> > should_reclaim_retry will give up retries for higher order allocations
> > if none of the eligible zones has any pages of the requested or a higher
> > order available, even if we pass the watermark check for order-0. This
> > is done because there is no guarantee that the reclaimable and currently
> > free pages will form the required order.
> > 
> > This can, however, lead to situations where a high-order request (e.g.
> > the order-2 request needed for the stack allocation during fork) will
> > trigger OOM too early, e.g. after the first reclaim/compaction round.
> > Such a system would have to be highly fragmented, and the OOM killer
> > would be just a matter of time, but let's stick to our
> > MAX_RECLAIM_RETRIES for high-order but not costly requests to make sure
> > we do not fail prematurely.
> > 
> > This also means that we do not reset no_progress_loops in
> > __alloc_pages_slowpath for high-order allocations, which guarantees a
> > bounded number of retries.
> > 
> > Long term, it would be much better to communicate with compaction
> > and retry only if compaction considers it meaningful.
> > 
> > Signed-off-by: Michal Hocko <mhocko@...e.com>
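The retry policy described in the patch above can be modelled roughly as follows. This is a hypothetical, simplified sketch, not kernel code: the names retry_state, sketch_should_retry() and count_retries() are illustrative, the zone checks are collapsed into booleans, and it assumes PAGE_ALLOC_COSTLY_ORDER is 3 and MAX_RECLAIM_RETRIES is 16.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for this sketch: PAGE_ALLOC_COSTLY_ORDER is 3 in the
 * kernel; 16 is the MAX_RECLAIM_RETRIES value assumed here. */
#define PAGE_ALLOC_COSTLY_ORDER 3
#define MAX_RECLAIM_RETRIES 16

/* Hypothetical, simplified state for one allocation attempt. */
struct retry_state {
	unsigned int order;      /* requested allocation order */
	int no_progress_loops;   /* rounds counted toward the retry bound */
};

/*
 * Hypothetical model of the retry decision described in the patch; this is
 * not the kernel's should_reclaim_retry(). The caller supplies the zone
 * state as booleans.
 */
static bool sketch_should_retry(struct retry_state *s, bool did_some_progress,
				bool order0_watermark_ok, bool suitable_block_free)
{
	/* Only order-0 requests reset the counter on progress, so high-order
	 * requests get a bounded number of retries. */
	if (did_some_progress && s->order == 0)
		s->no_progress_loops = 0;
	else
		s->no_progress_loops++;

	if (s->no_progress_loops > MAX_RECLAIM_RETRIES)
		return false;           /* bound reached: head for the OOM path */
	if (!order0_watermark_ok)
		return false;           /* not enough memory even for order-0 */
	if (s->order == 0 || suitable_block_free)
		return true;

	/* No free block of the requested order: in this model only costly
	 * requests give up here; !costly ones keep retrying until the bound. */
	return s->order <= PAGE_ALLOC_COSTLY_ORDER;
}

/* Count retries in a fragmented zone where reclaim makes progress and the
 * order-0 watermark passes, but no block of the requested order is free. */
static int count_retries(unsigned int order, int max_rounds)
{
	struct retry_state s = { .order = order, .no_progress_loops = 0 };
	int retries = 0;

	assert(max_rounds > 0);
	while (retries < max_rounds && sketch_should_retry(&s, true, true, false))
		retries++;
	return retries;
}
```

Under these assumptions, count_retries(2, 100) returns 16 (an order-2 fork stack request keeps retrying until the bound), count_retries(4, 100) returns 0 (a costly request gives up at once), and count_retries(0, 100) returns 100 because order-0 progress keeps resetting the counter.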
> 
> It didn't really help, I'm afraid: it reduces the actual number of OOM
> kills which occur before the job is terminated, but doesn't stop the
> job from being terminated very soon.

Yeah, this is not a magic bullet. I am happy to hear that the patch
actually helped to reduce the number of OOM kills, though, because that is
what it aims to do. I also believe that this supports (at least partially)
my suspicion that it is compaction which doesn't try hard enough.
Order-0 reclaim, even when done repeatedly, doesn't have a great chance
to form higher order pages, especially when there is a lot of
migratable memory. I have already talked about this with Vlastimil, and
he said that compaction can indeed back off too early because it hardly
cares about !costly requests at all. We will look into this more next
week.
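Why repeated order-0 reclaim rarely forms higher order pages can be illustrated with a toy model of a fragmented zone: half of the page frames are free, but every naturally aligned order-2 block also contains a movable page, so no order-2 block exists until those pages are migrated away. The frame layout and the functions count_free_blocks() and toy_compact() are hypothetical; the two-scanner loop only loosely mirrors how compaction migrates movable pages from the start of a zone into free pages near its end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-frame state for the toy model. */
enum frame_state { FRAME_FREE, FRAME_MOVABLE };

/* Count naturally aligned blocks of 2^order frames that are entirely free. */
static size_t count_free_blocks(const enum frame_state *frames, size_t nr,
				unsigned int order)
{
	size_t block = (size_t)1 << order;
	size_t count = 0;

	assert(frames != NULL);
	for (size_t start = 0; start + block <= nr; start += block) {
		size_t i;

		for (i = 0; i < block; i++)
			if (frames[start + i] != FRAME_FREE)
				break;
		if (i == block)
			count++;
	}
	return count;
}

/*
 * Toy compaction: a migrate scanner walks up from the start looking for
 * movable frames, a free scanner walks down from the end looking for free
 * frames, and each movable page is "migrated" into the free frame. Stops
 * when the scanners meet.
 */
static void toy_compact(enum frame_state *frames, size_t nr)
{
	size_t migrate = 0, free_idx = nr;

	assert(frames != NULL);
	for (;;) {
		while (migrate < nr && frames[migrate] != FRAME_MOVABLE)
			migrate++;
		while (free_idx > 0 && frames[free_idx - 1] != FRAME_FREE)
			free_idx--;
		/* Need a free frame strictly above the movable one. */
		if (free_idx == 0 || migrate >= free_idx - 1)
			break;
		frames[free_idx - 1] = FRAME_MOVABLE;
		frames[migrate] = FRAME_FREE;
	}
}
```

With 16 frames alternating free and movable, count_free_blocks(frames, 16, 2) returns 0 before toy_compact() and 2 after it, while the order-0 count stays at 8: the amount of free memory does not change, only its layout does.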
-- 
Michal Hocko
SUSE Labs