linux-kernel - Re: [PATCH 0/5] Candidate fixes for premature OOM kills with node-lru v2

Message-ID: <20160728102751.GB2799@techsingularity.net>
Date:	Thu, 28 Jul 2016 11:27:51 +0100
From:	Mel Gorman <mgorman@...hsingularity.net>
To:	Joonsoo Kim <iamjoonsoo.kim@....com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Minchan Kim <minchan@...nel.org>,
	Michal Hocko <mhocko@...e.cz>,
	Vlastimil Babka <vbabka@...e.cz>,
	Linux-MM <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/5] Candidate fixes for premature OOM kills with
 node-lru v2

On Thu, Jul 28, 2016 at 03:44:33PM +0900, Joonsoo Kim wrote:
> > To some extent, it could be "addressed" by immediately reclaiming active
> > pages moving to the inactive list at the cost of distorting page age for a
> > workload that is genuinely close to OOM. That is similar to what zone-lru
> > ended up doing -- quickly reclaiming young pages from a zone.
> 
> My expectation for my test case is that reclaimers should kick out
> actively used pages and make room for 'fork' because parallel readers
> would still work even if the pages they read are not cached.
> 
> It is sensitive to reclaimer efficiency because parallel readers
> read pages repeatedly and disturb reclaim. I thought that it is a
> good test for node-lru, which changes reclaimer efficiency for lower
> zones. However, as you said, this efficiency comes at the cost of
> distorting page aging, so now I'm not sure if it is a problem that we
> need to consider. Let's skip it?
> 

I think we should skip it for now. The alterations are too specific to a
test case that is very close to being genuinely OOM. Adjusting timing
for one OOM case may just lead to complaints that OOM is detected too
slowly in others.

> Anyway, thanks for tracking down the problem.
> 

My pleasure, thanks to both you and Minchan for persisting with this as
we got some important fixes out of the discussion.

-- 
Mel Gorman
SUSE Labs