linux-kernel - Re: [patch] mm, vmscan: abort futile reclaim if we've been oom killed

Message-ID: <20131120160712.GF3556@cmpxchg.org>
Date:	Wed, 20 Nov 2013 11:07:12 -0500
From:	Johannes Weiner <hannes@...xchg.org>
To:	David Rientjes <rientjes@...gle.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [patch] mm, vmscan: abort futile reclaim if we've been oom killed

On Mon, Nov 18, 2013 at 05:17:31PM -0800, David Rientjes wrote:
> On Mon, 18 Nov 2013, Johannes Weiner wrote:
> 
> > > Um, no, those processes are going through a repeated loop of direct 
> > > reclaim, calling the oom killer, iterating the tasklist, finding an 
> > > existing oom killed process that has yet to exit, and looping.  They 
> > > wouldn't loop for too long if we can reduce the amount of time that it 
> > > takes for that oom killed process to exit.
> > 
> > I'm not talking about the big loop in the page allocator.  The victim
> > is going through the same loop.  This patch is about the victim being
> > in a pointless direct reclaim cycle when it could be exiting, all I'm
> > saying is that the other tasks doing direct reclaim at that moment
> > should also be quitting and retrying the allocation.
> > 
> 
> "All other tasks" would be defined as though sharing the same mempolicy 
> context as the oom kill victim or the same set of cpuset mems, I'm not 
> sure what type of method for determining reclaim eligibility you're 
> proposing to avoid pointlessly spinning without making progress.  Until an 
> alternative exists, my patch avoids the needless spinning and expedites 
> the exit, so I'll ask that it be merged.

I laid this out in the second half of my email, which you apparently
did not read:

"If we have multi-second stalls in direct reclaim then it should be
 fixed for all direct reclaimers.  The problem is not only OOM kill
 victims getting stuck, it's every direct reclaimer being stuck trying
 to do way too much work before retrying the allocation.

 Kswapd checks the system state after every priority cycle.  Direct
 reclaim should probably do the same and retry the allocation after
 every priority cycle or every X pages scanned, where X is something
 reasonable and not "up to every LRU page in the system"."

NAK to this incomplete drive-by fix.
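
The suggestion quoted above (have direct reclaim re-check whether the
allocation can succeed after every priority cycle, as kswapd does) can be
illustrated with a small user-space toy model. This is only a sketch under
stated assumptions, not kernel code: struct toy_mem and the toy_* functions
are hypothetical names invented for illustration, scanning is assumed to
reclaim nothing (the futile case discussed in this thread), and the only
memory that becomes available comes from another task exiting part-way
through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEF_PRIORITY 12 /* same starting value as the kernel's scan priority */

/* Hypothetical toy state; not a kernel structure. */
struct toy_mem {
    unsigned long lru_pages;     /* pages on the LRU lists */
    unsigned long free_pages;    /* pages currently free */
    unsigned long watermark;     /* free pages needed for the allocation */
    int           exit_priority; /* priority cycle during which another task exits */
    unsigned long exit_frees;    /* pages released by that exit */
};

/* Would the allocation succeed if it were retried right now? */
static bool toy_alloc_ok(const struct toy_mem *m)
{
    return m->free_pages >= m->watermark;
}

/*
 * One priority cycle: scan lru_pages >> priority pages. Scanning itself
 * reclaims nothing in this toy; memory only appears when the other task
 * exits during the cycle named by exit_priority.
 */
static unsigned long toy_scan_cycle(struct toy_mem *m, int priority)
{
    if (priority == m->exit_priority)
        m->free_pages += m->exit_frees;
    return m->lru_pages >> priority;
}

/*
 * Walk the priorities from DEF_PRIORITY down to 0 and return the number of
 * pages scanned. With recheck_each_cycle set, stop as soon as the
 * allocation would succeed instead of finishing every cycle.
 */
unsigned long toy_direct_reclaim(struct toy_mem *m, bool recheck_each_cycle)
{
    unsigned long scanned = 0;

    assert(m != NULL);
    for (int priority = DEF_PRIORITY; priority >= 0; priority--) {
        scanned += toy_scan_cycle(m, priority);
        if (recheck_each_cycle && toy_alloc_ok(m))
            break;
    }
    return scanned;
}
```

In this toy, with an LRU of 2^20 pages and the other task exiting during
the priority-10 cycle, the re-checking variant scans 1,792 pages before
retrying, while the variant that never re-checks scans 2,096,896. The
numbers are properties of the model only and say nothing about real
workloads.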