Date:	Wed, 13 Nov 2013 10:24:12 -0500
From:	Johannes Weiner <hannes@...xchg.org>
To:	David Rientjes <rientjes@...gle.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [patch] mm, vmscan: abort futile reclaim if we've been oom killed

On Tue, Nov 12, 2013 at 06:02:18PM -0800, David Rientjes wrote:
> The oom killer is only invoked when reclaim has already failed and it
> only kills processes if the victim is also oom.  In other words, the oom
> killer does not select victims when a process tries to allocate from a
> disjoint cpuset or allocate DMA memory, for example.
> 
> Therefore, it's pointless for an oom killed process to continue
> attempting to reclaim memory in a loop when it has been granted access to
> memory reserves.  It can simply return to the page allocator and allocate
> memory.

On the other hand, finishing reclaim of 32 pages should not be a
problem.

> If there is a very large number of processes trying to reclaim memory,
> the cond_resched() in shrink_slab() becomes troublesome since it always
> forces a schedule to other processes also trying to reclaim memory.
> Compounded by many reclaim loops, it is possible for a process to sit in
> do_try_to_free_pages() for a very long time when reclaim is pointless and
> it could allocate if it just returned to the page allocator.

"Very large number of processes"

"sit in do_try_to_free_pages() for a very long time"

Can you quantify this a bit more?

And how common are OOM kills on your setups that you need to optimize
them on this level?

It sounds like your problem could be solved by having cond_resched()
not schedule away from TIF_MEMDIE processes, which would be much
preferable to oom-killed checks in random places.
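
To illustrate that suggestion, here is a rough userspace sketch of a
reschedule point that declines to yield when the task has been marked as
an OOM victim. It is only a model under stated assumptions, not kernel
code and not a proposed patch: struct task, cond_resched_model() and
reclaim_loop_model() are hypothetical names, and the two booleans merely
stand in for TIF_NEED_RESCHED and TIF_MEMDIE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; this is not the kernel's task_struct. */
struct task {
	bool need_resched;	/* stands in for TIF_NEED_RESCHED */
	bool memdie;		/* stands in for TIF_MEMDIE (OOM victim) */
	int yields;		/* number of times the task gave up the CPU */
};

/*
 * Model of a cond_resched() that does not schedule away from an OOM
 * victim.  Returns true if the task yielded.
 */
static bool cond_resched_model(struct task *t)
{
	assert(t != NULL);

	if (!t->need_resched)
		return false;
	if (t->memdie)
		return false;	/* keep running so the victim can exit */

	t->need_resched = false;
	t->yields++;
	return true;
}

/*
 * Model of a reclaim loop with one reschedule point per iteration.
 * Assumes heavy contention: a reschedule is pending on every pass.
 * Returns the total number of yields recorded for the task.
 */
static int reclaim_loop_model(struct task *t, int iterations)
{
	for (int i = 0; i < iterations; i++) {
		t->need_resched = true;
		cond_resched_model(t);
	}
	return t->yields;
}
```

In this model a task with memdie unset yields on every pass of the loop,
while a task with memdie set never yields and runs the loop straight
through, with no per-site oom-killed checks in the reclaim path itself.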