Date:   Mon, 27 Apr 2020 13:30:51 -0700
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     David Rientjes <rientjes@...gle.com>
Cc:     Vlastimil Babka <vbabka@...e.cz>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [patch] mm, oom: stop reclaiming if GFP_ATOMIC will start
 failing soon

On Sun, 26 Apr 2020 20:12:58 -0700 (PDT) David Rientjes <rientjes@...gle.com> wrote:

> > > blockable allocations and then queue a worker to asynchronously oom kill
> > > if it finds watermarks to be sufficiently low as well.
> > > 
> > 
> > Well, what's really going on here?
> > 
> > Is networking potentially consuming an unbounded amount of memory?  If
> > so, then killing a process will just cause networking to consume more
> > memory then hit against the same thing.  So presumably the answer is
> > "no, the watermarks are inappropriately set for this workload".
> > 
> > So would it not be sensible to dynamically adjust the watermarks in
> > response to this condition?  Maintain a larger pool of memory for these
> > allocations?  Or possibly push back on networking and tell it to reduce
> > its queue sizes?  So that stuff doesn't keep on getting oom-killed?
> > 
> 
> No - that would actually make the problem worse.
> 
> Today, per-zone min watermarks dictate when user allocations will loop or 
> oom kill.  should_reclaim_retry() currently loops if reclaim has succeeded 
> in the past few tries and we should be able to allocate if we are able to 
> reclaim the amount of memory that we think we can.
> 
> The issue is that this supposes that looping to reclaim more will result 
> in more free memory.  That doesn't always happen if there are concurrent 
> memory allocators.
> 
> GFP_ATOMIC allocators can access below these per-zone watermarks.  So the 
> issue is that per-zone free pages stay between ALLOC_HIGH watermarks 
> (the watermark that GFP_ATOMIC allocators can allocate to) and min 
> watermarks.  We never reclaim enough memory to get back to min watermarks 
> because reclaim cannot keep up with the amount of GFP_ATOMIC allocations.
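
To make the stall concrete, here is a toy Python model (not kernel code) of the situation David describes. The names `simulate` and `blockable_can_allocate` are hypothetical. GFP_ATOMIC callers are assumed to be allowed down to half the min watermark, which covers only the ALLOC_HIGH reduction (the real `__zone_watermark_ok()` may apply further reductions), and reclaim and atomic demand are assumed constant per tick.

```python
def simulate(free, min_wmark, reclaim_per_tick, atomic_per_tick, ticks):
    """Toy model of one zone's free-page count over time.

    Not kernel code. Assumes GFP_ATOMIC callers may allocate as long as
    free pages stay above min_wmark // 2 (only the ALLOC_HIGH reduction is
    modelled) and that reclaim frees a fixed number of pages per tick.
    """
    atomic_floor = min_wmark // 2
    history = []
    for _ in range(ticks):
        free += reclaim_per_tick  # pages freed by reclaim this tick
        # Atomic allocators consume pages down to, but not below, their floor.
        take = min(atomic_per_tick, max(free - atomic_floor, 0))
        free -= take
        history.append(free)
    return history


def blockable_can_allocate(history, min_wmark):
    """A blockable allocation needs free pages above the min watermark."""
    return any(f > min_wmark for f in history)


# Reclaim frees 50 pages/tick and atomic allocators take 50 pages/tick:
# the zone sits at 600 free pages, between the atomic floor (500) and min (1000).
stuck = simulate(free=600, min_wmark=1000, reclaim_per_tick=50,
                 atomic_per_tick=50, ticks=100)
print(blockable_can_allocate(stuck, 1000))  # False
```

When atomic demand per tick matches what reclaim frees, free pages never climb back above min, so a blockable allocator looping in should_reclaim_retry() never sees its allocation succeed even though every reclaim pass "made progress".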

But there should be an upper bound upon the total amount of in-flight
GFP_ATOMIC memory at any point in time?  These aren't like pagecache,
which will take more if we give it more.  Setting the various
thresholds appropriately should ensure that blockable allocations don't
get their memory stolen by GFP_ATOMIC allocations?
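
The bounded-in-flight argument can be checked against a similar toy model: if the total GFP_ATOMIC memory outstanding is capped, reclaim eventually gets ahead. This is a sketch only; `simulate_bounded` and its `inflight_cap` parameter are hypothetical, and the model assumes that once the cap is reached every new atomic allocation is matched by a free, so net atomic consumption drops to zero.

```python
def simulate_bounded(free, min_wmark, reclaim_per_tick, atomic_per_tick,
                     inflight_cap, ticks):
    """Toy zone in which GFP_ATOMIC memory in flight is capped.

    Assumption: once `inflight_cap` atomic pages are outstanding, each new
    atomic allocation is matched by a free, so net atomic consumption is zero.
    Atomic callers may still only dip to min_wmark // 2 (ALLOC_HIGH only).
    """
    atomic_floor = min_wmark // 2
    outstanding = 0
    history = []
    for _ in range(ticks):
        free += reclaim_per_tick  # pages freed by reclaim this tick
        take = min(atomic_per_tick,
                   inflight_cap - outstanding,   # headroom under the cap
                   max(free - atomic_floor, 0))  # cannot go below the floor
        free -= take
        outstanding += take
        history.append(free)
    return history


history = simulate_bounded(free=600, min_wmark=1000, reclaim_per_tick=50,
                           atomic_per_tick=50, inflight_cap=300, ticks=100)
print(max(history) > 1000)  # True: reclaim gets back above min
```

With the cap effectively removed (a very large `inflight_cap`), the same inputs stay pinned at 600 free pages, as in the unbounded case.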

I took a look at doing a quick-fix for the
direct-reclaimers-get-their-stuff-stolen issue about a million years
ago.  I don't recall where it ended up.  It's pretty trivial for the
direct reclaimer to free pages into current->reclaimed_pages and to
take a look in there on the allocation path, etc.  But it's only
practical for order-0 pages.
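
The per-task stash idea can be sketched as a toy model as well. Everything here is hypothetical: `Zone`, `Task`, `direct_reclaim()`, `alloc_order0()` and the `reclaimed_pages` list stand in for the `current->reclaimed_pages` idea and are not an existing kernel API. Only single (order-0) pages are modelled, matching the limitation noted above.

```python
from collections import deque


class Zone:
    """Toy zone holding a global free list of order-0 pages."""
    def __init__(self, free_pages):
        self.free = deque(free_pages)


class Task:
    """Toy task; reclaimed_pages mimics the proposed per-task stash."""
    def __init__(self, name):
        self.name = name
        self.reclaimed_pages = []


def direct_reclaim(task, zone, victims, stash):
    """Free reclaimed pages into the task's stash or the zone's free list."""
    target = task.reclaimed_pages if stash else zone.free
    target.extend(victims)


def alloc_order0(task, zone):
    """Check the caller's stash first, then fall back to the zone."""
    if task is not None and task.reclaimed_pages:
        return task.reclaimed_pages.pop()
    if zone.free:
        return zone.free.popleft()
    return None


def race(stash):
    """Reclaim two pages, let an atomic caller run, then retry the allocation."""
    zone = Zone([])
    reclaimer = Task("reclaimer")
    direct_reclaim(reclaimer, zone, ["p1", "p2"], stash)
    # An atomic allocator (no task stash) runs before the reclaimer retries.
    alloc_order0(None, zone)
    alloc_order0(None, zone)
    return alloc_order0(reclaimer, zone)


print(race(stash=False))  # None: the atomic allocator took both pages
print(race(stash=True))   # p2: the stashed page was still there
```

Freeing into a private list means a concurrent allocator that only looks at the zone's free list cannot take the reclaimer's pages; the trade-off is that those pages are invisible to everyone else until the reclaimer uses them.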
