Date:   Wed, 08 Mar 2017 10:54:57 -0500
From:   Rik van Riel <riel@...hat.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...e.de>,
        Johannes Weiner <hannes@...xchg.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        linux-mm@...ck.org, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

On Wed, 2017-03-08 at 10:21 +0100, Michal Hocko wrote:

> > Could that create problems if we have many concurrent
> > reclaimers?
> 
> As the changelog mentions it might cause a premature oom killer
> invocation theoretically. We could easily see that from the oom
> report by checking isolated counters. My testing didn't trigger that
> though and I was hammering the page allocator path from many threads.
> 
> I suspect some artificial tests can trigger that, I am not so sure
> about reasonable workloads. If we see this happening though then the
> fix would be to resurrect my previous attempt to track NR_ISOLATED*
> per zone and use them in the allocator retry logic.

I am not sure the workload in question is "artificial".
A heavily forking (or multi-threaded) server running out
of physical memory could easily get hundreds of tasks
doing direct reclaim simultaneously.

In fact, false OOM kills under that kind of workload are
how we ended up getting the "too many isolated" logic
in the first place.

I am perfectly fine with moving the retry logic up like
you did, but I think it may make sense to check the number
of reclaimable pages if we have too many isolated pages,
instead of risking a too-early OOM kill.
