linux-kernel - Re: [PATCH] mm, vmscan: do not loop on too_many

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20170308092114.GB11028@dhcp22.suse.cz>
Date:   Wed, 8 Mar 2017 10:21:15 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Rik van Riel <riel@...hat.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...e.de>,
        Johannes Weiner <hannes@...xchg.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        linux-mm@...ck.org, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

On Tue 07-03-17 14:52:36, Rik van Riel wrote:
> On Tue, 2017-03-07 at 14:30 +0100, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@...e.com>
> > 
> > Tetsuo Handa has reported [1][2] that direct reclaimers might get
> > stuck
> > in too_many_isolated loop basically for ever because the last few
> > pages
> > on the LRU lists are isolated by the kswapd which is stuck on fs
> > locks
> > when doing the pageout or slab reclaim. This in turn means that there
> > is
> > nobody to actually trigger the oom killer and the system is basically
> > unusable.
> > 
> > too_many_isolated has been introduced by 35cd78156c49 ("vmscan:
> > throttle
> > direct reclaim when too many pages are isolated already") to prevent
> > from pre-mature oom killer invocations because back then no reclaim
> > progress could indeed trigger the OOM killer too early. But since the
> > oom detection rework 0a0337e0d1d1 ("mm, oom: rework oom detection")
> > the allocation/reclaim retry loop considers all the reclaimable pages
> > and throttles the allocation at that layer so we can loosen the
> > direct
> > reclaim throttling.
> 
> It only does this to some extent.  If reclaim made
> no progress, for example due to immediately bailing
> out because the number of already isolated pages is
> too high (due to many parallel reclaimers), the code
> could hit the "no_progress_loops > MAX_RECLAIM_RETRIES"
> test without ever looking at the number of reclaimable
> pages.
> 
> Could that create problems if we have many concurrent
> reclaimers?

As the changelog mentions it might cause a premature oom killer
invocation theoretically. We could easily see that from the oom report
by checking isolated counters. My testing didn't trigger that though
and I was hammering the page allocator path from many threads.

I suspect some artificial tests can trigger that, I am not so sure about
reasonabel workloads. If we see this happening though then the fix would
be to resurrect my previous attempt to track NR_ISOLATED* per zone and
use them in the allocator retry logic.

-- 
Michal Hocko
SUSE Labs