linux-kernel - Re: [patch] mm, oom: stop reclaiming if GFP


Message-ID: <20200428074301.GK28637@dhcp22.suse.cz>
Date:   Tue, 28 Apr 2020 09:43:01 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     David Rientjes <rientjes@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [patch] mm, oom: stop reclaiming if GFP_ATOMIC will start
 failing soon

On Mon 27-04-20 16:35:58, Andrew Morton wrote:
[...]
> No consumer of GFP_ATOMIC memory should consume an unbounded amount of
> it.  Subsystems such as networking will consume a certain amount and
> will then start recycling it.  The total amount in-flight will vary
> over the longer term as workloads change.  A dynamically tuning
> threshold system will need to adapt rapidly enough to sudden load
> shifts, which might require unreasonable amounts of headroom.

I do agree. __GFP_HIGH/__GFP_ATOMIC are bounded by the size of the
reserves under memory pressure. Then allocations start failing very
quickly and users have to cope with that, usually by deferring to a
sleepable context. Tuning reserves dynamically for heavy reserve
consumers would be possible but I am worried that this is far from
trivial.
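
To illustrate why reserve consumers are bounded, here is a minimal
userspace sketch of a watermark check. It is a toy model, not the
kernel's implementation: struct toy_zone, the TOY_ALLOC_* flags and
both helper functions are hypothetical names, and the min/2 and min/4
reductions are assumed values chosen to resemble how the page allocator
lowers the min watermark for __GFP_HIGH and atomic requests.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a memory zone (not struct zone). */
struct toy_zone {
	long free_pages;	/* pages currently free */
	long min_wmark;		/* min watermark, in pages */
};

/* Hypothetical flags modelling the allocator's internal reserve access bits. */
enum toy_alloc_flags {
	TOY_ALLOC_NORMAL = 0,
	TOY_ALLOC_HIGH   = 1 << 0,	/* models __GFP_HIGH */
	TOY_ALLOC_HARDER = 1 << 1,	/* models an atomic (non-sleeping) request */
};

/*
 * Return the watermark a request must stay above.
 * Assumption: HIGH lowers it by one half, HARDER by a further quarter.
 */
static long toy_effective_min(const struct toy_zone *z, unsigned int flags)
{
	long min = z->min_wmark;

	if (flags & TOY_ALLOC_HIGH)
		min -= min / 2;
	if (flags & TOY_ALLOC_HARDER)
		min -= min / 4;
	return min;
}

/*
 * Take nr pages if that leaves the zone at or above the effective
 * watermark; otherwise fail without sleeping or reclaiming.
 */
static bool toy_alloc_pages(struct toy_zone *z, long nr, unsigned int flags)
{
	if (z->free_pages - nr < toy_effective_min(z, flags))
		return false;
	z->free_pages -= nr;
	return true;
}
```

In this model a zone sitting exactly at its min watermark refuses a
normal request, while a TOY_ALLOC_HIGH | TOY_ALLOC_HARDER caller can
keep allocating only until free_pages reaches the lowered watermark.
After that every such request fails immediately, and the caller has to
cope, e.g. by retrying from a sleepable context.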

We definitely need to understand what is going on here.  Why don't
kswapd + N*direct reclaimers provide enough memory to satisfy
both N threads + reserves consumers? How many times do those direct
reclaimers have to retry?

We used to have the allocation stall warning as David mentioned in the
patch description and I have seen it triggering without heavy reserves
consumers (i.e. the reported free pages corresponded to the min
watermark). The underlying problem was usually kswapd being stuck on
some FS locks, direct reclaimers stuck in shrinkers, or a heavily
overloaded system with dozens if not hundreds of processes stuck in the
page allocator, each racing with the reclaim and betting on luck. The
last problem was the most annoying because it is really hard to tune for.
-- 
Michal Hocko
SUSE Labs