Message-ID: <YcBD2dQEaBkwz/0H@sultan-box.localdomain>
Date:   Mon, 20 Dec 2021 00:50:33 -0800
From:   Sultan Alsawaf <sultan@...neltoast.com>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Alexey Avramov <hakavlad@...ox.lv>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, mhocko@...e.com, vbabka@...e.cz,
        neilb@...e.de, akpm@...ux-foundation.org, corbet@....net,
        riel@...riel.com, hannes@...xchg.org, david@...morbit.com,
        willy@...radead.org, hdanton@...a.com,
        penguin-kernel@...ove.sakura.ne.jp, oleksandr@...alenko.name,
        kernel@...mod.org, michael@...haellarabel.com, aros@....com,
        hakavlad@...il.com
Subject: Re: mm: 5.16 regression: reclaim_throttle leads to stall in near-OOM
 conditions

On Fri, Nov 26, 2021 at 04:24:16PM +0000, Mel Gorman wrote:
> It's somewhat expected. If the system is able to make some sort of
> progress and kswapd is active, it'll throttle until progress is
> impossible. It'll be somewhat variable how long it can keep making
> progress be it discarding page cache or writing to swap but it'll only
> OOM when the system is truly OOM.
> 
> Might be worth trying the patch below on top. It will delay throttling
> for longer with the caveat that CPU usage due to reclaim when very low
> on memory may be excessive.

Mel,

Perhaps my old submission [1] could be helpful here? I could send a refreshed
version if you're interested. Using wall time to throttle reclaim seems quite
catastrophic IMO, given the inherent assumptions it makes about the running
system's performance characteristics and its workloads.

My patch tackles the issue from the opposite direction: rather than throttling
when there's no reclaim progress to be made, my approach stops kswapd early when
there is no longer any need for reclaim, which conveniently doesn't require any
sort of tunable or heuristic since kswapd can just be immediately woken up again
right after if needed.
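
To illustrate the idea, here is a toy user-space model, not the actual patch.
Every name in it (Node, waiters, kswapd_run, the watermark fields) is
hypothetical, and it assumes that "need for reclaim" can be approximated by a
count of allocators currently waiting for free pages:

```python
# Toy model only: names are hypothetical and do not mirror kernel symbols.
from dataclasses import dataclass


@dataclass
class Node:
    free_pages: int
    min_wmark: int   # assumed point at which waiting allocators can proceed
    high_wmark: int  # classic kswapd reclaim target
    waiters: int = 0  # allocators currently waiting for reclaim


def kswapd_run(node: Node, batch: int = 32, early_stop: bool = True) -> int:
    """Reclaim in batches and return the number of batches performed.

    Without early_stop, reclaim continues up to the high watermark.
    With early_stop, reclaim ends as soon as nobody is waiting, on the
    assumption that a later wakeup can restart it if needed.
    """
    rounds = 0
    while node.free_pages < node.high_wmark:
        if early_stop and node.waiters == 0:
            break  # no allocator needs memory right now
        node.free_pages += batch
        rounds += 1
        # Allocators stop waiting once their requests can be satisfied.
        if node.free_pages >= node.min_wmark:
            node.waiters = 0
    return rounds
```

In this model, a node with two waiters stops reclaiming once they are
satisfied, whereas the non-early-stop variant keeps going until the high
watermark is reached.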

Looking back, it seems your chief complaint was that my patch may stop kswapd
before it could reclaim up to the high watermark, which could thereby introduce
stalls; however, I've never run into any such issue in my testing, and neither
have the several people who use my patch under a wide range of setups.

[1] https://lore.kernel.org/linux-mm/20200219182522.1960-1-sultan@kerneltoast.com/

Sultan
