lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 24 Nov 2021 10:35:50 +0000
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Alexey Avramov <hakavlad@...ox.lv>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org, mhocko@...e.com,
        vbabka@...e.cz, neilb@...e.de, akpm@...ux-foundation.org,
        corbet@....net, riel@...riel.com, hannes@...xchg.org,
        david@...morbit.com, willy@...radead.org, hdanton@...a.com,
        penguin-kernel@...ove.sakura.ne.jp, oleksandr@...alenko.name,
        kernel@...mod.org, michael@...haellarabel.com, aros@....com,
        hakavlad@...il.com
Subject: Re: mm: 5.16 regression: reclaim_throttle leads to stall in near-OOM
 conditions

On Wed, Nov 24, 2021 at 01:19:54AM +0900, Alexey Avramov wrote:
> I found stalls in near-OOM conditions with Linux 5.16. This is not the
> hang-up that was reported by Artem S. Tashkinov in 2019 [1]. It's a *new* 
> regression. I will demonstrate this with one simple experiment, which I
> will reproduce with different kernels or settings.
> 
> With older versions of the kernel, running the `tail /dev/zero` command
> usually quickly leads to OOM condition.
> 
> I will run the command `for i in {1...3}; do tail /dev/zero; done` and log
> PSI metrics (using psi2log script from nohang v0.2.0 [2]) and some values
> from `/proc/meminfo` (using mem2log v0.1.0 [3]) while this command is
> running. During the experiment a single tab browser will be kept opened in
> which some video will be playing.
> 

Ok, I can reproduce this. However, it does eventually get killed OOM so
the system makes progress but maybe the throttling should be for very
short intervals if failing to make progress and there have been multiple
reclaim failures recently. Disabling the throttling entirely just results
in cases where 100% CPU is used spinning through lru lists.

Thanks for the report

-- 
Mel Gorman
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ