linux-kernel - Re: [PATCH v6 6/9] mm: multigenerational lru: aging

Message-ID: <20220111232248.1629f794@mail.inbox.lv>
Date:   Tue, 11 Jan 2022 23:22:48 +0900
From:   Alexey Avramov <hakavlad@...ox.lv>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Yu Zhao <yuzhao@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Hillf Danton <hdanton@...a.com>, Jens Axboe <axboe@...nel.dk>,
        Jesse Barnes <jsbarnes@...gle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Jonathan Corbet <corbet@....net>,
        Matthew Wilcox <willy@...radead.org>,
        Mel Gorman <mgorman@...e.de>,
        Michael Larabel <Michael@...haellarabel.com>,
        Rik van Riel <riel@...riel.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Will Deacon <will@...nel.org>,
        Ying Huang <ying.huang@...el.com>,
        linux-arm-kernel@...ts.infradead.org, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        page-reclaim@...gle.com, x86@...nel.org,
        Konstantin Kharlamov <Hi-Angel@...dex.ru>, hakavlad@...il.com
Subject: Re: [PATCH v6 6/9] mm: multigenerational lru: aging

> I do not really see any arguments why an userspace based trashing
> detection cannot be used for those.

Firstly,
because this is the task of the kernel, not of user space.
Memory is managed by the kernel, not by user space.
The absence of such a mechanism in the kernel is a fundamental problem.
The userspace tools are ugly hacks:
some of them consume a lot of CPU [1],
some of them consume a lot of memory [2],
some of them cannot use process_mrelease() (earlyoom, nohang),
some of them can only kill a whole cgroup (systemd-oomd, oomd) [3]
and depend on systemd and cgroup v2 (oomd, systemd-oomd).
One of the biggest challenges for userspace OOM killers is that they have
to function under intense memory pressure, so they are prone to getting
stuck in memory reclaim themselves [4].
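For context, a userspace thrashing detector of the kind being proposed usually polls the kernel's Pressure Stall Information (PSI); systemd-oomd, for example, acts on memory pressure. The sketch below is a minimal illustration, assuming the standard `/proc/pressure/memory` text format. The function names and the 60% threshold are hypothetical and are not taken from any of the tools listed above.

```python
def parse_psi(text: str) -> dict:
    """Parse PSI text such as the contents of /proc/pressure/memory.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    and becomes {"some": {"avg10": 0.0, ...}, "full": {...}}.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {key: float(value)
                        for key, value in (f.split("=", 1) for f in fields)}
    return result


def is_thrashing(text: str, threshold: float = 60.0) -> bool:
    """Hypothetical policy: treat 'full avg10' above `threshold` percent as
    thrashing, i.e. all non-idle tasks were stalled on memory for more than
    that share of the recent ~10-second averaging window."""
    return parse_psi(text)["full"]["avg10"] > threshold


def read_memory_pressure(path: str = "/proc/pressure/memory") -> str:
    """Read the system-wide memory PSI file (requires a kernel with PSI)."""
    with open(path) as f:
        return f.read()
```

On a live system this would be called as `is_thrashing(read_memory_pressure())` in a polling loop. Note that the loop itself needs CPU time and memory while the system is under pressure, which is exactly the weakness described above.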

It is strange that after decades of user complaints about thrashing and
an OOM killer that does not work, I have to explain the obvious things.
The basic mechanism must be implemented in the kernel.
Stop shifting responsibility to the user space!

Secondly,
the real reason for the min_ttl_ms mechanism is that without it,
multi-minute stalls are possible [5] even when the OOM killer is expected
to kick in, and memory pressure is close to 100 during that period [6].
This fixes a bug that does not exist in the mainline LRU (it is an
MGLRU-specific bug). BTW, similar symptoms were recently fixed in
mainline [7].
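The min_ttl_ms decision can be illustrated with a toy model. It assumes the semantics documented for the multi-gen LRU knob: writing N to min_ttl_ms protects the working set of the last N milliseconds from eviction, and the OOM killer is invoked when that working set cannot be protected. `Lruvec` and `reclaim_or_oom()` are hypothetical names; this is a sketch of the policy, not the kernel code, which works with jiffies, memcgs and NUMA nodes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Lruvec:
    """Hypothetical stand-in for one LRU: birth times (ms) of its generations."""
    name: str
    gen_birth_ms: List[float]  # oldest generation first


def reclaim_or_oom(lruvecs: List[Lruvec], now_ms: float,
                   min_ttl_ms: float) -> Tuple[str, Optional[str]]:
    """Decide between evicting from an LRU and invoking the OOM killer.

    Returns ("reclaim", name) for the first LRU whose oldest generation is at
    least min_ttl_ms old, or ("oom", None) if no such LRU exists.
    """
    if min_ttl_ms <= 0:
        raise ValueError("this model only covers min_ttl_ms > 0 (0 disables it)")
    for vec in lruvecs:
        if not vec.gen_birth_ms:
            continue  # empty LRU: nothing to evict
        oldest_age = now_ms - vec.gen_birth_ms[0]
        if oldest_age >= min_ttl_ms:
            return ("reclaim", vec.name)
    # Everything left is younger than min_ttl_ms: evicting it would throw out
    # the protected working set and cause thrashing, so kill a task instead
    # of stalling in reclaim.
    return ("oom", None)
```

With min_ttl_ms=5000, an LRU whose oldest generation was born 9 s ago is still reclaimed. Raising min_ttl_ms to 10000 makes every generation "too young", and the model chooses an OOM kill instead of a long stall.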

[1] https://github.com/facebookincubator/oomd/issues/79
[2] https://github.com/hakavlad/nohang#memory-and-cpu-usage
[3] https://github.com/facebookincubator/oomd/issues/125
[4] https://lore.kernel.org/all/CALvZod7vtDxJZtNhn81V=oE-EPOf=4KZB2Bv6Giz+u3bFFyOLg@mail.gmail.com/
[5] https://github.com/zen-kernel/zen-kernel/issues/223
[6] https://raw.githubusercontent.com/hakavlad/cache-tests/main/mg-LRU-v3_vs_classic-LRU/3-firefox-tail-OOM/mg-LRU-1/psi2
[7] https://lore.kernel.org/linux-mm/20211202150614.22440-1-mgorman@techsingularity.net/

[I am duplicating a previous message here; it was not delivered to the mailing lists]