Message-ID: <CAOUHufb1eDugu9Jgb4rn+nZfy5hchOqmaTSBhsuqqOTxp9YQmw@mail.gmail.com>
Date:   Tue, 22 Mar 2022 02:14:04 -0600
From:   Yu Zhao <yuzhao@...gle.com>
To:     Barry Song <21cnbao@...il.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Aneesh Kumar <aneesh.kumar@...ux.ibm.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Hillf Danton <hdanton@...a.com>, Jens Axboe <axboe@...nel.dk>,
        Jesse Barnes <jsbarnes@...gle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Jonathan Corbet <corbet@....net>,
        Matthew Wilcox <willy@...radead.org>,
        Mel Gorman <mgorman@...e.de>,
        Michael Larabel <Michael@...haellarabel.com>,
        Michal Hocko <mhocko@...nel.org>,
        Mike Rapoport <rppt@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Will Deacon <will@...nel.org>,
        Ying Huang <ying.huang@...el.com>,
        LAK <linux-arm-kernel@...ts.infradead.org>,
        Linux Doc Mailing List <linux-doc@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Kernel Page Reclaim v2 <page-reclaim@...gle.com>,
        x86 <x86@...nel.org>, Brian Geffon <bgeffon@...gle.com>,
        Jan Alexander Steffens <heftig@...hlinux.org>,
        Oleksandr Natalenko <oleksandr@...alenko.name>,
        Steven Barrett <steven@...uorix.net>,
        Suleiman Souhlal <suleiman@...gle.com>,
        Daniel Byrne <djbyrne@....edu>,
        Donald Carr <d@...os-reins.com>,
        Holger Hoffstätte <holger@...lied-asynchrony.com>,
        Konstantin Kharlamov <Hi-Angel@...dex.ru>,
        Shuang Zhai <szhai2@...rochester.edu>,
        Sofia Trinh <sofia.trinh@....works>,
        Vaibhav Jain <vaibhav@...ux.ibm.com>
Subject: Re: [PATCH v9 11/14] mm: multi-gen LRU: thrashing prevention

On Tue, Mar 22, 2022 at 1:23 AM Barry Song <21cnbao@...il.com> wrote:
>
> On Wed, Mar 9, 2022 at 3:48 PM Yu Zhao <yuzhao@...gle.com> wrote:
> >
> > Add /sys/kernel/mm/lru_gen/min_ttl_ms for thrashing prevention, as
> > requested by many desktop users [1].
> >
> > When set to value N, it prevents the working set of N milliseconds
> > from getting evicted. The OOM killer is triggered if this working set
> > cannot be kept in memory. Based on the average human detectable lag
> > (~100ms), N=1000 usually eliminates intolerable lags due to thrashing.
> > Larger values like N=3000 make lags less noticeable at the risk of
> > premature OOM kills.
> >
> > Compared with the size-based approach, e.g., [2], this time-based
> > approach has the following advantages:
> > 1. It is easier to configure because it is agnostic to applications
> >    and memory sizes.
> > 2. It is more reliable because it is directly wired to the OOM killer.
> >
>
> how are userspace oom daemons like android lmkd, systemd-oomd supposed
> to work with this time-based oom killer?
> only one of min_ttl_ms and userspace daemon should be enabled? or both
> should be enabled at the same time?

Generally we just need one. lmkd and oomd are more flexible, but 1)
they need customization, 2) not all distros have them, and 3) they
might get stuck in direct reclaim as well.

The last remark is not just a theoretical problem:
a) We had many servers under such heavy (global) memory pressure that
200+ direct reclaimers on each CPU competed for resources and
userspace livelocked for 2 hours. Eventually hardware watchdogs kicked
in.
b) On Chromebooks we have something similar to lmkd, and we still
frequently observe crashes due to heavy memory pressure, meaning some
Chrome tabs were stuck in direct reclaim for 120 seconds
(hung_task_timeout_secs=120).
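
For anyone who wants to try the knob from the patch description, here is a
minimal sketch of setting it from Python. Only the sysfs path and the
N=1000 value come from the patch description; the helper name
set_min_ttl_ms and its path parameter are made up for illustration, and the
file exists only on kernels built with this series.

```python
from pathlib import Path

# Path quoted from the patch description; present only on kernels
# that include the multi-gen LRU series.
MIN_TTL_PATH = Path("/sys/kernel/mm/lru_gen/min_ttl_ms")


def set_min_ttl_ms(ttl_ms: int, path: Path = MIN_TTL_PATH) -> int:
    """Write ttl_ms to the knob and return the value read back.

    Hypothetical helper, not part of the patch. The `path` parameter
    exists so the function can be exercised against an ordinary file.
    """
    if ttl_ms < 0:
        raise ValueError("ttl_ms must be non-negative")
    path.write_text(f"{ttl_ms}\n")
    # Read back rather than assume: the stored value may differ from
    # what was written if the kernel rounds it internally.
    return int(path.read_text().strip())
```

Writing to the real path needs root, e.g. set_min_ttl_ms(1000) for the
N=1000 setting mentioned above.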
