linux-kernel - Re: [PATCH v6 8/9] mm: multigenerational lru: user interface

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Yd6S6Js1W4AnFFmv@google.com>
Date:   Wed, 12 Jan 2022 01:35:52 -0700
From:   Yu Zhao <yuzhao@...gle.com>
To:     Mike Rapoport <rppt@...nel.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Hillf Danton <hdanton@...a.com>, Jens Axboe <axboe@...nel.dk>,
        Jesse Barnes <jsbarnes@...gle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Jonathan Corbet <corbet@....net>,
        Matthew Wilcox <willy@...radead.org>,
        Mel Gorman <mgorman@...e.de>,
        Michael Larabel <Michael@...haellarabel.com>,
        Michal Hocko <mhocko@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Will Deacon <will@...nel.org>,
        Ying Huang <ying.huang@...el.com>,
        linux-arm-kernel@...ts.infradead.org, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        page-reclaim@...gle.com, x86@...nel.org,
        Konstantin Kharlamov <Hi-Angel@...dex.ru>
Subject: Re: [PATCH v6 8/9] mm: multigenerational lru: user interface

On Mon, Jan 10, 2022 at 12:27:19PM +0200, Mike Rapoport wrote:
> Hi,
> 
> On Tue, Jan 04, 2022 at 01:22:27PM -0700, Yu Zhao wrote:
> > Add /sys/kernel/mm/lru_gen/enabled as a runtime kill switch.
> > 
> > Add /sys/kernel/mm/lru_gen/min_ttl_ms for thrashing prevention.
> > Compared with the size-based approach, e.g., [1], this time-based
> > approach has the following advantages:
> > 1) It's easier to configure because it's agnostic to applications and
> >    memory sizes.
> > 2) It's more reliable because it's directly wired to the OOM killer.
> > 
> > Add /sys/kernel/debug/lru_gen for working set estimation and proactive
> > reclaim. Compared with the page table-based approach and the PFN-based
> > approach, e.g., mm/damon/[vp]addr.c, this lruvec-based approach has
> > the following advantages:
> > 1) It offers better choices because it's aware of memcgs, NUMA nodes,
> >    shared mappings and unmapped page cache.
> > 2) It's more scalable because it's O(nr_hot_evictable_pages), whereas
> >    the PFN-based approach is O(nr_total_pages).
> > 
> > Add /sys/kernel/debug/lru_gen_full for debugging.
> > 
> > [1] https://lore.kernel.org/lkml/20211130201652.2218636d@mail.inbox.lv/
> > 
> > Signed-off-by: Yu Zhao <yuzhao@...gle.com>
> > Tested-by: Konstantin Kharlamov <Hi-Angel@...dex.ru>
> > ---
> >  Documentation/vm/index.rst        |   1 +
> >  Documentation/vm/multigen_lru.rst |  62 +++++
> 
> The description of user visible interfaces should go to
> Documentation/admin-guide/mm
> 
> Documentation/vm/multigen_lru.rst should have contained design description
> and the implementation details and it would be great to actually have such
> document.

Will do, thanks.

> >  include/linux/nodemask.h          |   1 +
> >  mm/vmscan.c                       | 415 ++++++++++++++++++++++++++++++
> >  4 files changed, 479 insertions(+)
> >  create mode 100644 Documentation/vm/multigen_lru.rst
> > 
> > diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst
> > index 6f5ffef4b716..f25e755b4ff4 100644
> > --- a/Documentation/vm/index.rst
> > +++ b/Documentation/vm/index.rst
> > @@ -38,3 +38,4 @@ algorithms.  If you are looking for advice on simply allocating memory, see the
> >     unevictable-lru
> >     z3fold
> >     zsmalloc
> > +   multigen_lru
> > diff --git a/Documentation/vm/multigen_lru.rst b/Documentation/vm/multigen_lru.rst
> > new file mode 100644
> > index 000000000000..6f9e0181348b
> > --- /dev/null
> > +++ b/Documentation/vm/multigen_lru.rst
> > @@ -0,0 +1,62 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +=====================
> > +Multigenerational LRU
> > +=====================
> > +
> > +Quick start
> > +===========
> > +Runtime configurations
> > +----------------------
> > +:Required: Write ``1`` to ``/sys/kernel/mm/lru_gen/enable`` if the
> > + feature wasn't enabled by default.
> 
> Required for what? This sentence seem to lack context. Maybe add an
> overview what is Multigenerational LRU so that users will have an idea what
> these knobs control.

Apparently I left an important part of this quick start in the next
patch, where Kconfig options are added. I'm wonder whether I should
squash the next patch into this one.

I always separate Kconfig changes and leave them in the last patch
because it gives me peace of mind knowing it'll never give any auto
bisectors a hard time.

But I saw people not following this practice, and I'm also tempted to
do so. Can anybody remind me whether it's considered a bad practice to
have code changes and Kconfig changes in the same patch?

> > +
> > +Recipes
> > +=======
> 
> Some more context here will be also helpful.

Will do.

> > +Personal computers
> > +------------------
> > +:Thrashing prevention: Write ``N`` to
> > + ``/sys/kernel/mm/lru_gen/min_ttl_ms`` to prevent the working set of
> > + ``N`` milliseconds from getting evicted. The OOM killer is invoked if
> > + this working set can't be kept in memory. Based on the average human
> > + detectable lag (~100ms), ``N=1000`` usually eliminates intolerable
> > + lags due to thrashing. Larger values like ``N=3000`` make lags less
> > + noticeable at the cost of more OOM kills.
> > +
> > +Data centers
> > +------------
> > +:Debugfs interface: ``/sys/kernel/debug/lru_gen`` has the following
> > + format:
> > + ::
> > +
> > +   memcg  memcg_id  memcg_path
> > +     node  node_id
> > +       min_gen  birth_time  anon_size  file_size
> > +       ...
> > +       max_gen  birth_time  anon_size  file_size
> > +
> > + ``min_gen`` is the oldest generation number and ``max_gen`` is the
> > + youngest generation number. ``birth_time`` is in milliseconds.
> > + ``anon_size`` and ``file_size`` are in pages.
> 
> And what does oldest and youngest generations mean from the user
> perspective?

Good question. Will add more details in the next spin.