Message-ID: <Yd6S6Js1W4AnFFmv@google.com>
Date: Wed, 12 Jan 2022 01:35:52 -0700
From: Yu Zhao <yuzhao@...gle.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andi Kleen <ak@...ux.intel.com>,
Catalin Marinas <catalin.marinas@....com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Hillf Danton <hdanton@...a.com>, Jens Axboe <axboe@...nel.dk>,
Jesse Barnes <jsbarnes@...gle.com>,
Johannes Weiner <hannes@...xchg.org>,
Jonathan Corbet <corbet@....net>,
Matthew Wilcox <willy@...radead.org>,
Mel Gorman <mgorman@...e.de>,
Michael Larabel <Michael@...haellarabel.com>,
Michal Hocko <mhocko@...nel.org>,
Rik van Riel <riel@...riel.com>,
Vlastimil Babka <vbabka@...e.cz>,
Will Deacon <will@...nel.org>,
Ying Huang <ying.huang@...el.com>,
linux-arm-kernel@...ts.infradead.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
page-reclaim@...gle.com, x86@...nel.org,
Konstantin Kharlamov <Hi-Angel@...dex.ru>
Subject: Re: [PATCH v6 8/9] mm: multigenerational lru: user interface

On Mon, Jan 10, 2022 at 12:27:19PM +0200, Mike Rapoport wrote:
> Hi,
>
> On Tue, Jan 04, 2022 at 01:22:27PM -0700, Yu Zhao wrote:
> > Add /sys/kernel/mm/lru_gen/enabled as a runtime kill switch.
> >
> > Add /sys/kernel/mm/lru_gen/min_ttl_ms for thrashing prevention.
> > Compared with the size-based approach, e.g., [1], this time-based
> > approach has the following advantages:
> > 1) It's easier to configure because it's agnostic to applications and
> > memory sizes.
> > 2) It's more reliable because it's directly wired to the OOM killer.
> >
> > Add /sys/kernel/debug/lru_gen for working set estimation and proactive
> > reclaim. Compared with the page table-based approach and the PFN-based
> > approach, e.g., mm/damon/[vp]addr.c, this lruvec-based approach has
> > the following advantages:
> > 1) It offers better choices because it's aware of memcgs, NUMA nodes,
> > shared mappings and unmapped page cache.
> > 2) It's more scalable because it's O(nr_hot_evictable_pages), whereas
> > the PFN-based approach is O(nr_total_pages).
> >
> > Add /sys/kernel/debug/lru_gen_full for debugging.
> >
> > [1] https://lore.kernel.org/lkml/20211130201652.2218636d@mail.inbox.lv/
> >
> > Signed-off-by: Yu Zhao <yuzhao@...gle.com>
> > Tested-by: Konstantin Kharlamov <Hi-Angel@...dex.ru>
> > ---
> > Documentation/vm/index.rst | 1 +
> > Documentation/vm/multigen_lru.rst | 62 +++++
>
> The description of user visible interfaces should go to
> Documentation/admin-guide/mm
>
> Documentation/vm/multigen_lru.rst should have contained design description
> and the implementation details and it would be great to actually have such
> document.

Will do, thanks.

> > include/linux/nodemask.h | 1 +
> > mm/vmscan.c | 415 ++++++++++++++++++++++++++++++
> > 4 files changed, 479 insertions(+)
> > create mode 100644 Documentation/vm/multigen_lru.rst
> >
> > diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst
> > index 6f5ffef4b716..f25e755b4ff4 100644
> > --- a/Documentation/vm/index.rst
> > +++ b/Documentation/vm/index.rst
> > @@ -38,3 +38,4 @@ algorithms. If you are looking for advice on simply allocating memory, see the
> > unevictable-lru
> > z3fold
> > zsmalloc
> > + multigen_lru
> > diff --git a/Documentation/vm/multigen_lru.rst b/Documentation/vm/multigen_lru.rst
> > new file mode 100644
> > index 000000000000..6f9e0181348b
> > --- /dev/null
> > +++ b/Documentation/vm/multigen_lru.rst
> > @@ -0,0 +1,62 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +=====================
> > +Multigenerational LRU
> > +=====================
> > +
> > +Quick start
> > +===========
> > +Runtime configurations
> > +----------------------
> > +:Required: Write ``1`` to ``/sys/kernel/mm/lru_gen/enable`` if the
> > + feature wasn't enabled by default.
>
> Required for what? This sentence seems to lack context. Maybe add an
> overview of what Multigenerational LRU is so that users will have an idea
> what these knobs control.
Apparently I left an important part of this quick start in the next
patch, where Kconfig options are added. I'm wondering whether I should
squash the next patch into this one.

I always separate Kconfig changes and leave them in the last patch
because it gives me peace of mind knowing it'll never give any auto
bisectors a hard time.

But I've seen people not follow this practice, and I'm tempted to do
the same. Can anybody remind me whether it's considered bad practice to
have code changes and Kconfig changes in the same patch?

> > +
> > +Recipes
> > +=======
>
> Some more context here would also be helpful.

Will do.

> > +Personal computers
> > +------------------
> > +:Thrashing prevention: Write ``N`` to
> > + ``/sys/kernel/mm/lru_gen/min_ttl_ms`` to prevent the working set of
> > + ``N`` milliseconds from getting evicted. The OOM killer is invoked if
> > + this working set can't be kept in memory. Based on the average human
> > + detectable lag (~100ms), ``N=1000`` usually eliminates intolerable
> > + lags due to thrashing. Larger values like ``N=3000`` make lags less
> > + noticeable at the cost of more OOM kills.
> > +
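
For illustration, the thrashing-prevention recipe above could be scripted
roughly as follows. This is only a sketch under assumptions: the helper
name set_min_ttl_ms and its path parameter are hypothetical, the sysfs
path is the one quoted in this patch, and the knob is assumed to accept a
plain decimal number of milliseconds. Writing to it needs root and a
kernel built with this series.

```python
from pathlib import Path

# Path as given in the patch description; it only exists on a kernel
# carrying this series.
MIN_TTL_PATH = Path("/sys/kernel/mm/lru_gen/min_ttl_ms")


def set_min_ttl_ms(n_ms, path=MIN_TTL_PATH):
    """Write N (milliseconds) to the min_ttl_ms knob.

    Hypothetical helper: the patch only says to write N to the file, so
    the newline-terminated decimal format is an assumption.
    """
    if n_ms < 0:
        raise ValueError("min_ttl_ms must be non-negative")
    Path(path).write_text(f"{n_ms}\n")


# Example (not executed here, since it needs root):
#   set_min_ttl_ms(1000)   # the N=1000 value suggested for personal computers
```

The path parameter is there so the helper can be tried against a scratch
file before pointing it at the real knob.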
> > +Data centers
> > +------------
> > +:Debugfs interface: ``/sys/kernel/debug/lru_gen`` has the following
> > + format:
> > + ::
> > +
> > + memcg memcg_id memcg_path
> > + node node_id
> > + min_gen birth_time anon_size file_size
> > + ...
> > + max_gen birth_time anon_size file_size
> > +
> > + ``min_gen`` is the oldest generation number and ``max_gen`` is the
> > + youngest generation number. ``birth_time`` is in milliseconds.
> > + ``anon_size`` and ``file_size`` are in pages.
>
> And what do the oldest and youngest generations mean from the user
> perspective?

Good question. Will add more details in the next spin.
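
To make the debugfs format quoted above a bit more concrete, here is a
rough sketch of how a userspace tool might read it. Everything in it is
an assumption beyond the layout shown in the documentation hunk: the
names parse_lru_gen, Generation and Node are hypothetical, the sample
text is made up rather than captured from a real system, and fields are
assumed to be whitespace-separated with generations listed from min_gen
to max_gen.

```python
from dataclasses import dataclass, field


@dataclass
class Generation:
    gen: int         # generation number
    birth_ms: int    # birth_time, in milliseconds
    anon_pages: int  # anon_size, in pages
    file_pages: int  # file_size, in pages


@dataclass
class Node:
    memcg_id: int
    memcg_path: str
    node_id: int
    gens: list = field(default_factory=list)


def parse_lru_gen(text):
    """Parse text in the documented lru_gen layout into Node records."""
    nodes = []
    memcg_id, memcg_path = None, ""
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "memcg":
            # "memcg memcg_id memcg_path"; the path may be absent.
            parts = line.split(None, 2)
            memcg_id = int(parts[1])
            memcg_path = parts[2].strip() if len(parts) > 2 else ""
        elif fields[0] == "node":
            nodes.append(Node(memcg_id, memcg_path, int(fields[1])))
        else:
            # "gen birth_time anon_size file_size"
            gen, birth, anon, file_ = (int(f) for f in fields[:4])
            nodes[-1].gens.append(Generation(gen, birth, anon, file_))
    return nodes


# Made-up sample in the documented layout (not real output).
SAMPLE = """\
memcg 1 /user.slice
 node 0
  5 40000 100 200
  6 20000 300 400
  7 1000 500 600
"""

nodes = parse_lru_gen(SAMPLE)
oldest = nodes[0].gens[0]     # min_gen: the oldest generation
youngest = nodes[0].gens[-1]  # max_gen: the youngest generation
```

With the records in hand, a tool could, for example, sum anon_pages and
file_pages over the older generations to estimate how much memory lies
outside the recent working set.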