lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YdwKB3SfF7hkB9Xv@kernel.org>
Date:   Mon, 10 Jan 2022 12:27:19 +0200
From:   Mike Rapoport <rppt@...nel.org>
To:     Yu Zhao <yuzhao@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Hillf Danton <hdanton@...a.com>, Jens Axboe <axboe@...nel.dk>,
        Jesse Barnes <jsbarnes@...gle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Jonathan Corbet <corbet@....net>,
        Matthew Wilcox <willy@...radead.org>,
        Mel Gorman <mgorman@...e.de>,
        Michael Larabel <Michael@...haellarabel.com>,
        Michal Hocko <mhocko@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Will Deacon <will@...nel.org>,
        Ying Huang <ying.huang@...el.com>,
        linux-arm-kernel@...ts.infradead.org, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        page-reclaim@...gle.com, x86@...nel.org,
        Konstantin Kharlamov <Hi-Angel@...dex.ru>
Subject: Re: [PATCH v6 8/9] mm: multigenerational lru: user interface

Hi,

On Tue, Jan 04, 2022 at 01:22:27PM -0700, Yu Zhao wrote:
> Add /sys/kernel/mm/lru_gen/enabled as a runtime kill switch.
> 
> Add /sys/kernel/mm/lru_gen/min_ttl_ms for thrashing prevention.
> Compared with the size-based approach, e.g., [1], this time-based
> approach has the following advantages:
> 1) It's easier to configure because it's agnostic to applications and
>    memory sizes.
> 2) It's more reliable because it's directly wired to the OOM killer.
> 
> Add /sys/kernel/debug/lru_gen for working set estimation and proactive
> reclaim. Compared with the page table-based approach and the PFN-based
> approach, e.g., mm/damon/[vp]addr.c, this lruvec-based approach has
> the following advantages:
> 1) It offers better choices because it's aware of memcgs, NUMA nodes,
>    shared mappings and unmapped page cache.
> 2) It's more scalable because it's O(nr_hot_evictable_pages), whereas
>    the PFN-based approach is O(nr_total_pages).
> 
> Add /sys/kernel/debug/lru_gen_full for debugging.
> 
> [1] https://lore.kernel.org/lkml/20211130201652.2218636d@mail.inbox.lv/
> 
> Signed-off-by: Yu Zhao <yuzhao@...gle.com>
> Tested-by: Konstantin Kharlamov <Hi-Angel@...dex.ru>
> ---
>  Documentation/vm/index.rst        |   1 +
>  Documentation/vm/multigen_lru.rst |  62 +++++

The description of user visible interfaces should go to
Documentation/admin-guide/mm

Documentation/vm/multigen_lru.rst should have contained design description
and the implementation details and it would be great to actually have such
document.

>  include/linux/nodemask.h          |   1 +
>  mm/vmscan.c                       | 415 ++++++++++++++++++++++++++++++
>  4 files changed, 479 insertions(+)
>  create mode 100644 Documentation/vm/multigen_lru.rst
> 
> diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst
> index 6f5ffef4b716..f25e755b4ff4 100644
> --- a/Documentation/vm/index.rst
> +++ b/Documentation/vm/index.rst
> @@ -38,3 +38,4 @@ algorithms.  If you are looking for advice on simply allocating memory, see the
>     unevictable-lru
>     z3fold
>     zsmalloc
> +   multigen_lru
> diff --git a/Documentation/vm/multigen_lru.rst b/Documentation/vm/multigen_lru.rst
> new file mode 100644
> index 000000000000..6f9e0181348b
> --- /dev/null
> +++ b/Documentation/vm/multigen_lru.rst
> @@ -0,0 +1,62 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=====================
> +Multigenerational LRU
> +=====================
> +
> +Quick start
> +===========
> +Runtime configurations
> +----------------------
> +:Required: Write ``1`` to ``/sys/kernel/mm/lru_gen/enable`` if the
> + feature wasn't enabled by default.

Required for what? This sentence seem to lack context. Maybe add an
overview what is Multigenerational LRU so that users will have an idea what
these knobs control.

> +
> +Recipes
> +=======

Some more context here will be also helpful.

> +Personal computers
> +------------------
> +:Thrashing prevention: Write ``N`` to
> + ``/sys/kernel/mm/lru_gen/min_ttl_ms`` to prevent the working set of
> + ``N`` milliseconds from getting evicted. The OOM killer is invoked if
> + this working set can't be kept in memory. Based on the average human
> + detectable lag (~100ms), ``N=1000`` usually eliminates intolerable
> + lags due to thrashing. Larger values like ``N=3000`` make lags less
> + noticeable at the cost of more OOM kills.
> +
> +Data centers
> +------------
> +:Debugfs interface: ``/sys/kernel/debug/lru_gen`` has the following
> + format:
> + ::
> +
> +   memcg  memcg_id  memcg_path
> +     node  node_id
> +       min_gen  birth_time  anon_size  file_size
> +       ...
> +       max_gen  birth_time  anon_size  file_size
> +
> + ``min_gen`` is the oldest generation number and ``max_gen`` is the
> + youngest generation number. ``birth_time`` is in milliseconds.
> + ``anon_size`` and ``file_size`` are in pages.

And what does oldest and youngest generations mean from the user
perspective?

> +
> + This file also accepts commands in the following subsections.
> + Multiple command lines are supported, so does concatenation with
> + delimiters ``,`` and ``;``.
> +
> + ``/sys/kernel/debug/lru_gen_full`` contains additional stats for
> + debugging.
> +
> +:Working set estimation: Write ``+ memcg_id node_id max_gen
> + [can_swap [full_scan]]`` to ``/sys/kernel/debug/lru_gen`` to trigger
> + the aging. It scans PTEs for accessed pages and promotes them to the
> + youngest generation ``max_gen``. Then it creates a new generation
> + ``max_gen+1``. Set ``can_swap`` to 1 to scan for accessed anon pages
> + when swap is off. Set ``full_scan`` to 0 to reduce the overhead as
> + well as the coverage when scanning PTEs.
> +
> +:Proactive reclaim: Write ``- memcg_id node_id min_gen [swappiness
> + [nr_to_reclaim]]`` to ``/sys/kernel/debug/lru_gen`` to trigger the
> + eviction. It evicts generations less than or equal to ``min_gen``.
> + ``min_gen`` should be less than ``max_gen-1`` as ``max_gen`` and
> + ``max_gen-1`` aren't fully aged and therefore can't be evicted. Use
> + ``nr_to_reclaim`` to limit the number of pages to evict.

...

-- 
Sincerely yours,
Mike.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ