linux-kernel - Re: [PATCH v7 05/12] mm: multigenerational LRU: minimal implementation

Message-ID: <CAOUHufY9h2K4dPnufW-uD-EEuvROf6y7cF-w1gJ2VAFaSEDD7Q@mail.gmail.com>
Date:   Wed, 23 Feb 2022 22:35:32 -0700
From:   Yu Zhao <yuzhao@...gle.com>
To:     "Huang, Ying" <ying.huang@...el.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Mel Gorman <mgorman@...e.de>, Michal Hocko <mhocko@...nel.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Aneesh Kumar <aneesh.kumar@...ux.ibm.com>,
        Barry Song <21cnbao@...il.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Hillf Danton <hdanton@...a.com>, Jens Axboe <axboe@...nel.dk>,
        Jesse Barnes <jsbarnes@...gle.com>,
        Jonathan Corbet <corbet@....net>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        Michael Larabel <Michael@...haellarabel.com>,
        Mike Rapoport <rppt@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Will Deacon <will@...nel.org>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Kernel Page Reclaim v2 <page-reclaim@...gle.com>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        Brian Geffon <bgeffon@...gle.com>,
        Jan Alexander Steffens <heftig@...hlinux.org>,
        Oleksandr Natalenko <oleksandr@...alenko.name>,
        Steven Barrett <steven@...uorix.net>,
        Suleiman Souhlal <suleiman@...gle.com>,
        Daniel Byrne <djbyrne@....edu>,
        Donald Carr <d@...os-reins.com>,
        Holger Hoffstätte <holger@...lied-asynchrony.com>,
        Konstantin Kharlamov <Hi-Angel@...dex.ru>,
        Shuang Zhai <szhai2@...rochester.edu>,
        Sofia Trinh <sofia.trinh@....works>
Subject: Re: [PATCH v7 05/12] mm: multigenerational LRU: minimal implementation

On Wed, Feb 23, 2022 at 10:27 PM Huang, Ying <ying.huang@...el.com> wrote:
>
> Yu Zhao <yuzhao@...gle.com> writes:
>
> > On Wed, Feb 23, 2022 at 8:32 PM Huang, Ying <ying.huang@...el.com> wrote:
> >>
> >> Yu Zhao <yuzhao@...gle.com> writes:
> >>
> >> > On Wed, Feb 23, 2022 at 5:59 PM Huang, Ying <ying.huang@...el.com> wrote:
> >> >>
> >> >> Yu Zhao <yuzhao@...gle.com> writes:
> >> >>
> >> >> > On Wed, Feb 23, 2022 at 1:28 AM Huang, Ying <ying.huang@...el.com> wrote:
> >> >> >>
> >> >> >> Hi, Yu,
> >> >> >>
> >> >> >> Yu Zhao <yuzhao@...gle.com> writes:
> >> >> >>
> >> >> >> > To avoid confusion, the terms "promotion" and "demotion" will be
> >> >> >> > applied to the multigenerational LRU, as a new convention; the terms
> >> >> >> > "activation" and "deactivation" will be applied to the active/inactive
> >> >> >> > LRU, as usual.
> >> >> >>
> >> >> >> In the memory tiering related commits and patchset, for example as follows,
> >> >> >>
> >> >> >> commit 668e4147d8850df32ca41e28f52c146025ca45c6
> >> >> >> Author: Yang Shi <yang.shi@...ux.alibaba.com>
> >> >> >> Date:   Thu Sep 2 14:59:19 2021 -0700
> >> >> >>
> >> >> >>     mm/vmscan: add page demotion counter
> >> >> >>
> >> >> >> https://lore.kernel.org/linux-mm/20220221084529.1052339-1-ying.huang@intel.com/
> >> >> >>
> >> >> >> "demote" and "promote" are used for migrating pages between different
> >> >> >> types of memory.  Would it be better to avoid overloading these words,
> >> >> >> to prevent possible confusion?
> >> >> >
> >> >> > Given that LRU and migration are usually different contexts, I think
> >> >> > we'd be fine, unless we want a third pair of terms.
> >> >>
> >> >> This was true before memory tiering was introduced.  In systems with
> >> >> multiple types of memory (called memory tiering), the LRU is used to
> >> >> identify pages to be migrated to the slow memory node.  Please take a
> >> >> look at can_demote(), which is called in shrink_page_list().
> >> >
> >> > These sound like two clearly distinct contexts to me: promotion/demotion
> >> > (moving between generations) while pages are on the LRU, or
> >> > promotion/demotion (migration between nodes) after pages are taken off
> >> > the LRU.
> >> >
> >> > Note that promotion/demotion are not used in function names. They are
> >> > used to describe how MGLRU works, in comparison with the
> >> > active/inactive LRU. Memory tiering is not within this context.
> >>
> >> We already use pgdemote_* in /proc/vmstat and "demotion_enabled" in
> >> /sys/kernel/mm/numa, and will use pgpromote_* in /proc/vmstat.  So it seems
> >> better to avoid using promote/demote directly for MGLRU in the ABI.  A
> >> possible solution is to use "mglru" and "promote/demote" together (such
> >> as "mglru_promote_*") when it is needed?
> >
> > *If* it is needed. Currently there are no such plans.
>
> OK.
>
> >> >> >> > +static int get_swappiness(struct mem_cgroup *memcg)
> >> >> >> > +{
> >> >> >> > +     return mem_cgroup_get_nr_swap_pages(memcg) >= MIN_LRU_BATCH ?
> >> >> >> > +            mem_cgroup_swappiness(memcg) : 0;
> >> >> >> > +}
> >> >> >>
> >> >> >> Demotion support has been introduced in the Linux kernel: anonymous
> >> >> >> pages in the fast memory node can be demoted to the slow memory node
> >> >> >> via the page reclaim mechanism, as in the following commit.  Can you
> >> >> >> consider that too?
> >> >> >
> >> >> > Sure. How do I check whether there is still space on the slow node?
> >> >>
> >> >> You can always check the watermark of the slow node.  But currently we
> >> >> don't actually check that (see demote_page_list()); instead, we
> >> >> wake up kswapd of the slow node.  The intended behavior is something
> >> >> like,
> >> >>
> >> >>   DRAM -> PMEM -> disk
> >> >
> >> > I'll look into this later -- for now, it's a low priority because
> >> > there isn't much demand. I'll bump it up if anybody is interested in
> >> > giving it a try. Meanwhile, please feel free to cook up something if
> >> > you are interested.
> >>
> >> When we introduce a new feature, we shouldn't break an existing one,
> >> that is, we shouldn't introduce a regression.  I think that is a rule?
> >>
> >> If my understanding is correct, MGLRU will skip scanning the anonymous
> >> page list even if there's a demotion target for the node.  This breaks the
> >> demotion feature in the upstream kernel.  Right?
> >
> > I'm not saying this shouldn't be fixed. I'm saying it's a low priority
> > until somebody is interested in using/testing it (or making it work).
>
> We are interested in this feature and can help to test it.

That's great. I'll make sure it works in the next version.
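
A minimal userspace sketch of one way get_swappiness() could account for a
demotion target, so that the anonymous list is still scanned when swap is
exhausted but a slower node exists. The struct, its fields, the function
name and the MIN_LRU_BATCH value below are hypothetical stand-ins for
illustration only; they are not kernel API and not necessarily what the
next version will look like:

```c
#include <stdbool.h>

/* Placeholder value for this sketch; the kernel defines its own MIN_LRU_BATCH. */
#define MIN_LRU_BATCH 64

/* Hypothetical userspace stand-in for the per-memcg/per-node state. */
struct reclaim_state {
	long nr_swap_pages;       /* free swap slots available to the memcg */
	int swappiness;           /* configured swappiness */
	bool has_demotion_target; /* a slower node can accept demoted pages */
};

/*
 * Return the swappiness to use for reclaim. Anonymous pages are worth
 * scanning if they can be either swapped out or demoted to a slower node;
 * only when neither is possible does this return 0.
 */
int get_swappiness_sketch(const struct reclaim_state *rs)
{
	if (!rs->has_demotion_target && rs->nr_swap_pages < MIN_LRU_BATCH)
		return 0;

	return rs->swappiness;
}
```

With no swap and no demotion target this returns 0, as the quoted
get_swappiness() does; with a demotion target it returns the configured
swappiness even when swap is unavailable.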