linux-kernel - Re: [PATCH v9 06/14] mm: multi-gen LRU: minimal implementation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGsJ_4y=TkCGoWWtWSAptW4RDFUEBeYXwfwu=fUFvV4Sa4VA4A@mail.gmail.com>
Date:   Thu, 24 Mar 2022 21:13:48 +1300
From:   Barry Song <21cnbao@...il.com>
To:     Yu Zhao <yuzhao@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Aneesh Kumar <aneesh.kumar@...ux.ibm.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Hillf Danton <hdanton@...a.com>, Jens Axboe <axboe@...nel.dk>,
        Jesse Barnes <jsbarnes@...gle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Jonathan Corbet <corbet@....net>,
        Matthew Wilcox <willy@...radead.org>,
        Mel Gorman <mgorman@...e.de>,
        Michael Larabel <Michael@...haellarabel.com>,
        Michal Hocko <mhocko@...nel.org>,
        Mike Rapoport <rppt@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Will Deacon <will@...nel.org>,
        Ying Huang <ying.huang@...el.com>,
        LAK <linux-arm-kernel@...ts.infradead.org>,
        Linux Doc Mailing List <linux-doc@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Kernel Page Reclaim v2 <page-reclaim@...gle.com>,
        x86 <x86@...nel.org>, Brian Geffon <bgeffon@...gle.com>,
        Jan Alexander Steffens <heftig@...hlinux.org>,
        Oleksandr Natalenko <oleksandr@...alenko.name>,
        Steven Barrett <steven@...uorix.net>,
        Suleiman Souhlal <suleiman@...gle.com>,
        Daniel Byrne <djbyrne@....edu>,
        Donald Carr <d@...os-reins.com>,
        Holger Hoffstätte <holger@...lied-asynchrony.com>,
        Konstantin Kharlamov <Hi-Angel@...dex.ru>,
        Shuang Zhai <szhai2@...rochester.edu>,
        Sofia Trinh <sofia.trinh@....works>,
        Vaibhav Jain <vaibhav@...ux.ibm.com>
Subject: Re: [PATCH v9 06/14] mm: multi-gen LRU: minimal implementation

On Thu, Mar 24, 2022 at 7:24 PM Yu Zhao <yuzhao@...gle.com> wrote:
>
> On Wed, Mar 23, 2022 at 1:47 AM Barry Song <21cnbao@...il.com> wrote:
> >
> > On Sat, Mar 19, 2022 at 4:11 PM Yu Zhao <yuzhao@...gle.com> wrote:
> > >
> > > On Fri, Mar 18, 2022 at 9:01 PM Barry Song <21cnbao@...il.com> wrote:
> > > >
> > > > > +static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
> > > > > +{
> > > > > +       unsigned long old_flags, new_flags;
> > > > > +       int type = folio_is_file_lru(folio);
> > > > > +       struct lru_gen_struct *lrugen = &lruvec->lrugen;
> > > > > +       int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
> > > > > +
> > > > > +       do {
> > > > > +               new_flags = old_flags = READ_ONCE(folio->flags);
> > > > > +               VM_BUG_ON_FOLIO(!(new_flags & LRU_GEN_MASK), folio);
> > > > > +
> > > > > +               new_gen = ((new_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1;
> > > > > +               new_gen = (old_gen + 1) % MAX_NR_GENS;
> > > >
> > > > new_gen is assigned twice, i assume you mean
> > > >                old_gen = ((new_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1;
> > > >                new_gen = (old_gen + 1) % MAX_NR_GENS;
> > > >
> > > > or do you always mean new_gen =  lru_gen_from_seq(min_seq) + 1?
> > >
> > > Thanks a lot for your attention to details!
> > >
> > > The first line should be in the next patch but I overlooked during the
> > > last refactoring:
> >
> > Thanks for the clarification. So an unmapped file-backed page which is
> > accessed only by system call will always be in either min_seq or
> > min_seq + 1? it has no chance to be in max_seq like a faulted-in
> > mapped file page?
>
> That's right. The rationale is documented here under the `Assumptions`
> section [1]. This is also related to Aneesh's question about why MGLRU
> doesn't need additional heuristics for VM_EXEC pages [2]. Unmapped
> file pages weaken the protection of executable pages under heavy
> buffered IO workloads like Java NIO.

ok. This is probably right.
i will also run a test by maltreating unmapped page in vanilla LRU, the
PoC code is like (not been tested yet):

Subject: [PATCH 1/1] mm: vmscan: maltreat unmapped file-backed pages

[This patch has not been tested yet.]

A lesson we learned from MGLRU is that mapped filed-backed pages
are much more important than unmapped ones.
So this patch doesn't move the second accessed unmapped pages to
the active list, alternatively, it keeps the pages in the inactive
list. And we abuse PG_workingset to let the memory reclaim this
is a relatively hot file-backed page, so the reclaim should keep
the pages in the inactive list.

---
 mm/swap.c   | 34 ++++++++++++++++++++++------------
 mm/vmscan.c |  6 ++++--
 2 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index e65e7520bebf..cb0c6e704f2e 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -470,18 +470,28 @@ void folio_mark_accessed(struct folio *folio)
  * evictable page accessed has no effect.
  */
  } else if (!folio_test_active(folio)) {
- /*
- * If the page is on the LRU, queue it for activation via
- * lru_pvecs.activate_page. Otherwise, assume the page is on a
- * pagevec, mark it active and it'll be moved to the active
- * LRU on the next drain.
- */
- if (folio_test_lru(folio))
- folio_activate(folio);
- else
- __lru_cache_activate_folio(folio);
- folio_clear_referenced(folio);
- workingset_activation(folio);
+ if (folio_mapped(folio)) {
+ /*
+ * If the mapped page is on the LRU, queue it for activation via
+ * lru_pvecs.activate_page. Otherwise, assume the page is on a
+ * pagevec, mark it active and it'll be moved to the active
+ * LRU on the next drain.
+ */
+ if (folio_test_lru(folio))
+ folio_activate(folio);
+ else
+ __lru_cache_activate_folio(folio);
+ folio_clear_referenced(folio);
+ workingset_activation(folio);
+ } else {
+ /*
+ * we maltreat unmmaped file-backed pages and abuse PG_workingset
+ * flag to let the eviction know this page is a relatively hot file
+ * page, thus, the eviction can move it back to the head of the
+ * inactive list
+ */
+ folio_set_workingset(folio);
+ }
  }
  if (folio_test_idle(folio))
  folio_clear_idle(folio);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d6f3c9812f97..56a66eb4a3f7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1393,12 +1393,14 @@ enum page_references {
 static enum page_references page_check_references(struct page *page,
    struct scan_control *sc)
 {
- int referenced_ptes, referenced_page;
+ int referenced_ptes, referenced_page, workingset;
  unsigned long vm_flags;

  referenced_ptes = page_referenced(page, 1, sc->target_mem_cgroup,
    &vm_flags);
  referenced_page = TestClearPageReferenced(page);
+ workingset = page_is_file_lru(page) && !page_mapped(page) &&
+ TestClearPageWorkingset(page);

  /*
  * Mlock lost the isolation race with us.  Let try_to_unmap()
@@ -1438,7 +1440,7 @@ static enum page_references
page_check_references(struct page *page,

  /* Reclaim if clean, defer dirty pages to writeback */
  if (referenced_page && !PageSwapBacked(page))
- return PAGEREF_RECLAIM_CLEAN;
+ return workingset ?  PAGEREF_KEEP : PAGEREF_RECLAIM_CLEAN;

  return PAGEREF_RECLAIM;
 }

>
> [1] https://lore.kernel.org/linux-mm/20220309021230.721028-15-yuzhao@google.com/
> [2] https://lore.kernel.org/linux-mm/CAOUHufYfpiGdLSdffvzDqaD5oYFG99oDJ2xgQd2Ph77OFR5NAA@mail.gmail.com/

Thanks
Barry