[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOUHufbor4L5OfjNFEMcaM5bXnM9VWXw-LK9A04vP65rkcLNMg@mail.gmail.com>
Date: Sat, 12 Mar 2022 14:11:49 -0700
From: Yu Zhao <yuzhao@...gle.com>
To: Barry Song <21cnbao@...il.com>
Cc: Johannes Weiner <hannes@...xchg.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Mel Gorman <mgorman@...e.de>, Michal Hocko <mhocko@...nel.org>,
Andi Kleen <ak@...ux.intel.com>,
Aneesh Kumar <aneesh.kumar@...ux.ibm.com>,
Catalin Marinas <catalin.marinas@....com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Hillf Danton <hdanton@...a.com>, Jens Axboe <axboe@...nel.dk>,
Jesse Barnes <jsbarnes@...gle.com>,
Jonathan Corbet <corbet@....net>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Matthew Wilcox <willy@...radead.org>,
Michael Larabel <Michael@...haellarabel.com>,
Mike Rapoport <rppt@...nel.org>,
Rik van Riel <riel@...riel.com>,
Vlastimil Babka <vbabka@...e.cz>,
Will Deacon <will@...nel.org>,
Ying Huang <ying.huang@...el.com>,
LAK <linux-arm-kernel@...ts.infradead.org>,
Linux Doc Mailing List <linux-doc@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>,
Kernel Page Reclaim v2 <page-reclaim@...gle.com>,
x86 <x86@...nel.org>, Brian Geffon <bgeffon@...gle.com>,
Jan Alexander Steffens <heftig@...hlinux.org>,
Oleksandr Natalenko <oleksandr@...alenko.name>,
Steven Barrett <steven@...uorix.net>,
Suleiman Souhlal <suleiman@...gle.com>,
Daniel Byrne <djbyrne@....edu>,
Donald Carr <d@...os-reins.com>,
Holger Hoffstätte <holger@...lied-asynchrony.com>,
Konstantin Kharlamov <Hi-Angel@...dex.ru>,
Shuang Zhai <szhai2@...rochester.edu>,
Sofia Trinh <sofia.trinh@....works>
Subject: Re: [PATCH v7 04/12] mm: multigenerational LRU: groundwork
On Sat, Mar 12, 2022 at 3:37 AM Barry Song <21cnbao@...il.com> wrote:
>
> On Sat, Mar 12, 2022 at 12:45 PM Yu Zhao <yuzhao@...gle.com> wrote:
> >
> > On Fri, Mar 11, 2022 at 3:16 AM Barry Song <21cnbao@...il.com> wrote:
> > >
> > > On Tue, Feb 15, 2022 at 10:43 PM Yu Zhao <yuzhao@...gle.com> wrote:
> > > >
> > > > On Thu, Feb 10, 2022 at 03:41:57PM -0500, Johannes Weiner wrote:
> > > >
> > > > Thanks for reviewing.
> > > >
> > > > > > +static inline bool lru_gen_is_active(struct lruvec *lruvec, int gen)
> > > > > > +{
> > > > > > + unsigned long max_seq = lruvec->lrugen.max_seq;
> > > > > > +
> > > > > > + VM_BUG_ON(gen >= MAX_NR_GENS);
> > > > > > +
> > > > > > + /* see the comment on MIN_NR_GENS */
> > > > > > + return gen == lru_gen_from_seq(max_seq) || gen == lru_gen_from_seq(max_seq - 1);
> > > > > > +}
> > > > >
> > > > > I'm still reading the series, so correct me if I'm wrong: the "active"
> > > > > set is split into two generations for the sole purpose of the
> > > > > second-chance policy for fresh faults, right?
> > > >
> > > > To be precise, the active/inactive notion on top of generations is
> > > > just for ABI compatibility, e.g., the counters in /proc/vmstat.
> > > > Otherwise, this function wouldn't be needed.
> > >
> > > Hi Yu,
> > > I am still quite confused as i am seeing both active/inactive and lru_gen.
> > > eg:
> > >
> > > root@...ntu:~# cat /proc/vmstat | grep active
> > > nr_zone_inactive_anon 22797
> > > nr_zone_active_anon 578405
> > > nr_zone_inactive_file 0
> > > nr_zone_active_file 4156
> > > nr_inactive_anon 22800
> > > nr_active_anon 578574
> > > nr_inactive_file 0
> > > nr_active_file 4215
> >
> > Yes, this is expected. We have to maintain the ABI, i.e., the
> > *_active/inactive_* counters.
> >
> > > and:
> > >
> > > root@...ntu:~# cat /sys//kernel/debug/lru_gen
> > >
> > > ...
> > > memcg 36 /user.slice/user-0.slice/user@...ervice
> > > node 0
> > > 20 18820 22 0
> > > 21 7452 0 0
> > > 22 7448 0 0
> > > memcg 33 /user.slice/user-0.slice/user@...ervice/app.slice
> > > node 0
> > > 0 2171452 0 0
> > > 1 2171452 0 0
> > > 2 2171452 0 0
> > > 3 2171452 0 0
> > > memcg 37 /user.slice/user-0.slice/session-1.scope
> > > node 0
> > > 42 51804 102127 0
> > > 43 18840 275622 0
> > > 44 16104 216805 1
> > >
> > > Does it mean one page could be in both one of the generations and one
> > > of the active/inactive lists?
> >
> > In terms of the data structure, evictable pages are either on
> > lruvec->lists or lrugen->lists.
> >
> > > Do we have some mapping relationship between active/inactive lists
> > > with generations?
> >
> > For the counters, yes -- pages in max_seq and max_seq-1 are counted as
> > active, and the rest are inactive.
> >
> > > We used to put a faulted file page in inactive, if we access it a
> > > second time, it can be promoted
> > > to active. then in recent years, we have also applied this to anon
> > > pages while kernel adds
> > > workingset protection for anon pages. so basically both anon and file
> > > pages go into the inactive
> > > list for the 1st time, if we access it for the second time, they go to
> > > the active list. if we don't access
> > > it any more, they are likely to be reclaimed as they are inactive.
> > > we do have some special fastpath for code section, executable file
> > > pages are kept on active list
> > > as long as they are accessed.
> >
> > Yes.
> >
> > > so all of the above concerns are actually not that correct?
> >
> > They are valid concerns but I don't know any popular workloads that
> > care about them.
>
> Hi Yu,
> here we can get a workload in Kim's patchset while he added workingset
> protection
> for anon pages:
> https://patchwork.kernel.org/project/linux-mm/cover/1581401993-20041-1-git-send-email-iamjoonsoo.kim@lge.com/
Thanks. I wouldn't call that a workload because it's not a real
application. By popular workloads, I mean applications that the
majority of people actually run on phones, in cloud, etc.
> anon pages used to go to active rather than inactive, but kim's patchset
> moved to use inactive first. then only after the anon page is accessed
> second time, it can move to active.
Yes. To clarify, the A-bit doesn't really mean the first or second
access. It can be many accesses each time it's set.
> "In current implementation, newly created or swap-in anonymous page is
>
> started on the active list. Growing the active list results in rebalancing
> active/inactive list so old pages on the active list are demoted to the
> inactive list. Hence, hot page on the active list isn't protected at all.
>
> Following is an example of this situation.
>
> Assume that 50 hot pages on active list and system can contain total
> 100 pages. Numbers denote the number of pages on active/inactive
> list (active | inactive). (h) stands for hot pages and (uo) stands for
> used-once pages.
>
> 1. 50 hot pages on active list
> 50(h) | 0
>
> 2. workload: 50 newly created (used-once) pages
> 50(uo) | 50(h)
>
> 3. workload: another 50 newly created (used-once) pages
> 50(uo) | 50(uo), swap-out 50(h)
>
> As we can see, hot pages are swapped-out and it would cause swap-in later."
>
> Is MGLRU able to avoid the swap-out of the 50 hot pages?
I think the real question is why the 50 hot pages can be moved to the
inactive list. If they are really hot, the A-bit should protect them.
> since MGLRU
> is putting faulted pages to the youngest generation directly, do we have the
> risk mentioned in Kim's patchset?
There are always risks :) I could imagine a thousand ways to make VM
suffer, but all of them could be irrelevant to how it actually does in
production. So a concrete use case of yours would be much appreciated
for this discussion.
Powered by blists - more mailing lists