Message-ID: <Zley-u_dOlZ-S-a6@google.com>
Date: Wed, 29 May 2024 15:58:02 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Yu Zhao <yuzhao@...gle.com>
Cc: James Houghton <jthoughton@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>,
Paolo Bonzini <pbonzini@...hat.com>, Albert Ou <aou@...s.berkeley.edu>,
Ankit Agrawal <ankita@...dia.com>, Anup Patel <anup@...infault.org>,
Atish Patra <atishp@...shpatra.org>, Axel Rasmussen <axelrasmussen@...gle.com>,
Bibo Mao <maobibo@...ngson.cn>, Catalin Marinas <catalin.marinas@....com>,
David Matlack <dmatlack@...gle.com>, David Rientjes <rientjes@...gle.com>,
Huacai Chen <chenhuacai@...nel.org>, James Morse <james.morse@....com>,
Jonathan Corbet <corbet@....net>, Marc Zyngier <maz@...nel.org>, Michael Ellerman <mpe@...erman.id.au>,
Nicholas Piggin <npiggin@...il.com>, Oliver Upton <oliver.upton@...ux.dev>,
Palmer Dabbelt <palmer@...belt.com>, Paul Walmsley <paul.walmsley@...ive.com>,
Raghavendra Rao Ananta <rananta@...gle.com>, Ryan Roberts <ryan.roberts@....com>,
Shaoqin Huang <shahuang@...hat.com>, Shuah Khan <shuah@...nel.org>,
Suzuki K Poulose <suzuki.poulose@....com>, Tianrui Zhao <zhaotianrui@...ngson.cn>,
Will Deacon <will@...nel.org>, Zenghui Yu <yuzenghui@...wei.com>, kvm-riscv@...ts.infradead.org,
kvm@...r.kernel.org, kvmarm@...ts.linux.dev,
linux-arm-kernel@...ts.infradead.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
linux-mips@...r.kernel.org, linux-mm@...ck.org,
linux-riscv@...ts.infradead.org, linuxppc-dev@...ts.ozlabs.org,
loongarch@...ts.linux.dev
Subject: Re: [PATCH v4 2/7] mm: multi-gen LRU: Have secondary MMUs participate in aging

On Wed, May 29, 2024, Yu Zhao wrote:
> On Wed, May 29, 2024 at 3:59 PM Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > On Wed, May 29, 2024, Yu Zhao wrote:
> > > On Wed, May 29, 2024 at 12:05 PM James Houghton <jthoughton@...gle.com> wrote:
> > > >
> > > > Secondary MMUs are currently consulted for access/age information at
> > > > eviction time, but before then, we don't get accurate age information.
> > > > That is, pages that are mostly accessed through a secondary MMU (like
> > > > guest memory, used by KVM) will always just proceed down to the oldest
> > > > generation, and then at eviction time, if KVM reports the page to be
> > > > young, the page will be activated/promoted back to the youngest
> > > > generation.
> > >
> > > Correct, and as I explained offline, this is the only reasonable
> > > behavior if we can't locklessly walk secondary MMUs.
> > >
> > > Just for the record, the (crude) analogy I used was:
> > > Imagine a large room with many bills ($1, $5, $10, ...) on the floor,
> > > but you are only allowed to pick up 10 of them (and put them in your
> > > pocket). A smart move would be to survey the room *first and then*
> > > > pick up the largest ones. But if you are carrying a 500 lb backpack,
> > > > you would just want to pick up whichever ones are in front of you
> > > > rather than walk the entire room.
> > >
> > > > MGLRU should only scan (or look around) secondary MMUs if it can be
> > > > done locklessly. Otherwise, it should just fall back to the existing
> > > > approach, which existed in previous versions but was removed in this
> > > > version.
> >
> > IIUC, by "existing approach" you mean completely ignore secondary MMUs that
> > don't implement a lockless walk?
>
> No, the existing approach only checks secondary MMUs for LRU folios,
> i.e., those at the end of the LRU list. It might not find the best
> candidates (the coldest ones) on the entire list, but it doesn't pay
> as much for the locking. MGLRU can *optionally* scan MMUs (secondary
> included) to find the best candidates, but it can only be a win if the
> scanning incurs a relatively low overhead, e.g., done locklessly for
> the secondary MMU. IOW, this is a balance between the cost of
> reclaiming not-so-cold (warm) folios and that of finding the coldest
> folios.

Gotcha.

I tend to agree with Yu: driving the behavior via a Kconfig may generate simpler
_code_, but I think it increases the overall system complexity. E.g. distros
will likely enable the Kconfig, and in my experience people using KVM with a
distro kernel usually aren't kernel experts, i.e. likely won't know that there's
even a decision to be made, let alone be able to make an informed decision.

Having an mmu_notifier hook that is conditionally implemented doesn't seem overly
complex, e.g. even if there's a runtime aspect at play, it'd be easy enough for
KVM to nullify its mmu_notifier hook during initialization. The hardest part is
likely going to be figuring out the threshold for how much overhead is too much.
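
To make the "conditionally implemented hook" idea concrete, here is a rough
user-space sketch, not actual KVM or mmu_notifier code. Every name in it
(sketch_notifier_ops, test_clear_young_fast, sketch_kvm_init, sketch_age_range)
is hypothetical, and a single atomic flag stands in for a real accessed bit.
The only point is the shape: the secondary MMU clears its hook at init if it
can't walk locklessly, and the caller treats a NULL hook as "no cheap age
information, fall back to checking at eviction time".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ops table; NOT the real struct mmu_notifier_ops. */
struct sketch_notifier_ops {
	/*
	 * Optional hook: test-and-clear the accessed state for [start, end).
	 * NULL means the secondary MMU cannot do this cheaply (locklessly).
	 */
	bool (*test_clear_young_fast)(unsigned long start, unsigned long end);
};

/* Stand-in for an accessed bit in a secondary MMU page table entry. */
atomic_bool sketch_accessed_bit = true;

/* Hypothetical lockless implementation: atomically read and clear the bit. */
bool sketch_lockless_test_clear_young(unsigned long start, unsigned long end)
{
	assert(start < end);
	return atomic_exchange(&sketch_accessed_bit, false);
}

/*
 * Init-time decision: if a lockless walk isn't possible, nullify the hook
 * so callers never pay for a locked walk during aging.
 */
void sketch_kvm_init(struct sketch_notifier_ops *ops, bool can_walk_locklessly)
{
	if (!can_walk_locklessly)
		ops->test_clear_young_fast = NULL;
}

enum sketch_age { SKETCH_AGE_UNKNOWN, SKETCH_AGE_OLD, SKETCH_AGE_YOUNG };

/*
 * Caller side (think: the aging path). A NULL hook means "skip the
 * secondary MMU here and only consult it at eviction time".
 */
enum sketch_age sketch_age_range(const struct sketch_notifier_ops *ops,
				 unsigned long start, unsigned long end)
{
	if (!ops->test_clear_young_fast)
		return SKETCH_AGE_UNKNOWN;

	return ops->test_clear_young_fast(start, end) ? SKETCH_AGE_YOUNG
						      : SKETCH_AGE_OLD;
}
```

With the hook present, the first sketch_age_range() call on a freshly accessed
range reports young and a second call reports old, since the first one cleared
the bit. After sketch_kvm_init(&ops, false), every call reports unknown. None
of this addresses the overhead threshold question; it only shows that the
plumbing for an optional hook is small.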