[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAOUHufadf0kmHTY4=gvMtm4LfFFjijYHWmX5VcYqwDEhw18=+A@mail.gmail.com>
Date: Tue, 5 Nov 2024 12:28:15 -0700
From: Yu Zhao <yuzhao@...gle.com>
To: James Houghton <jthoughton@...gle.com>, Jonathan Corbet <corbet@....net>
Cc: Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>,
David Matlack <dmatlack@...gle.com>, David Rientjes <rientjes@...gle.com>, Marc Zyngier <maz@...nel.org>,
Oliver Upton <oliver.upton@...ux.dev>, Wei Xu <weixugc@...gle.com>,
Axel Rasmussen <axelrasmussen@...gle.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v8 00/11] KVM: x86/mmu: Age sptes locklessly
On Tue, Nov 5, 2024 at 12:21 PM Yu Zhao <yuzhao@...gle.com> wrote:
>
> On Tue, Nov 5, 2024 at 11:43 AM James Houghton <jthoughton@...gle.com> wrote:
> >
> > Andrew has queued patches to make MGLRU consult KVM when doing aging[8].
> > Now, make aging lockless for the shadow MMU and the TDP MMU. This allows
> > us to reduce the time/CPU it takes to do aging and the performance
> > impact on the vCPUs while we are aging.
> >
> > The final patch in this series modifies access_tracking_stress_test to
> > age using MGLRU. There is a mode (-p) where it will age while the vCPUs
> > are faulting memory in. Here are some results with that mode:
>
> Additional background in case I didn't provide it before:
>
> At Google we keep track of hotness/coldness of VM memory to identify
> opportunities to demote cold memory into slower tiers of storage. This
> is done in a controlled manner so that while we benefit from the
> improved memory efficiency through improved bin-packing, without
> violating customer SLOs.
>
> However, the monitoring/tracking introduced two major overheads [1] for us:
> 1. the traditional (host) PFN + rmap data structures [2] used to
> locate host PTEs (containing the accessed bits).
> 2. the KVM MMU lock required to clear the accessed bits in
> secondary/shadow PTEs.
>
> MGLRU provides the infrastructure for us to reach out into page tables
> directly from a list of mm_struct's, and therefore allows us to bypass
> the first problem above and reduce the CPU overhead by ~80% for our
> workloads (90%+ mmaped memory). This series solves the second problem:
> by supporting locklessly clearing the accessed bits in SPTEs, it would
> reduce our current KVM MMU lock contention by >80% [3]. All other
> existing mechanisms, e.g., Idle Page Tracking, DAMON, etc., can also
> seamlessly benefit from this series when monitoring/tracking VM
> memory.
>
> [1] https://lwn.net/Articles/787611/
> [2] https://docs.kernel.org/admin-guide/mm/idle_page_tracking.html
> [3] https://research.google/pubs/profiling-a-warehouse-scale-computer/
And we also ran an A/B experiment on quarter million Chromebooks
running Android in VMs last year (with an older version of this
series):
It reduced PSI by 10% at the 99th percentile and "janks" by 8% at the
95th percentile, which resulted in an overall improvement in user
engagement by 16% at the 75th percentile (all statistically
significant).
Powered by blists - more mailing lists