linux-kernel - Re: [PATCH v8 00/11] KVM: x86/mmu: Age sptes locklessly

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAOUHufYS0XyLEf_V+q5SCW54Zy2aW5nL8CnSWreM8d1rX5NKYg@mail.gmail.com>
Date: Tue, 5 Nov 2024 12:21:05 -0700
From: Yu Zhao <yuzhao@...gle.com>
To: James Houghton <jthoughton@...gle.com>, Jonathan Corbet <corbet@....net>
Cc: Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>, 
	David Matlack <dmatlack@...gle.com>, David Rientjes <rientjes@...gle.com>, Marc Zyngier <maz@...nel.org>, 
	Oliver Upton <oliver.upton@...ux.dev>, Wei Xu <weixugc@...gle.com>, 
	Axel Rasmussen <axelrasmussen@...gle.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v8 00/11] KVM: x86/mmu: Age sptes locklessly

On Tue, Nov 5, 2024 at 11:43 AM James Houghton <jthoughton@...gle.com> wrote:
>
> Andrew has queued patches to make MGLRU consult KVM when doing aging[8].
> Now, make aging lockless for the shadow MMU and the TDP MMU. This allows
> us to reduce the time/CPU it takes to do aging and the performance
> impact on the vCPUs while we are aging.
>
> The final patch in this series modifies access_tracking_stress_test to
> age using MGLRU. There is a mode (-p) where it will age while the vCPUs
> are faulting memory in. Here are some results with that mode:

Additional background in case I didn't provide it before:

At Google we keep track of hotness/coldness of VM memory to identify
opportunities to demote cold memory into slower tiers of storage. This
is done in a controlled manner so that while we benefit from the
improved memory efficiency through improved bin-packing, without
violating customer SLOs.

However, the monitoring/tracking introduced two major overheads [1] for us:
1. the traditional (host) PFN + rmap data structures [2] used to
locate host PTEs (containing the accessed bits).
2. the KVM MMU lock required to clear the accessed bits in
secondary/shadow PTEs.

MGLRU provides the infrastructure for us to reach out into page tables
directly from a list of mm_struct's, and therefore allows us to bypass
the first problem above and reduce the CPU overhead by ~80% for our
workloads (90%+ mmaped memory). This series solves the second problem:
by supporting locklessly clearing the accessed bits in SPTEs, it would
reduce our current KVM MMU lock contention by >80% [3]. All other
existing mechanisms, e.g., Idle Page Tracking, DAMON, etc., can also
seamlessly benefit from this series when monitoring/tracking VM
memory.

[1] https://lwn.net/Articles/787611/
[2] https://docs.kernel.org/admin-guide/mm/idle_page_tracking.html
[3] https://research.google/pubs/profiling-a-warehouse-scale-computer/