Message-ID: <ZrEx5HzBYVHH1piA@google.com>
Date: Mon, 5 Aug 2024 13:11:16 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: David Matlack <dmatlack@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/9] KVM: x86/mmu: Preserve Accessed bits on PROT changes

On Mon, Aug 05, 2024, David Matlack wrote:
> On Thu, Aug 1, 2024 at 11:35 AM Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > This applies on top of the massive "follow pfn" rework[*].  The gist is to
> > avoid losing accessed information, e.g. because NUMA balancing mucks with
> > PTEs,
> 
> What do you mean by "NUMA balancing mucks with PTEs"?

When NUMA auto-balancing is enabled, for VMAs the current task has been accessing,
the kernel will periodically change PTEs (in the primary MMU) to PROT_NONE, i.e.
make them !PRESENT.  That in turn results in mmu_notifier invalidations (usually
for the entire VMA, eventually) that cause KVM to unmap SPTEs.  If KVM doesn't
mark folios accessed when SPTEs are zapped, the NUMA-induced zapping effectively
loses the accessed information.
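
As a rough illustration of that sequence, here is a toy model in C.  It is not
KVM code: struct toy_folio, struct toy_spte, toy_guest_access() and toy_zap()
are hypothetical names invented for this sketch, and the real SPTE and folio
handling is far more involved.  It only shows why zapping without propagating
the Accessed bit drops the information.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, NOT the kernel's struct folio or KVM's SPTE layout. */
struct toy_folio {
	bool accessed;			/* what reclaim/aging would look at */
};

struct toy_spte {
	bool present;
	bool accessed;			/* set by "hardware" on guest access */
	struct toy_folio *folio;
};

/* The guest touches the page: the Accessed bit is set in the SPTE only. */
void toy_guest_access(struct toy_spte *spte)
{
	if (spte->present)
		spte->accessed = true;
}

/*
 * Zap the SPTE, e.g. in response to an mmu_notifier invalidation triggered
 * by a NUMA-induced PROT_NONE change.  If propagate_accessed is false, the
 * Accessed state lives only in the SPTE and is discarded with it.
 */
void toy_zap(struct toy_spte *spte, bool propagate_accessed)
{
	assert(spte != NULL && spte->folio != NULL);

	if (!spte->present)
		return;

	if (propagate_accessed && spte->accessed)
		spte->folio->accessed = true;

	spte->present = false;
	spte->accessed = false;
}
```

Calling toy_guest_access() and then toy_zap(spte, false) leaves the folio
looking untouched, whereas toy_zap(spte, true) carries the access over to the
folio before the SPTE goes away.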

For non-KVM setups, NUMA balancing works quite well because the #PF needed to
"fix" the NUMA-induced PROT_NONE is relatively cheap, especially compared to
the long-term costs of accessing remote memory.

For KVM, the cost vs. benefit is very different, as each mmu_notifier invalidation
forces KVM to emit a remote TLB flush, i.e. the cost is much higher.  And it's
also much more feasible (in practice) to affine vCPUs to single NUMA nodes, even
if vCPUs aren't pinned 1:1 with pCPUs, than it is to affine a random userspace
task to a NUMA node.
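
For reference, a minimal userspace sketch of affining a thread (e.g. a vCPU
thread in a VMM) to the CPUs of one NUMA node.  parse_cpulist() and
affine_to_node() are hypothetical helpers written for this example, not part
of any library.  The sketch assumes a Linux host that exposes
/sys/devices/system/node/nodeN/cpulist, and uses glibc's sched_setaffinity().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for CPU_SET() and friends */
#endif
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse a sysfs-style CPU list such as "0-3,8-11" into a cpu_set_t.
 * Returns 0 on success, -1 on malformed input.
 */
int parse_cpulist(const char *list, cpu_set_t *set)
{
	const char *p = list;

	assert(list != NULL && set != NULL);
	CPU_ZERO(set);

	while (*p && *p != '\n') {
		char *end;
		long lo = strtol(p, &end, 10);
		long hi, cpu;

		if (end == p || lo < 0)
			return -1;

		hi = lo;
		if (*end == '-') {
			p = end + 1;
			hi = strtol(p, &end, 10);
			if (end == p || hi < lo)
				return -1;
		}
		if (hi >= CPU_SETSIZE)
			return -1;

		for (cpu = lo; cpu <= hi; cpu++)
			CPU_SET((int)cpu, set);

		p = end;
		if (*p == ',')
			p++;
		else if (*p && *p != '\n')
			return -1;
	}
	return 0;
}

/* Restrict the calling thread to the CPUs of the given NUMA node. */
int affine_to_node(int node)
{
	char path[64], buf[4096];
	cpu_set_t set;
	char *line;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/node/node%d/cpulist", node);
	f = fopen(path, "r");
	if (!f)
		return -1;
	line = fgets(buf, sizeof(buf), f);
	fclose(f);

	if (!line || parse_cpulist(buf, &set) != 0 || CPU_COUNT(&set) == 0)
		return -1;

	/* pid 0 means "the calling thread". */
	return sched_setaffinity(0, sizeof(set), &set);
}
```

A VMM would call something like affine_to_node(0) from each vCPU thread it
wants kept on node 0.  The thread can still migrate between that node's
pCPUs, i.e. this does not require 1:1 pinning.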

Which is why I'm not terribly concerned about optimizing NUMA auto-balancing; it's
already sub-optimal for KVM.
