Message-ID: <aI05DvQlMWJXewUi@google.com>
Date: Fri, 1 Aug 2025 15:00:46 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: David Matlack <dmatlack@...gle.com>
Cc: James Houghton <jthoughton@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>, 
	Vipin Sharma <vipinsh@...gle.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 3/7] KVM: x86/mmu: Recover TDP MMU NX huge pages using
 MMU read lock

On Fri, Aug 01, 2025, David Matlack wrote:
> On Mon, Jul 28, 2025 at 2:49 PM James Houghton <jthoughton@...gle.com> wrote:
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index a6a1fb42b2d1..e2bde6a5e346 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -7624,8 +7624,14 @@ static bool kvm_mmu_sp_dirty_logging_enabled(struct kvm *kvm,
> > >  static void kvm_recover_nx_huge_pages(struct kvm *kvm,
> > >                                       const enum kvm_mmu_type mmu_type)
> > >  {
> > > +#ifdef CONFIG_X86_64
> > > +       const bool is_tdp_mmu = mmu_type == KVM_TDP_MMU;
> > > +       spinlock_t *tdp_mmu_pages_lock = &kvm->arch.tdp_mmu_pages_lock;
> > > +#else
> > > +       const bool is_tdp_mmu = false;
> > > +       spinlock_t *tdp_mmu_pages_lock = NULL;
> > > +#endif
> > >         unsigned long to_zap = nx_huge_pages_to_zap(kvm, mmu_type);
> > > -       bool is_tdp_mmu = mmu_type == KVM_TDP_MMU;
> > >         struct list_head *nx_huge_pages;
> > >         struct kvm_mmu_page *sp;
> > >         LIST_HEAD(invalid_list);
> > > @@ -7648,15 +7654,12 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm,
> > >         rcu_read_lock();
> > >
> > >         for ( ; to_zap; --to_zap) {
> > > -#ifdef CONFIG_X86_64
> > >                 if (is_tdp_mmu)
> > > -                       spin_lock(&kvm->arch.tdp_mmu_pages_lock);
> > > -#endif
> > > +                       spin_lock(tdp_mmu_pages_lock);
> > > +
> > >                 if (list_empty(nx_huge_pages)) {
> > > -#ifdef CONFIG_X86_64
> > >                         if (is_tdp_mmu)
> > > -                               spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
> > > -#endif
> > > +                               spin_unlock(tdp_mmu_pages_lock);
> > >                         break;
> > >                 }
> > >
> > > @@ -7675,10 +7678,8 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm,
> > >
> > >                 unaccount_nx_huge_page(kvm, sp);
> > >
> > > -#ifdef CONFIG_X86_64
> > >                 if (is_tdp_mmu)
> > > -                       spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
> > > -#endif
> > > +                       spin_unlock(tdp_mmu_pages_lock);
> > >
> > >                 /*
> > >                  * Do not attempt to recover any NX Huge Pages that are being
> > > --
> >
> > LGTM! Thanks Sean.
> 
> Is the compiler not smart enough to optimize out kvm->arch.tdp_mmu_pages_lock?

Yes, the compiler will eliminate dead code at most optimization levels.  But that
optimization phase happens after initial compilation, i.e. the compiler needs to
generate the intermediate code before it can trim away paths that are unreachable.
And it can't generate code for a direct reference to kvm->arch.tdp_mmu_pages_lock
on !CONFIG_X86_64, because that field only exists in the 64-bit definition of
struct kvm_arch, dead branch or not.
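
To make that concrete, here's a reduced illustration (made-up struct and function
names, not the KVM code): the branch below is trivially dead on 32-bit, but the
compiler still has to generate code for it first, and it can't do that for a
reference to a member that isn't in the struct definition at all.

  #include <linux/spinlock.h>
  #include <linux/types.h>

  struct foo {
  #ifdef CONFIG_X86_64
  	spinlock_t lock;		/* member only exists on 64-bit builds */
  #endif
  };

  void bar(struct foo *f)
  {
  #ifdef CONFIG_X86_64
  	const bool is_64bit = true;
  	spinlock_t *lock = &f->lock;
  #else
  	const bool is_64bit = false;
  	spinlock_t *lock = NULL;	/* never dereferenced, the branch is dead */
  #endif
  	if (is_64bit) {
  		spin_lock(lock);	/* compiles on both configs, elided when false */
  		spin_unlock(lock);
  	}

  	/*
  	 * Writing &f->lock directly here would fail to compile on
  	 * !CONFIG_X86_64, dead branch or not, because the member doesn't
  	 * exist in that configuration.
  	 */
  }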

> (To avoid needing the extra local variable?) I thought there was some other
> KVM code that relied on similar optimizations but I would have to go dig them
> up to remember.

KVM, and the kernel at large, absolutely rely on dead code elimination.  KVM most
blatantly uses the technique to avoid _defining_ stubs for code that is guarded by
a Kconfig, e.g. all of these functions are defined in sev.c (built only when
CONFIG_KVM_AMD_SEV=y), but callers are guarded only with sev_guest() or
sev_es_guest(), not with explicit #ifdefs.

There are no build errors because the function calls aren't fully resolved until
link time (when svm.o is linked into kvm-amd.o), and by then dead code elimination
has already removed the unreachable calls, i.e. no references to the undefined
symbols remain.  But KVM still needs to _declare_ the functions, otherwise the
compiler would fail during its initial code generation.

  int pre_sev_run(struct vcpu_svm *svm, int cpu);
  void sev_init_vmcb(struct vcpu_svm *svm);
  void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm);
  int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
  void sev_es_vcpu_reset(struct vcpu_svm *svm);
  void sev_es_recalc_msr_intercepts(struct kvm_vcpu *vcpu);
  void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
  void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa);
  void sev_es_unmap_ghcb(struct vcpu_svm *svm);
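
From the caller's side, the pattern looks roughly like this (a simplified sketch
of the idea, with a hypothetical init_vmcb_example(), not the exact svm.c/svm.h
code):

  /*
   * Declared unconditionally, but only _defined_ when CONFIG_KVM_AMD_SEV=y,
   * because sev.c is only compiled in that case.
   */
  void sev_init_vmcb(struct vcpu_svm *svm);

  #ifdef CONFIG_KVM_AMD_SEV
  bool sev_guest(struct kvm *kvm);				/* real check */
  #else
  static inline bool sev_guest(struct kvm *kvm) { return false; }
  #endif

  static void init_vmcb_example(struct vcpu_svm *svm)
  {
  	/*
  	 * With CONFIG_KVM_AMD_SEV=n, sev_guest() is constant false, the call
  	 * below is eliminated as dead code, and no reference to the undefined
  	 * sev_init_vmcb() symbol survives to be resolved at link time.
  	 */
  	if (sev_guest(svm->vcpu.kvm))
  		sev_init_vmcb(svm);
  }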

Other notable "users" of dead code elimination are the BUILD_BUG_ON() family of
compile-time asserts.  So long as the condition can be resolved to a constant
false value during compile time, the "call" to __compiletime_error() will be
elided and everyone is happy.

  #ifdef __OPTIMIZE__
  /*
   * #ifdef __OPTIMIZE__ is only a good approximation; for instance "make
   * CFLAGS_foo.o=-Og" defines __OPTIMIZE__, does not elide the conditional code
   * and can break compilation with wrong error message(s). Combine with
   * -U__OPTIMIZE__ when needed.
   */
  # define __compiletime_assert(condition, msg, prefix, suffix)		\
	do {								\
		/*							\
		 * __noreturn is needed to give the compiler enough	\
		 * information to avoid certain possibly-uninitialized	\
		 * warnings (regardless of the build failing).		\
		 */							\
		__noreturn extern void prefix ## suffix(void)		\
			__compiletime_error(msg);			\
		if (!(condition))					\
			prefix ## suffix();				\
	} while (0)
  #else
  # define __compiletime_assert(condition, msg, prefix, suffix) ((void)(condition))
  #endif
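
For reference, __compiletime_error() expands to __attribute__((error(...))) on
compilers that support it, so the mechanism can be demonstrated with a
stripped-down, stand-alone version (illustrative only; must be compiled with
optimization enabled, mirroring the __OPTIMIZE__ guard above):

  /* Stand-alone sketch of the BUILD_BUG_ON() mechanism, not kernel code. */
  #define MY_BUILD_BUG_ON(cond)						\
  	do {								\
  		extern void my_build_bug(void)				\
  			__attribute__((error("MY_BUILD_BUG_ON failed")));	\
  		if (cond)						\
  			my_build_bug();					\
  	} while (0)

  int main(void)
  {
  	MY_BUILD_BUG_ON(sizeof(long) < sizeof(int));	/* constant false, call elided */
  	/* MY_BUILD_BUG_ON(sizeof(long) > 1); would fail the build */
  	return 0;
  }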

Note, static_assert() is different in that it's a true assertion that's resolved
early on during compilation.

 * Contrary to BUILD_BUG_ON(), static_assert() can be used at global
 * scope, but requires the expression to be an integer constant
 * expression (i.e., it is not enough that __builtin_constant_p() is
 * true for expr).


From a previous thread related to asserts (https://lore.kernel.org/all/aFGY0KVUksf1a6xB@google.com):

 : The advantage of BUILD_BUG_ON() is that it works so long as the condition is
 : compile-time constant, whereas static_assert() requires the condition to be an
 : integer constant expression.  E.g. BUILD_BUG_ON() can be used so long as the
 : condition is eventually resolved to a constant, whereas static_assert() has
 : stricter requirements.
 : 
 : E.g. the fls64() assert below is fully resolved at compile time, but isn't a
 : purely constant expression, i.e. that one *needs* to be BUILD_BUG_ON().
 : 
 : --
 : arch/x86/kvm/svm/avic.c: In function ‘avic_init_backing_page’:
 : arch/x86/kvm/svm/avic.c:293:45: error: expression in static assertion is not constant
 :   293 |         static_assert(__PHYSICAL_MASK_SHIFT <=
 : include/linux/build_bug.h:78:56: note: in definition of macro ‘__static_assert’
 :    78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
 :       |                                                        ^~~~
 : arch/x86/kvm/svm/avic.c:293:9: note: in expansion of macro ‘static_assert’
 :   293 |         static_assert(__PHYSICAL_MASK_SHIFT <=
 :       |         ^~~~~~~~~~~~~
 : make[5]: *** [scripts/Makefile.build:203: arch/x86/kvm/svm/avic.o] Error 1
 : --
 : 
 : The downside of BUILD_BUG_ON() is that it can't be used at global scope, i.e.
 : needs to be called from a function.
 : 
 : As a result, when adding an assertion in a function, using BUILD_BUG_ON() is
 : slightly preferred, because it's less likely to break in the future.  E.g. if
 : X2AVIC_MAX_PHYSICAL_ID were changed to something that is a compile-time constant,
 : but for whatever reason isn't a pure integer constant.
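
A self-contained way to see that last point (hypothetical helper, not the actual
avic.c code): a call to an inline function is never an integer constant
expression, so static_assert() rejects it even with constant arguments, but with
optimization enabled the call folds to a constant, which is all BUILD_BUG_ON()
needs.

  #include <linux/build_bug.h>
  #include <linux/compiler.h>

  static inline unsigned int cacheline_span(unsigned int bytes)
  {
  	return (bytes + 63) / 64;	/* trivially folds for constant arguments */
  }

  /*
   * static_assert(cacheline_span(256) == 4, "..."); is rejected, because a
   * function call is not an integer constant expression even when its value
   * is known at compile time.
   */
  static void __maybe_unused sanity_checks(void)
  {
  	BUILD_BUG_ON(cacheline_span(256) != 4);		/* folds to false, OK */
  }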
