Message-ID: <20260211120944.-eZhmdo7@linutronix.de>
Date: Wed, 11 Feb 2026 13:09:44 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: "shaikh.kamal" <shaikhkamal2012@...il.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-rt-devel@...ts.linux.dev
Subject: Re: [PATCH] KVM: mmu_notifier: make mn_invalidate_lock non-sleeping
 for non-blocking invalidations

On 2026-02-09 21:45:27 [+0530], shaikh.kamal wrote:
> mmu_notifier_invalidate_range_start() may be invoked via
> mmu_notifier_invalidate_range_start_nonblock(), e.g. from oom_reaper(),
> where sleeping is explicitly forbidden.
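
For reference, the non-blocking contract mentioned here is signaled per
range. A minimal sketch of a callback honoring it (the callback name is
made up; mmu_notifier_range_blockable() is the real helper):

	#include <linux/mmu_notifier.h>

	static int sketch_invalidate_range_start(struct mmu_notifier *mn,
					const struct mmu_notifier_range *range)
	{
		if (!mmu_notifier_range_blockable(range)) {
			/*
			 * Reached via mmu_notifier_invalidate_range_start_nonblock(),
			 * e.g. from oom_reaper(): must not sleep, and the
			 * only permitted failure is -EAGAIN.
			 */
		}
		/* ... the actual (non-sleeping) invalidation work ... */
		return 0;
	}
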
> 
> KVM's mmu_notifier invalidate_range_start currently takes
> mn_invalidate_lock using spin_lock(). On PREEMPT_RT, spin_lock() maps
> to rt_mutex and may sleep, triggering:
> 
>   BUG: sleeping function called from invalid context
> 
> This violates the MMU notifier contract regardless of PREEMPT_RT; RT
> kernels merely make the issue deterministic.
> 
> Fix by converting mn_invalidate_lock to a raw spinlock so that
> invalidate_range_start() remains non-sleeping while preserving the
> existing serialization between invalidate_range_start() and
> invalidate_range_end().
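
The conversion is mechanical; a sketch of what it implies (the struct
below is a stand-in, not the real struct kvm layout, but the field name
and the init call match the patch):

	#include <linux/spinlock.h>

	struct kvm_sketch {
		/*
		 * Was spinlock_t. On PREEMPT_RT a spinlock_t is backed by
		 * an rt_mutex and may sleep; a raw_spinlock_t remains a
		 * true busy-waiting spinlock in every configuration, so it
		 * is legal in the non-blocking invalidate path.
		 */
		raw_spinlock_t mn_invalidate_lock;
		unsigned long mn_active_invalidate_count;
	};

	static void kvm_sketch_init(struct kvm_sketch *kvm)
	{
		/* was: spin_lock_init(&kvm->mn_invalidate_lock); */
		raw_spin_lock_init(&kvm->mn_invalidate_lock);
	}
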
> 
> Signed-off-by: shaikh.kamal <shaikhkamal2012@...il.com>

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>

I don't see any downside to doing this, but…

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 5fcd401a5897..7a9c33f01a37 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -747,9 +747,9 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
>  	 *
>  	 * Pairs with the decrement in range_end().
>  	 */
> -	spin_lock(&kvm->mn_invalidate_lock);
> +	raw_spin_lock(&kvm->mn_invalidate_lock);
>  	kvm->mn_active_invalidate_count++;
> -	spin_unlock(&kvm->mn_invalidate_lock);
> +	raw_spin_unlock(&kvm->mn_invalidate_lock);

	atomic_inc(&kvm->mn_active_invalidate_count);
>  
>  	/*
>  	 * Invalidate pfn caches _before_ invalidating the secondary MMUs, i.e.
> @@ -817,11 +817,11 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
>  	kvm_handle_hva_range(kvm, &hva_range);
>  
>  	/* Pairs with the increment in range_start(). */
> -	spin_lock(&kvm->mn_invalidate_lock);
> +	raw_spin_lock(&kvm->mn_invalidate_lock);
>  	if (!WARN_ON_ONCE(!kvm->mn_active_invalidate_count))
>  		--kvm->mn_active_invalidate_count;
>  	wake = !kvm->mn_active_invalidate_count;

	wake = atomic_dec_return_safe(&kvm->mn_active_invalidate_count);
	WARN_ON_ONCE(wake < 0);
	wake = !wake;
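
Assuming atomic_dec_return_safe() stands for "decrement and warn on
underflow", the same thing with existing helpers would be roughly:

	#include <linux/atomic.h>
	#include <linux/bug.h>

	/* assumes mn_active_invalidate_count is converted to atomic_t */
	static bool range_end_dec(atomic_t *cnt)
	{
		int left = atomic_dec_return(cnt);

		/* mirrors the existing WARN_ON_ONCE() on underflow */
		WARN_ON_ONCE(left < 0);
		return left == 0;	/* wake the waiter on the last exit */
	}

range_start() would then only need atomic_inc() and the counter itself
would no longer depend on mn_invalidate_lock.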

> -	spin_unlock(&kvm->mn_invalidate_lock);
> +	raw_spin_unlock(&kvm->mn_invalidate_lock);
>  
>  	/*
>  	 * There can only be one waiter, since the wait happens under
> @@ -1129,7 +1129,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
> -	spin_lock_init(&kvm->mn_invalidate_lock);
> +	raw_spin_lock_init(&kvm->mn_invalidate_lock);
> @@ -1635,17 +1635,17 @@ static void kvm_swap_active_memslots(struct kvm *kvm, int as_id)
>  	 * progress, otherwise the locking in invalidate_range_start and
>  	 * invalidate_range_end will be unbalanced.
>  	 */
> -	spin_lock(&kvm->mn_invalidate_lock);
> +	raw_spin_lock(&kvm->mn_invalidate_lock);
>  	prepare_to_rcuwait(&kvm->mn_memslots_update_rcuwait);
>  	while (kvm->mn_active_invalidate_count) {
>  		set_current_state(TASK_UNINTERRUPTIBLE);
> -		spin_unlock(&kvm->mn_invalidate_lock);
> +		raw_spin_unlock(&kvm->mn_invalidate_lock);
>  		schedule();

And this I don't understand: the lock protects the rcuwait assignment,
which would be needed if multiple waiters were possible. But that
protection goes away at the unlock and schedule() here anyway. So the
rcuwait handling could be moved outside of the locked section, which
would limit the lock to just the mn_active_invalidate_count value.
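
A sketch of that reshuffle (the field names are the real ones, the
restructuring is only my suggestion above, untested):

	/* single waiter, serialized by slots_lock: no lock needed here */
	prepare_to_rcuwait(&kvm->mn_memslots_update_rcuwait);
	for (;;) {
		bool busy;

		set_current_state(TASK_UNINTERRUPTIBLE);
		raw_spin_lock(&kvm->mn_invalidate_lock);
		busy = kvm->mn_active_invalidate_count != 0;
		raw_spin_unlock(&kvm->mn_invalidate_lock);
		if (!busy)
			break;
		schedule();
	}
	finish_rcuwait(&kvm->mn_memslots_update_rcuwait);

The memslots assignment would still have to take the lock afterwards if
it needs to stay serialized against range_start().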

> -		spin_lock(&kvm->mn_invalidate_lock);
> +		raw_spin_lock(&kvm->mn_invalidate_lock);
>  	}
>  	finish_rcuwait(&kvm->mn_memslots_update_rcuwait);
>  	rcu_assign_pointer(kvm->memslots[as_id], slots);
> -	spin_unlock(&kvm->mn_invalidate_lock);
> +	raw_spin_unlock(&kvm->mn_invalidate_lock);
>  
>  	/*
>  	 * Acquired in kvm_set_memslot. Must be released before synchronize

Sebastian
