Message-ID: <20260211120944.-eZhmdo7@linutronix.de>
Date: Wed, 11 Feb 2026 13:09:44 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: "shaikh.kamal" <shaikhkamal2012@...il.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-rt-devel@...ts.linux.dev
Subject: Re: [PATCH] KVM: mmu_notifier: make mn_invalidate_lock non-sleeping for non-blocking invalidations
On 2026-02-09 21:45:27 [+0530], shaikh.kamal wrote:
> mmu_notifier_invalidate_range_start() may be invoked via
> mmu_notifier_invalidate_range_start_nonblock(), e.g. from oom_reaper(),
> where sleeping is explicitly forbidden.
>
> KVM's mmu_notifier invalidate_range_start currently takes
> mn_invalidate_lock using spin_lock(). On PREEMPT_RT, spin_lock() maps
> to rt_mutex and may sleep, triggering:
>
> BUG: sleeping function called from invalid context
>
> This violates the MMU notifier contract regardless of PREEMPT_RT; RT
> kernels merely make the issue deterministic.
>
> Fix by converting mn_invalidate_lock to a raw spinlock so that
> invalidate_range_start() remains non-sleeping while preserving the
> existing serialization between invalidate_range_start() and
> invalidate_range_end().
>
> Signed-off-by: shaikh.kamal <shaikhkamal2012@...il.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
I don't see any downside to doing this, but…
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 5fcd401a5897..7a9c33f01a37 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -747,9 +747,9 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
> *
> * Pairs with the decrement in range_end().
> */
> - spin_lock(&kvm->mn_invalidate_lock);
> + raw_spin_lock(&kvm->mn_invalidate_lock);
> kvm->mn_active_invalidate_count++;
> - spin_unlock(&kvm->mn_invalidate_lock);
> + raw_spin_unlock(&kvm->mn_invalidate_lock);
atomic_inc(mn_active_invalidate_count)
>
> /*
> * Invalidate pfn caches _before_ invalidating the secondary MMUs, i.e.
> @@ -817,11 +817,11 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
> kvm_handle_hva_range(kvm, &hva_range);
>
> /* Pairs with the increment in range_start(). */
> - spin_lock(&kvm->mn_invalidate_lock);
> + raw_spin_lock(&kvm->mn_invalidate_lock);
> if (!WARN_ON_ONCE(!kvm->mn_active_invalidate_count))
> --kvm->mn_active_invalidate_count;
> wake = !kvm->mn_active_invalidate_count;
wake = atomic_dec_return_safe(mn_active_invalidate_count);
WARN_ON_ONCE(wake < 0);
wake = !wake;
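i.e. with mn_active_invalidate_count turned into an atomic_t, roughly
(completely untested sketch; atomic_dec_if_positive() would be one way
to get the "safe" semantic, since it never takes the counter below
zero):

        /* range_start(), pairs with the decrement in range_end(): */
        atomic_inc(&kvm->mn_active_invalidate_count);

        /* range_end(), pairs with the increment in range_start(): */
        wake = atomic_dec_if_positive(&kvm->mn_active_invalidate_count);
        /* < 0 means it was already zero and was left untouched. */
        WARN_ON_ONCE(wake < 0);
        wake = !wake;

Whether the lock could then go away here entirely is a separate
question, since kvm_swap_active_memslots() also synchronizes against
this counter.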
> - spin_unlock(&kvm->mn_invalidate_lock);
> + raw_spin_unlock(&kvm->mn_invalidate_lock);
>
> /*
> * There can only be one waiter, since the wait happens under
> @@ -1129,7 +1129,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
> @@ -1635,17 +1635,17 @@ static void kvm_swap_active_memslots(struct kvm *kvm, int as_id)
> * progress, otherwise the locking in invalidate_range_start and
> * invalidate_range_end will be unbalanced.
> */
> - spin_lock(&kvm->mn_invalidate_lock);
> + raw_spin_lock(&kvm->mn_invalidate_lock);
> prepare_to_rcuwait(&kvm->mn_memslots_update_rcuwait);
> while (kvm->mn_active_invalidate_count) {
> set_current_state(TASK_UNINTERRUPTIBLE);
> - spin_unlock(&kvm->mn_invalidate_lock);
> + raw_spin_unlock(&kvm->mn_invalidate_lock);
> schedule();
And this part I don't understand. The lock protects the rcuwait
assignment, which would only matter if multiple waiters were possible.
But that protection is gone anyway once we unlock and schedule() here.
So the prepare_to_rcuwait()/finish_rcuwait() pair could be moved
outside of the locked section, which would limit the lock to the
mn_active_invalidate_count value itself.
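Roughly something like this (completely untested sketch; I kept the
memslots assignment inside the locked section because of the comment
above about not swapping the memslots while an invalidation is in
progress):

        prepare_to_rcuwait(&kvm->mn_memslots_update_rcuwait);

        raw_spin_lock(&kvm->mn_invalidate_lock);
        while (kvm->mn_active_invalidate_count) {
                set_current_state(TASK_UNINTERRUPTIBLE);
                raw_spin_unlock(&kvm->mn_invalidate_lock);
                schedule();
                raw_spin_lock(&kvm->mn_invalidate_lock);
        }
        rcu_assign_pointer(kvm->memslots[as_id], slots);
        raw_spin_unlock(&kvm->mn_invalidate_lock);

        /* Clears ->task again and puts us back into TASK_RUNNING. */
        finish_rcuwait(&kvm->mn_memslots_update_rcuwait);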
> - spin_lock(&kvm->mn_invalidate_lock);
> + raw_spin_lock(&kvm->mn_invalidate_lock);
> }
> finish_rcuwait(&kvm->mn_memslots_update_rcuwait);
> rcu_assign_pointer(kvm->memslots[as_id], slots);
> - spin_unlock(&kvm->mn_invalidate_lock);
> + raw_spin_unlock(&kvm->mn_invalidate_lock);
>
> /*
> * Acquired in kvm_set_memslot. Must be released before synchronize
Sebastian