[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aBtc7Y8odYFGGLrt@google.com>
Date: Wed, 7 May 2025 06:19:01 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Yosry Ahmed <yosry.ahmed@...ux.dev>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Michael Larabel <Michael@...haellarabel.com>, Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH v2] KVM: SVM: Set/clear SRSO's BP_SPEC_REDUCE on 0 <=> 1
VM count transitions
On Wed, May 07, 2025, Yosry Ahmed wrote:
> On Tue, May 06, 2025 at 08:57:36AM -0700, Sean Christopherson wrote:
> > Sleepable spinlocks aside, the lockdep_assert_irqs_enabled() in
> > smp_call_function_many_cond() already provides sufficient of coverage for that
> > case. And if code is using some other form of IPI communication *and* taking raw
> > spinlocks, then I think it goes without saying that developers would need to be
> > very, very careful.
>
> I think we are not talking about the same thing, or I am
> misunderstanding you. The lockdep_assert_irqs_enabled() assertion in
> smp_call_function_many_cond() does not protect against this case AFAICT.
>
> Basically imagine that a new code path is added that does:
>
> spin_lock_irq(&srso_lock);
> /* Some trivial logic, no IPI sending */
> spin_unlock_irq(&srso_lock);
>
> I believe spin_lock_irq() will disable IRQs (at least on some setups)
> then spin on the lock.
Yes, because the most common use case for spin_lock_irq() is to prevent deadlock
due to the lock being taken in IRQ context.
> Now imagine svm_srso_vm_destroy() is already holding the lock and sends
> the IPI from CPU 1, while CPU 2 is executing the above code with IRQs
> already disabled and spinning on the lock.
>
> This is the deadlock scenario I am talking about. The lockdep assertion
> in smp_call_function_many_cond() doesn't help because IRQs are enabled
> on CPU 1, the problem is that they are disabled on CPU 2.
>
> Lockdep can detect this by keeping track of the fact that some code
> paths acquire the lock with IRQs off while some code paths acquire the
> lock and send IPIs, I think.
I understand the scenario, I just don't see any meaningful risk in this case,
which in turn means I don't see any reason to use an inferior lock type (for this
particular case) to protect the count.
spin_lock_irq() isn't a tool that's used willy-nilly, and the usage of srso_lock
is extremely limited. If we manage to merge code that does spin_lock_irq(&srso_lock),
you have my full permission to mock my ineptitude :-)
Powered by blists - more mailing lists