Message-ID: <aBsGNPvG75KspVmp@google.com>
Date: Wed, 7 May 2025 07:05:24 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Michael Larabel <Michael@...haellarabel.com>,
	Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH v2] KVM: SVM: Set/clear SRSO's BP_SPEC_REDUCE on 0 <=> 1
 VM count transitions

On Tue, May 06, 2025 at 08:57:36AM -0700, Sean Christopherson wrote:
> On Tue, May 06, 2025, Yosry Ahmed wrote:
> > On Tue, May 06, 2025 at 07:16:06AM -0700, Sean Christopherson wrote:
> > > On Tue, May 06, 2025, Yosry Ahmed wrote:
> > > > On Mon, May 05, 2025 at 11:03:00AM -0700, Sean Christopherson wrote:
> > > > > +static void svm_srso_vm_destroy(void)
> > > > > +{
> > > > > +	if (!cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
> > > > > +		return;
> > > > > +
> > > > > +	if (atomic_dec_return(&srso_nr_vms))
> > > > > +		return;
> > > > > +
> > > > > +	guard(spinlock)(&srso_lock);
> > > > > +
> > > > > +	/*
> > > > > +	 * Verify a new VM didn't come along, acquire the lock, and increment
> > > > > +	 * the count before this task acquired the lock.
> > > > > +	 */
> > > > > +	if (atomic_read(&srso_nr_vms))
> > > > > +		return;
> > > > > +
> > > > > +	on_each_cpu(svm_srso_clear_bp_spec_reduce, NULL, 1);
> > > > 
> > > > Just a passing-by comment. I get worried about sending IPIs while
> > > > holding a spinlock because if someone ever tries to hold that spinlock
> > > > with IRQs disabled, it may cause a deadlock.
> > > > 
> > > > This is not the case for this lock, but it's not obvious (at least to
> > > > me) that holding it with IRQs disabled in a different code path (one
> > > > that doesn't send IPIs) could cause a problem.
> > > > 
> > > > You could add a comment, convert it to a mutex to make this scenario
> > > > impossible,
> > > 
> > > Using a mutex doesn't make deadlock impossible, it's still perfectly legal to
> > > disable IRQs while holding a mutex.
> > 
> > Right, but it's illegal to hold a mutex while disabling IRQs.
> 
> Nit on the wording: it's illegal to take a mutex while IRQs are disabled.  Disabling
> IRQs while already holding a mutex is fine.

Yes :)
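
To spell out the ordering rule (a sketch of my understanding; "m" is just a
stand-in lock, nothing from the patch):

	struct mutex m;		/* stand-in lock, for illustration only */

	/* Illegal: mutex_lock() may sleep, and sleeping with IRQs off is
	 * not allowed. */
	local_irq_disable();
	mutex_lock(&m);

	/* Fine: IRQs are only disabled after the mutex is already held. */
	mutex_lock(&m);
	local_irq_disable();
	...
	local_irq_enable();
	mutex_unlock(&m);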

> 
> And it's also illegal to take a spinlock while IRQs are disabled, because spinlocks
> become sleepable mutexes with PREEMPT_RT=y.  While PREEMPT_RT=y isn't super common,
> people do run KVM with PREEMPT_RT=y, and I'm guessing bots/CI would trip any such
> violation quite quickly.

But I think spin_lock_irq() and friends aren't a violation with
PREEMPT_RT=y, so these wouldn't trip bots/CI AFAICT.
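
To spell out why (a sketch of my understanding, worth double-checking, not
from the patch):

	/*
	 * With PREEMPT_RT=y, spin_lock_irq() maps to rt_spin_lock()
	 * (include/linux/spinlock_rt.h): the lock becomes a sleeping lock
	 * and hardirqs stay enabled, so a hypothetical path doing this
	 * wouldn't hit the splat you show below:
	 */
	spin_lock_irq(&srso_lock);	/* effectively rt_spin_lock() on RT */
	...
	spin_unlock_irq(&srso_lock);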

> 
> E.g. with IRQs disabled around the guard(spinlock)(&srso_lock):
> 
>  BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
>  in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 2799, name: qemu
>  preempt_count: 0, expected: 0
>  RCU nest depth: 0, expected: 0
>  1 lock held by qemu/2799:
>   #0: ffffffff8263f898 (srso_lock){....}-{3:3}, at: svm_vm_destroy+0x47/0xa0
>  irq event stamp: 9090
>  hardirqs last  enabled at (9089): [<ffffffff81414087>] vprintk_store+0x467/0x4d0
>  hardirqs last disabled at (9090): [<ffffffff812fd1ce>] svm_vm_destroy+0x5e/0xa0
>  softirqs last  enabled at (0): [<ffffffff8137585c>] copy_process+0xa1c/0x29f0
>  softirqs last disabled at (0): [<0000000000000000>] 0x0
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x57/0x80
>   __might_resched.cold+0xcc/0xde
>   rt_spin_lock+0x5b/0x170
>   svm_vm_destroy+0x47/0xa0
>   kvm_destroy_vm+0x180/0x310
>   kvm_vm_release+0x1d/0x30
>   __fput+0x10d/0x2f0
>   task_work_run+0x58/0x90
>   do_exit+0x325/0xa80
>   do_group_exit+0x32/0xa0
>   get_signal+0xb5b/0xbb0
>   arch_do_signal_or_restart+0x29/0x230
>   syscall_exit_to_user_mode+0xea/0x180
>   do_syscall_64+0x7a/0x220
>   entry_SYSCALL_64_after_hwframe+0x76/0x7e
>  RIP: 0033:0x7fb50ae7fc4e
>   </TASK>
> 
> > In this case, if the other CPU is already holding the lock then there's no
> > risk of deadlock, right?
> 
> Not on srso_lock, but there's still deadlock potential on the locks used to protect
> the call_function_data structure.
> 
> > > Similarly, I don't want to add a comment, because there is absolutely nothing
> > > special/unique about this situation/lock.  E.g. KVM has tens of calls to
> > > smp_call_function_many_cond() while holding a spinlock equivalent, in the form
> > > of kvm_make_all_cpus_request() while holding mmu_lock.
> > 
> > Agreed that it's not a unique situation at all. Ideally we'd have some
> > debugging (lockdep?) magic that identifies that an IPI is being sent
> > while a lock is held, and that this specific lock is never spun on
> > with IRQs disabled.
> 
> Sleepable spinlocks aside, the lockdep_assert_irqs_enabled() in
> smp_call_function_many_cond() already provides sufficient coverage for that
> case.  And if code is using some other form of IPI communication *and* taking raw
> spinlocks, then I think it goes without saying that developers would need to be
> very, very careful.

I think we are not talking about the same thing, or I am
misunderstanding you. The lockdep_assert_irqs_enabled() assertion in
smp_call_function_many_cond() does not protect against this case AFAICT.

Basically imagine that a new code path is added that does:

	spin_lock_irq(&srso_lock);
	/* Some trivial logic, no IPI sending */
	spin_unlock_irq(&srso_lock);

I believe spin_lock_irq() will disable IRQs (at least on some setups)
then spin on the lock.

Now imagine svm_srso_vm_destroy() is already holding the lock and sends
the IPI from CPU 1, while CPU 2 is executing the above code with IRQs
already disabled and spinning on the lock.

This is the deadlock scenario I am talking about. The lockdep assertion
in smp_call_function_many_cond() doesn't help because IRQs are enabled
on CPU 1, the problem is that they are disabled on CPU 2.
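
Roughly, the interleaving I'm worried about (CPU numbers and the second code
path are hypothetical; only srso_lock, svm_srso_vm_destroy() and
svm_srso_clear_bp_spec_reduce() are from the patch):

	CPU 1 (svm_srso_vm_destroy)		CPU 2 (hypothetical new path)
	---------------------------		-----------------------------
	guard(spinlock)(&srso_lock);
						spin_lock_irq(&srso_lock);
						/* IRQs off, spinning on the
						 * lock held by CPU 1 */
	on_each_cpu(svm_srso_clear_bp_spec_reduce,
		    NULL, 1);
	/* waits for CPU 2 to run the IPI
	 * handler, but CPU 2 can't take the
	 * IPI with IRQs off -> stuck */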

Lockdep can detect this by keeping track of the fact that some code
paths acquire the lock with IRQs off while some code paths acquire the
lock and send IPIs, I think.
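
FWIW, short of that, any path taking srso_lock could document the constraint
itself; just a sketch using the generic lockdep helper, nothing here is in
the patch:

	/*
	 * Sketch: document/enforce that srso_lock is never taken with IRQs
	 * off, since svm_srso_vm_destroy() sends IPIs while holding it.
	 */
	lockdep_assert_irqs_enabled();
	guard(spinlock)(&srso_lock);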

> 
> > > smp_call_function_many_cond() already asserts that IRQs are disabled, so I have
> > > zero concerns about this flow breaking in the future.
> > 
> > That doesn't really help tho, the problem is if another CPU spins on the
> > lock with IRQs disabled, regardless of whether or not it sends IPIs itself.
> > Basically if
> > CPU 1 acquires the lock and sends an IPI while CPU 2 disables IRQs and
> > spins on the lock.
> 
> Given that svm_srso_vm_destroy() is guaranteed to call on_each_cpu() with the
> lock held at some point, I'm completely comfortable relying on its lockdep
> assertion.
> 
> > > > or dismiss my comment as being too paranoid/ridiculous :)
> > > 
> > > I wouldn't say your thought process is too paranoid; when writing the code, I had
> > > to pause and think to remember whether or not using on_each_cpu() while holding a
> > > spinlock is allowed.  But I do think the conclusion is wrong :-)
> > 
> > That's fair. I think protection against this should be done more generically
> > as I mentioned earlier, but it felt like it would be easy-ish to side-step it
> > in this case.
> 
> Eh, modifying this code in such a way that it could deadlock without lockdep
> noticing would likely send up a comical number of red flags during code review.

It just takes another code path using spin_lock_irq() or friends to
deadlock AFAICT.

Anyway, this has gotten side-tracked and I am probably taking more of your
time than I originally intended, I was just making an observation
more-or-less :)
