Message-ID: <aBoxcOPWRWyFIgVE@google.com>
Date: Tue, 6 May 2025 08:57:36 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Yosry Ahmed <yosry.ahmed@...ux.dev>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Michael Larabel <Michael@...haellarabel.com>, Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH v2] KVM: SVM: Set/clear SRSO's BP_SPEC_REDUCE on 0 <=> 1
 VM count transitions

On Tue, May 06, 2025, Yosry Ahmed wrote:
> On Tue, May 06, 2025 at 07:16:06AM -0700, Sean Christopherson wrote:
> > On Tue, May 06, 2025, Yosry Ahmed wrote:
> > > On Mon, May 05, 2025 at 11:03:00AM -0700, Sean Christopherson wrote:
> > > > +static void svm_srso_vm_destroy(void)
> > > > +{
> > > > +	if (!cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
> > > > +		return;
> > > > +
> > > > +	if (atomic_dec_return(&srso_nr_vms))
> > > > +		return;
> > > > +
> > > > +	guard(spinlock)(&srso_lock);
> > > > +
> > > > +	/*
> > > > +	 * Verify a new VM didn't come along, acquire the lock, and increment
> > > > +	 * the count before this task acquired the lock.
> > > > +	 */
> > > > +	if (atomic_read(&srso_nr_vms))
> > > > +		return;
> > > > +
> > > > +	on_each_cpu(svm_srso_clear_bp_spec_reduce, NULL, 1);
> > > 
> > > Just a passing-by comment. I get worried about sending IPIs while
> > > holding a spinlock because if someone ever tries to hold that spinlock
> > > with IRQs disabled, it may cause a deadlock.
> > > 
> > > This is not the case for this lock, but it's not obvious (at least to
> > > me) that holding it in a different code path that doesn't send IPIs with
> > > IRQs disabled could cause a problem.
> > > 
> > > You could add a comment, convert it to a mutex to make this scenario
> > > impossible,
> > 
> > Using a mutex doesn't make deadlock impossible, it's still perfectly legal to
> > disable IRQs while holding a mutex.
> 
> Right, but it's illegal to hold a mutex while disabling IRQs.

Nit on the wording: it's illegal to take a mutex while IRQs are disabled.  Disabling
IRQs while already holding a mutex is fine.
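To illustrate the ordering rules, a kernel-context sketch (not standalone code; the mutex name is illustrative):

```c
static DEFINE_MUTEX(m);

/* Fine: take the sleepable lock first, then disable IRQs. */
mutex_lock(&m);
local_irq_disable();
/* ... critical work ... */
local_irq_enable();
mutex_unlock(&m);

/* Illegal: mutex_lock() may sleep, and sleeping with IRQs
 * disabled is a bug (might_sleep() will splat). */
local_irq_disable();
mutex_lock(&m);
```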

And it's also illegal to take a spinlock while IRQs are disabled, because spinlocks
become sleepable mutexes with PREEMPT_RT=y.  While PREEMPT_RT=y isn't super common,
people do run KVM with PREEMPT_RT=y, and I'm guessing bots/CI would trip any such
violation quite quickly.

E.g. with IRQs disabled around the guard(spinlock)(&srso_lock):

 BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 2799, name: qemu
 preempt_count: 0, expected: 0
 RCU nest depth: 0, expected: 0
 1 lock held by qemu/2799:
  #0: ffffffff8263f898 (srso_lock){....}-{3:3}, at: svm_vm_destroy+0x47/0xa0
 irq event stamp: 9090
 hardirqs last  enabled at (9089): [<ffffffff81414087>] vprintk_store+0x467/0x4d0
 hardirqs last disabled at (9090): [<ffffffff812fd1ce>] svm_vm_destroy+0x5e/0xa0
 softirqs last  enabled at (0): [<ffffffff8137585c>] copy_process+0xa1c/0x29f0
 softirqs last disabled at (0): [<0000000000000000>] 0x0
 Call Trace:
  <TASK>
  dump_stack_lvl+0x57/0x80
  __might_resched.cold+0xcc/0xde
  rt_spin_lock+0x5b/0x170
  svm_vm_destroy+0x47/0xa0
  kvm_destroy_vm+0x180/0x310
  kvm_vm_release+0x1d/0x30
  __fput+0x10d/0x2f0
  task_work_run+0x58/0x90
  do_exit+0x325/0xa80
  do_group_exit+0x32/0xa0
  get_signal+0xb5b/0xbb0
  arch_do_signal_or_restart+0x29/0x230
  syscall_exit_to_user_mode+0xea/0x180
  do_syscall_64+0x7a/0x220
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7fb50ae7fc4e
  </TASK>

> In this case, if the other CPU is already holding the lock then there's no
> risk of deadlock, right?

Not on srso_lock, but there's still deadlock potential on the locks used to protect
the call_function_data structure.
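Concretely, the interleaving to worry about looks like this (sketch only; CPU numbering is illustrative):

```
  CPU 1                          CPU 2
  -----                          -----
  spin_lock(&lock);              local_irq_disable();
  on_each_cpu(..., 1);           spin_lock(&lock);
    ...waits for CPU 2             ...spins with IRQs off,
       to run the IPI handler         so never runs the IPI handler
                    => deadlock
```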

> > Similarly, I don't want to add a comment, because there is absolutely nothing
> > special/unique about this situation/lock.  E.g. KVM has tens of calls to
> > smp_call_function_many_cond() while holding a spinlock equivalent, in the form
> > of kvm_make_all_cpus_request() while holding mmu_lock.
> 
> Agreed that it's not a unique situation at all. Ideally we'd have some
> debugging (lockdep?) magic that identifies that an IPI is being sent
> while a lock is held, and that this specific lock is never spinned on
> with IRQs disabled.

Sleepable spinlocks aside, the lockdep_assert_irqs_enabled() in
smp_call_function_many_cond() already provides sufficient coverage for that
case.  And if code is using some other form of IPI communication *and* taking raw
spinlocks, then I think it goes without saying that developers would need to be
very, very careful.

> > smp_call_function_many_cond() already asserts that IRQs are enabled, so I have
> > zero concerns about this flow breaking in the future.
> 
> That doesn't really help tho, the problem is if another CPU spins on the
> lock with IRQs disabled, regardless of whether or not it ever sends an
> IPI itself. Basically if
> CPU 1 acquires the lock and sends an IPI while CPU 2 disables IRQs and
> spins on the lock.

Given that svm_srso_vm_destroy() is guaranteed to call on_each_cpu() with the
lock held at some point, I'm completely comfortable relying on its lockdep
assertion.

> > > or dismiss my comment as being too paranoid/ridiculous :)
> > 
> > I wouldn't say your thought process is too paranoid; when writing the code, I had
> > to pause and think to remember whether or not using on_each_cpu() while holding a
> > spinlock is allowed.  But I do think the conclusion is wrong :-)
> 
> That's fair. I think protection against this should be done more generically
> as I mentioned earlier, but it felt like it would be easy-ish to side-step it
> in this case.

Eh, modifying this code in such a way that it could deadlock without lockdep
noticing would likely send up a comical number of red flags during code review.
