Message-ID: <CABgObfa4TKcj-d3Spw+TAE7ZfO8wFGJebkW3jMyFY2TrKxMuSw@mail.gmail.com>
Date: Fri, 24 Jan 2025 23:19:52 +0100
From: Paolo Bonzini <pbonzini@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: "Kernel Mailing List, Linux" <linux-kernel@...r.kernel.org>, kvm <kvm@...r.kernel.org>
Subject: Re: [PATCH 1/2] KVM: x86: fix usage of kvm_lock in set_nx_huge_pages()
On Fri, Jan 24, 2025 at 21:11, Sean Christopherson <seanjc@...gle.com> wrote:
> Heh, except it's all kinds of broken.
Yes, I didn't even try.
> IMO, biting the bullet and converting to
> an SRCU-protected list is going to be far less work in the long run.
I did try a long SRCU critical section and it was unreviewable. It
ends up a lot less manageable than just making the lock a leaf,
especially as the lock hierarchy spans multiple subsystems (static
key, KVM, cpufreq---thanks, CPU hotplug lock...). I also didn't like
adding a synchronization primitive that's... kinda single-use, but
that would not be a blocker of course.
So the second attempt was regular RCU, which looked a lot like this
patch. I started writing all the dances needed to find a struct kvm
that passes kvm_get_kvm_safe(), take the reference before
rcu_read_unlock(), and only then drop the previous one (because you
cannot do kvm_put_kvm() within the RCU read side); I set the idea
aside, incorrectly thinking those dances were not needed with
kvm_lock. Plus I didn't like having to keep a bunch of data alive for
a whole grace period if call_rcu() is used.
So for the third attempt the choice was between SRCU (dropping the
read lock inside the loop) or just using kvm_lock. To be honest I
didn't even think of SRCU, because everything so far had looked so
bad, but it does seem a little better than RCU. At least, if
kvm_destroy_vm() uses call_srcu(), you can call kvm_put_kvm() within
srcu_read_lock()...srcu_read_unlock().
It would look something like
	int idx = srcu_read_lock(&kvm_srcu);

	list_for_each_entry_srcu(kvm, &vm_list, vm_list,
				 srcu_read_lock_held(&kvm_srcu)) {
		if (!kvm_get_kvm_safe(kvm))
			continue;

		/* kvm is protected by the reference count now. */
		srcu_read_unlock(&kvm_srcu, idx);
		...
		idx = srcu_read_lock(&kvm_srcu);

		/*
		 * kvm stays alive, and next can be read, until the
		 * next srcu_read_unlock().
		 */
		kvm_put_kvm(kvm);
	}
	srcu_read_unlock(&kvm_srcu, idx);
but again I am not sure how speedy call_srcu() is in reclaiming the
data, even in the common case where set_nx_huge_pages() or any other
SRCU reader (none of them is frequent) isn't running. If you don't
use call_srcu(), it becomes just as bad as RCU or kvm_lock.
So... let's talk about kvm_lock.
> > @@ -7143,16 +7141,19 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
> > +		kvm_get_kvm(kvm);
>
> This needs to be:
>
>	if (!kvm_get_kvm_safe(kvm))
>		continue;
If we go for kvm_lock, kvm_get_kvm() *can* be made safe within the
critical section; if kvm_put_kvm() uses refcount_dec_and_mutex_lock(),
then the 1->0 transition happens under kvm_lock and cannot race with
kvm_get_kvm() (the mutex can be dropped as soon as
refcount_dec_and_mutex_lock() returns; it's really just the decrement
that needs to be within the critical section).
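Concretely, that would be something like this (just a sketch;
kvm->users_count and kvm_destroy_vm() are the existing names, the
restructuring around refcount_dec_and_mutex_lock() is the new part):

	void kvm_put_kvm(struct kvm *kvm)
	{
		/*
		 * The 1->0 transition now happens with kvm_lock held,
		 * so a kvm_get_kvm() done inside a kvm_lock critical
		 * section can never race with the final put.
		 */
		if (refcount_dec_and_mutex_lock(&kvm->users_count, &kvm_lock)) {
			/* Only the decrement needs the lock. */
			mutex_unlock(&kvm_lock);
			kvm_destroy_vm(kvm);
		}
	}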
> >  	if (new_val != old_val) {
> >  		struct kvm *kvm;
> >
> > -		mutex_lock(&kvm_lock);
> > -
> >  		list_for_each_entry(kvm, &vm_list, vm_list) {
>
> This is unsafe, as vm_list can be modified while kvm_lock is dropped. And
> using list_for_each_entry_safe() doesn't help, because the _next_ entry may
> have been freed.
list_for_each_entry_safe() is broken, but list_for_each_entry() can be
used. The problem is the call to kvm_put_kvm(), which is where the kvm
struct is freed, thus breaking the foreach loop. So the next entry must
be read and ref'd _before_ the kvm_put_kvm() of the previous one; then
you can do
	kvm_get_kvm(kvm);	/* ref the next entry before dropping the lock */
	mutex_unlock(&kvm_lock);

	if (prev)
		kvm_put_kvm(prev);

	...

	mutex_lock(&kvm_lock);
	/* kvm is still ref'd, so its ->vm_list.next stays readable. */
	prev = kvm;
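Put together, the walk would look roughly like this (a sketch; the
prev bookkeeping and the final put after the loop are spelled out
here, not taken from the patch):

	struct kvm *kvm, *prev = NULL;

	mutex_lock(&kvm_lock);
	list_for_each_entry(kvm, &vm_list, vm_list) {
		kvm_get_kvm(kvm);	/* safe under kvm_lock, see above */
		mutex_unlock(&kvm_lock);

		if (prev)
			kvm_put_kvm(prev);

		...	/* operate on kvm without holding kvm_lock */

		mutex_lock(&kvm_lock);
		/* kvm is still ref'd, so reading its ->vm_list.next is fine. */
		prev = kvm;
	}
	mutex_unlock(&kvm_lock);
	if (prev)
		kvm_put_kvm(prev);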
I don't know... there are few enough readers that SRCU seems a bit
misplaced, and it has the issue of keeping the VM data alive; while
kvm_lock has uglier code, with the kvm_put_kvm() looking really
misplaced. If there were many instances one could write a nice
iterator, but for just one use?
Hmm... I wonder if something like
	if (poll_state_synchronize_srcu(&kvm_srcu,
					get_state_synchronize_srcu(&kvm_srcu))) {
		kvm_destroy_vm_cb(&kvm->rcu_head);
	} else {
		call_srcu(&kvm_srcu, &kvm->rcu_head, kvm_destroy_vm_cb);
	}
catches the case where there's no concurrent reader. If so, SRCU would
be a winner undoubtedly, but being the only user of a tricky RCU API
doesn't give me warm and fuzzy feelings. I'm still team kvm_lock for
now.
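For completeness, the callback in that scheme would be little more
than the following (a sketch; the rcu_head field and kvm_free_vm()
are placeholders for wherever the actual teardown ends up):

	static void kvm_destroy_vm_cb(struct rcu_head *rcu)
	{
		struct kvm *kvm = container_of(rcu, struct kvm, rcu_head);

		/* No SRCU reader can see the VM anymore; free everything. */
		kvm_free_vm(kvm);	/* placeholder for the rest of today's kvm_destroy_vm() */
	}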
Anyhow, I can prepare a tested version next Monday, with either
kvm_lock or SRCU if the above trick works. Unless I've shown that
it's trickier than it seems and successfully nerd-sniped you.
Seriously - just tell me what you prefer.
Paolo