Message-ID: <Y8FvYhpIRIiX22Op@nvidia.com>
Date: Fri, 13 Jan 2023 10:49:06 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Alex Williamson <alex.williamson@...hat.com>,
Matthew Rosato <mjrosato@...ux.ibm.com>, pbonzini@...hat.com,
cohuck@...hat.com, farman@...ux.ibm.com, pmorel@...ux.ibm.com,
borntraeger@...ux.ibm.com, frankja@...ux.ibm.com,
imbrenda@...ux.ibm.com, david@...hat.com, akrowiak@...ux.ibm.com,
jjherne@...ux.ibm.com, pasic@...ux.ibm.com,
zhenyuw@...ux.intel.com, zhi.a.wang@...el.com,
linux-s390@...r.kernel.org, kvm@...r.kernel.org,
intel-gvt-dev@...ts.freedesktop.org,
intel-gfx@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] vfio: fix potential deadlock on vfio group lock
On Fri, Jan 13, 2023 at 02:05:14AM +0000, Sean Christopherson wrote:
> On Thu, Jan 12, 2023, Alex Williamson wrote:
> > On Thu, 12 Jan 2023 23:29:53 +0000
> > Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > > On Thu, Jan 12, 2023, Matthew Rosato wrote:
> > > > On 1/12/23 4:05 PM, Alex Williamson wrote:
> > > > > On Thu, 12 Jan 2023 15:38:44 -0500
> > > > > Matthew Rosato <mjrosato@...ux.ibm.com> wrote:
> > > > >> @@ -344,6 +345,35 @@ static bool vfio_assert_device_open(struct vfio_device *device)
> > > > > >> 	return !WARN_ON_ONCE(!READ_ONCE(device->open_count));
> > > > >> }
> > > > >>
> > > > >> +static bool vfio_kvm_get_kvm_safe(struct kvm *kvm)
> > > > >> +{
> > > > > >> +	bool (*fn)(struct kvm *kvm);
> > > > > >> +	bool ret;
> > > > > >> +
> > > > > >> +	fn = symbol_get(kvm_get_kvm_safe);
> > > > > >> +	if (WARN_ON(!fn))
> > >
> > > In a related vein to Alex's comments about error handling, this should not WARN.
> > > WARNing during vfio_kvm_put_kvm() makes sense, but the "get" is somewhat blind.
> >
> > It's not exactly blind though, we wouldn't have a kvm pointer if the
> > kvm-vfio device hadn't stuffed one into the group. We only call into
> > here if we have a non-NULL pointer, so it wouldn't simply be that the
> > kvm module isn't available for this to fire, but more that we have an
> > API change to make the symbol no longer exist. A WARN for that doesn't
> > seem unreasonable. Thanks,
>
> Hmm, I was thinking that it might be possible for kvm.ko to be on its way out,
> but barring use of force module unload, which breaks things left and right, kvm.ko
> can only be going if all VMs have been destroyed.
If we really care about these details then we should obtain both the
get_safe and put symbols together. The put pointer should be stored in
the device, and it should be symbol_put'd once the kvm has been put
back and the pointer isn't needed any more.
This properly mimics how normal module stacking would work.
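
A minimal userspace sketch of the scheme described above, not kernel
code. The mock_symbol_get_*() and mock_symbol_put() helpers are
hypothetical stand-ins for symbol_get()/symbol_put(), kvm_module_refs
stands in for the kvm.ko module refcount, and struct vfio_device_model
with its put_kvm field is an illustrative name only. It takes both
symbols up front, drops the get symbol right after the call, and keeps
the put symbol pinned until the kvm reference is released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct kvm; 'users' models the VM reference count. */
struct kvm {
	int users;
};

/* Models the kvm.ko refcount that symbol_get()/symbol_put() would adjust. */
static int kvm_module_refs;

/* Models kvm_get_kvm_safe(): succeeds only while the VM is still alive. */
static bool kvm_get_kvm_safe(struct kvm *kvm)
{
	if (kvm->users == 0)
		return false;
	kvm->users++;
	return true;
}

/* Models kvm_put_kvm(): drops one VM reference. */
static void kvm_put_kvm(struct kvm *kvm)
{
	kvm->users--;
}

typedef bool (*kvm_get_fn)(struct kvm *kvm);
typedef void (*kvm_put_fn)(struct kvm *kvm);

/* Hypothetical mocks: each successful lookup pins the providing module. */
static kvm_get_fn mock_symbol_get_get_safe(void)
{
	kvm_module_refs++;
	return kvm_get_kvm_safe;
}

static kvm_put_fn mock_symbol_get_put(void)
{
	kvm_module_refs++;
	return kvm_put_kvm;
}

/* Hypothetical mock: releases one pin taken by a lookup above. */
static void mock_symbol_put(void)
{
	assert(kvm_module_refs > 0);
	kvm_module_refs--;
}

/* Illustrative device: remembers the kvm and the pinned put function. */
struct vfio_device_model {
	struct kvm *kvm;
	kvm_put_fn put_kvm;
};

static bool vfio_device_get_kvm_safe(struct vfio_device_model *dev,
				     struct kvm *kvm)
{
	kvm_put_fn put = mock_symbol_get_put();
	kvm_get_fn get;
	bool ok;

	if (!put)
		return false;

	get = mock_symbol_get_get_safe();
	if (!get) {
		mock_symbol_put();	/* release the put symbol */
		return false;
	}

	ok = get(kvm);
	mock_symbol_put();		/* get symbol is no longer needed */
	if (!ok) {
		mock_symbol_put();	/* nothing to put later, unpin now */
		return false;
	}

	dev->kvm = kvm;
	dev->put_kvm = put;		/* stays pinned until the final put */
	return true;
}

static void vfio_device_put_kvm(struct vfio_device_model *dev)
{
	if (!dev->kvm)
		return;

	dev->put_kvm(dev->kvm);
	dev->put_kvm = NULL;
	dev->kvm = NULL;
	mock_symbol_put();		/* release the stored put symbol */
}
```

Because the put symbol stays pinned for as long as the device holds the
kvm reference, the put path in this model never has to look anything up
and so cannot fail.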
Jason