[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20210305003642.GR4247@nvidia.com>
Date: Thu, 4 Mar 2021 20:36:42 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Alex Williamson <alex.williamson@...hat.com>
CC: <cohuck@...hat.com>, <kvm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <peterx@...hat.com>
Subject: Re: [RFC PATCH 05/10] vfio: Create a vfio_device from vma lookup
On Thu, Mar 04, 2021 at 05:07:31PM -0700, Alex Williamson wrote:
> On Thu, 4 Mar 2021 19:16:33 -0400
> Jason Gunthorpe <jgg@...dia.com> wrote:
>
> > On Thu, Mar 04, 2021 at 02:37:57PM -0700, Alex Williamson wrote:
> >
> > > Therefore unless a bus driver opts-out by replacing vm_private_data, we
> > > can identify participating vmas by the vm_ops and have flags indicating
> > > if the vma maps device memory such that vfio_get_device_from_vma()
> > > should produce a device reference. The vfio IOMMU backends would also
> > > consume this, ie. if they get a valid vfio_device from the vma, use the
> > > pfn_base field directly. vfio_vm_ops would wrap the bus driver
> > > callbacks and provide reference counting on open/close to release this
> > > object.
> >
> > > I'm not thrilled with a vfio_device_ops callback plumbed through
> > > vfio-core to do vma-to-pfn translation, so I thought this might be a
> > > better alternative. Thanks,
> >
> > Maybe you could explain why, because I'm looking at this idea and
> > thinking it looks very complicated compared to a simple driver op
> > callback?
>
> vfio-core needs to export a vfio_vma_to_pfn() which I think assumes the
> caller has already used vfio_device_get_from_vma(), but should still
> validate the vma is one from a vfio device before calling this new
> vfio_device_ops callback.
Huh? Validate? Why?
Something like this in the IOMMU stuff:
struct vfio_device *vfio = vfio_device_get_from_vma(vma)
if (!vfio->vma_to_pfn)
return -EINVAL;
vfio->ops->vma_to_pfn(vfio, vma, offset_from_vma_start)
Is fine, why do we need to over complicate?
I don't need to check that the vma belongs to the vfio because it is
the API contract that the caller will guarantee that.
This is the kernel, I can (and do!) check these sorts of things by
code inspection when working on stuff - I can look at every
implementation and every call site to prove these things.
IMHO doing an expensive check like that is a style of defensive
programming the kernel community frowns upon.
> vfio-pci needs to validate the vm_pgoff value falls within a BAR
> region, mask off the index and get the pci_resource_start() for the
> BAR index.
It needs to do the same thing fault() already does, which is currently
one line..
> Then we need a solution for how vfio_device_get_from_vma() determines
> whether to grant a device reference for a given vma, where that vma may
> map something other than device memory. Are you imagining that we hand
> out device references independently and vfio_vma_to_pfn() would return
> an errno for vm_pgoff values that don't map device memory and the IOMMU
> driver would release the reference?
That seems a reasonable place to start
> prevent using unmmap_mapping_range(). The IOMMU backend, once it has a
> vfio_device via vfio_device_get_from_vma() can know the format of
> vm_private_data, cast it as a vfio_vma_private_data and directly use
> base_pfn, accomplishing the big point. They're all operating in the
> agreed upon vm_private_data format. Thanks,
If we force all drivers into a mandatory (!) vm_private_data format
then every driver has to be audited and updated before the new pfn
code can be done. If any driver in the future makes a mistake here
(and omitting the unique vm_private_data magics is a very easy mistake
to make) then it will cause a kernel crash in an obscure scenario.
It is the "design the API to be hard to use wrong" philosophy.
Jason
Powered by blists - more mailing lists