Message-ID: <20210304143757.1ca42cfc@omen.home.shazbot.org>
Date: Thu, 4 Mar 2021 14:37:57 -0700
From: Alex Williamson <alex.williamson@...hat.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: <cohuck@...hat.com>, <kvm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <peterx@...hat.com>
Subject: Re: [RFC PATCH 05/10] vfio: Create a vfio_device from vma lookup
On Thu, 25 Feb 2021 19:49:49 -0400
Jason Gunthorpe <jgg@...dia.com> wrote:
> On Thu, Feb 25, 2021 at 03:21:13PM -0700, Alex Williamson wrote:
>
> > This is where it gets tricky. The vm_pgoff we get from
> > file_operations.mmap is already essentially describing an offset from
> > the base of a specific resource. We could convert that from an absolute
> > offset to a pfn offset, but it's only the bus driver code (e.g.
> > vfio-pci) that knows how to get the base, assuming there is a single
> > base per region (we can't assume enough bits per region to store
> > absolute pfn). Also note that you're suggesting that all vfio mmaps
> > would need to standardize on the vfio-pci implementation of region
> > layouts. Not that most drivers haven't copied vfio-pci, but we've
> > specifically avoided exposing it as a fixed uAPI such that we could have
> > the flexibility for a bus driver to implement regions offsets however
> > they need.
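
(For context, vfio-pci's current encoding packs the region index into
the high bits of the mmap offset; quoting vfio_pci_private.h from
memory:

#define VFIO_PCI_OFFSET_SHIFT	40
#define VFIO_PCI_OFFSET_TO_INDEX(off)	((off) >> VFIO_PCI_OFFSET_SHIFT)
#define VFIO_PCI_INDEX_TO_OFFSET(index)	((u64)(index) << VFIO_PCI_OFFSET_SHIFT)
#define VFIO_PCI_OFFSET_MASK	(((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1)

so after mmap, vm_pgoff carries the region index shifted up by
(VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT) plus the page offset into the
region.)
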
>
> Okay, well the bus driver owns the address space and the bus driver is
> in control of the vm_pgoff. If it doesn't want to zap then it doesn't
> need to do anything.
>
> vfio-pci can consistently use the index encoding and be fine.
>
> > So I'm not really sure what this looks like. Within vfio-pci we could
> > keep the index bits in place to allow unmmap_mapping_range() to
> > selectively zap matching vm_pgoffs but expanding that to a vfio
> > standard such that the IOMMU backend can also extract a pfn looks very
> > limiting, or ugly. Thanks,
>
> Let's add an op to convert a vma into a PFN range. The map code will
> pass the vma to the op and get back a pfn (or failure).
>
> pci will switch the vm_pgoff to an index, find the BAR base and
> compute the pfn.
>
> It is starting to look more and more like dma-buf, though.
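
For reference, I think the op you're describing would look something
like this for vfio-pci (rough sketch; the op signature and the
vfio_device_data() lookup are my guesses, not from a posted patch):

static int vfio_pci_vma_to_pfn(struct vfio_device *device,
			       struct vm_area_struct *vma,
			       unsigned long *pfn)
{
	struct vfio_pci_device *vdev = vfio_device_data(device);
	unsigned int index = vma->vm_pgoff;	/* vm_pgoff switched to BAR index */

	if (index > VFIO_PCI_BAR5_REGION_INDEX ||
	    !(pci_resource_flags(vdev->pdev, index) & IORESOURCE_MEM))
		return -EINVAL;

	/* base pfn; callers add the page offset within the vma */
	*pfn = pci_resource_start(vdev->pdev, index) >> PAGE_SHIFT;
	return 0;
}
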
How terrible would it be if vfio-core used a shared vm_private_data
structure and inserted itself into the vm_ops call chain to reference
count that structure? We already wrap the device file descriptor via
vfio_device_fops.mmap, so we could:
	struct vfio_vma_private_data *priv;
	int ret;

	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
	if (!priv)
		return -ENOMEM;

	kref_init(&priv->kref);
	priv->device = device;
	vma->vm_private_data = priv;

	ret = device->ops->mmap(device->device_data, vma);

	/* wrap vm_ops unless the bus driver replaced vm_private_data */
	if (!ret && vma->vm_private_data == priv) {
		priv->vm_ops = vma->vm_ops;
		vma->vm_ops = &vfio_vm_ops;
	} else {
		kfree(priv);
	}
Where:
struct vfio_vma_private_data {
	struct vfio_device *device;
	unsigned long pfn_base;
	void *private_data;	/* maybe not needed */
	const struct vm_operations_struct *vm_ops;
	struct kref kref;
	unsigned int release_notification:1;
};
Therefore, unless a bus driver opts out by replacing vm_private_data,
we can identify participating vmas by the vm_ops and have flags
indicating whether the vma maps device memory, such that
vfio_get_device_from_vma() should produce a device reference. The
vfio IOMMU backends would also consume this, i.e. if they get a valid
vfio_device from the vma, they'd use the pfn_base field directly.
vfio_vm_ops would wrap the bus driver callbacks and provide reference
counting on open/close to release this object.
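
Roughly, for the wrapping and lookup (again a sketch; the release and
device-reference helpers below are placeholders):

static void vfio_vma_open(struct vm_area_struct *vma)
{
	struct vfio_vma_private_data *priv = vma->vm_private_data;

	kref_get(&priv->kref);
	if (priv->vm_ops && priv->vm_ops->open)
		priv->vm_ops->open(vma);
}

static void vfio_vma_close(struct vm_area_struct *vma)
{
	struct vfio_vma_private_data *priv = vma->vm_private_data;

	if (priv->vm_ops && priv->vm_ops->close)
		priv->vm_ops->close(vma);
	kref_put(&priv->kref, vfio_vma_priv_release);	/* placeholder */
}

/* .fault and friends would forward to priv->vm_ops the same way */
static const struct vm_operations_struct vfio_vm_ops = {
	.open = vfio_vma_open,
	.close = vfio_vma_close,
};

struct vfio_device *vfio_get_device_from_vma(struct vm_area_struct *vma)
{
	struct vfio_vma_private_data *priv;

	if (vma->vm_ops != &vfio_vm_ops)
		return ERR_PTR(-ENODEV);

	priv = vma->vm_private_data;
	vfio_device_get(priv->device);	/* placeholder for taking a ref */
	return priv->device;
}
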
I'm not thrilled with a vfio_device_ops callback plumbed through
vfio-core to do vma-to-pfn translation, so I thought this might be a
better alternative. Thanks,
Alex