Message-ID: <20210304143757.1ca42cfc@omen.home.shazbot.org>
Date: Thu, 4 Mar 2021 14:37:57 -0700
From: Alex Williamson <alex.williamson@...hat.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: <cohuck@...hat.com>, <kvm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <peterx@...hat.com>
Subject: Re: [RFC PATCH 05/10] vfio: Create a vfio_device from vma lookup
On Thu, 25 Feb 2021 19:49:49 -0400
Jason Gunthorpe <jgg@...dia.com> wrote:
> On Thu, Feb 25, 2021 at 03:21:13PM -0700, Alex Williamson wrote:
>
> > This is where it gets tricky. The vm_pgoff we get from
> > file_operations.mmap is already essentially describing an offset from
> > the base of a specific resource. We could convert that from an absolute
> > offset to a pfn offset, but it's only the bus driver code (e.g.
> > vfio-pci) that knows how to get the base, assuming there is a single
> > base per region (we can't assume enough bits per region to store
> > absolute pfn). Also note that you're suggesting that all vfio mmaps
> > would need to standardize on the vfio-pci implementation of region
> > layouts. Not that most drivers haven't copied vfio-pci, but we've
> > specifically avoided exposing it as a fixed uAPI such that we could have
> > the flexibility for a bus driver to implement regions offsets however
> > they need.
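
(For context, vfio-pci's current encoding packs the region index into
the high bits of the mmap offset; quoting vfio_pci_private.h from
memory:

#define VFIO_PCI_OFFSET_SHIFT	40
#define VFIO_PCI_OFFSET_TO_INDEX(off)	((off) >> VFIO_PCI_OFFSET_SHIFT)
#define VFIO_PCI_INDEX_TO_OFFSET(index)	((u64)(index) << VFIO_PCI_OFFSET_SHIFT)
#define VFIO_PCI_OFFSET_MASK	(((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1)

so after mmap, vm_pgoff carries the region index shifted up by
(VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT) plus the page offset into the
region.)
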
>
> Okay, well the bus driver owns the address space and the bus driver is
> in control of the vm_pgoff. If it doesn't want to zap then it doesn't
> need to do anything.
>
> vfio-pci can consistently use the index encoding and be fine.
>
> > So I'm not really sure what this looks like. Within vfio-pci we could
> > keep the index bits in place to allow unmmap_mapping_range() to
> > selectively zap matching vm_pgoffs but expanding that to a vfio
> > standard such that the IOMMU backend can also extract a pfn looks very
> > limiting, or ugly. Thanks,
>
> Let's add an op to convert a vma into a PFN range. The map code will
> pass the vma to the op and get back a pfn (or failure).
>
> pci will switch the vm_pgoff to an index, find the BAR base and
> compute the pfn.
>
> It is starting to look more and more like dma-buf, though.
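
For reference, I think the op you're describing would look something
like this for vfio-pci (rough sketch; the op signature and the
vfio_device_data() lookup are my guesses, not from a posted patch):

static int vfio_pci_vma_to_pfn(struct vfio_device *device,
			       struct vm_area_struct *vma,
			       unsigned long *pfn)
{
	struct vfio_pci_device *vdev = vfio_device_data(device);
	unsigned int index = vma->vm_pgoff;	/* vm_pgoff switched to BAR index */

	if (index > VFIO_PCI_BAR5_REGION_INDEX ||
	    !(pci_resource_flags(vdev->pdev, index) & IORESOURCE_MEM))
		return -EINVAL;

	/* base pfn; callers add the page offset within the vma */
	*pfn = pci_resource_start(vdev->pdev, index) >> PAGE_SHIFT;
	return 0;
}
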
How terrible would it be if vfio-core used a shared vm_private_data
structure and inserted itself into the vm_ops call chain to reference
count that structure? We already wrap the device file descriptor via
vfio_device_fops.mmap, so we could:
	struct vfio_vma_private_data *priv;
	int ret;

	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
	if (!priv)
		return -ENOMEM;

	kref_init(&priv->kref);
	priv->device = device;
	vma->vm_private_data = priv;

	ret = device->ops->mmap(device->device_data, vma);

	/* wrap vm_ops unless the bus driver replaced vm_private_data */
	if (!ret && vma->vm_private_data == priv) {
		priv->vm_ops = vma->vm_ops;
		vma->vm_ops = &vfio_vm_ops;
	} else {
		kfree(priv);
	}
Where:
struct vfio_vma_private_data {
	struct vfio_device *device;
	unsigned long pfn_base;
	void *private_data;	/* maybe not needed */
	const struct vm_operations_struct *vm_ops;
	struct kref kref;
	unsigned int release_notification:1;
};
Therefore, unless a bus driver opts out by replacing vm_private_data,
we can identify participating vmas by the vm_ops and have flags
indicating whether the vma maps device memory, such that
vfio_get_device_from_vma() should produce a device reference. The
vfio IOMMU backends would also consume this, i.e. if they get a valid
vfio_device from the vma, they'd use the pfn_base field directly.
vfio_vm_ops would wrap the bus driver callbacks and provide reference
counting on open/close to release this object.
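
Roughly, for the wrapping and lookup (again a sketch; the release and
device-reference helpers below are placeholders):

static void vfio_vma_open(struct vm_area_struct *vma)
{
	struct vfio_vma_private_data *priv = vma->vm_private_data;

	kref_get(&priv->kref);
	if (priv->vm_ops && priv->vm_ops->open)
		priv->vm_ops->open(vma);
}

static void vfio_vma_close(struct vm_area_struct *vma)
{
	struct vfio_vma_private_data *priv = vma->vm_private_data;

	if (priv->vm_ops && priv->vm_ops->close)
		priv->vm_ops->close(vma);
	kref_put(&priv->kref, vfio_vma_priv_release);	/* placeholder */
}

/* .fault and friends would forward to priv->vm_ops the same way */
static const struct vm_operations_struct vfio_vm_ops = {
	.open = vfio_vma_open,
	.close = vfio_vma_close,
};

struct vfio_device *vfio_get_device_from_vma(struct vm_area_struct *vma)
{
	struct vfio_vma_private_data *priv;

	if (vma->vm_ops != &vfio_vm_ops)
		return ERR_PTR(-ENODEV);

	priv = vma->vm_private_data;
	vfio_device_get(priv->device);	/* placeholder for taking a ref */
	return priv->device;
}
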
I'm not thrilled with a vfio_device_ops callback plumbed through
vfio-core to do vma-to-pfn translation, so I thought this might be a
better alternative. Thanks,
Alex