lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zn02BUdJ7kvOg6Vw@yzhao56-desk.sh.intel.com>
Date: Thu, 27 Jun 2024 17:51:01 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: "Tian, Kevin" <kevin.tian@...el.com>
CC: Jason Gunthorpe <jgg@...dia.com>, "kvm@...r.kernel.org"
	<kvm@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "alex.williamson@...hat.com"
	<alex.williamson@...hat.com>, "peterx@...hat.com" <peterx@...hat.com>,
	"ajones@...tanamicro.com" <ajones@...tanamicro.com>
Subject: Re: [PATCH] vfio: Reuse file f_inode as vfio device inode

On Thu, Jun 27, 2024 at 08:17:02AM +0800, Tian, Kevin wrote:
> > From: Tian, Kevin
> > Sent: Thursday, June 27, 2024 7:56 AM
> > 
> > > From: Jason Gunthorpe <jgg@...dia.com>
> > > Sent: Wednesday, June 26, 2024 9:35 PM
> > >
> > > On Mon, Jun 17, 2024 at 05:53:32PM +0800, Yan Zhao wrote:
> > > > Reuse file f_inode as vfio device inode and associate pseudo path file
> > > > directly to inode allocated in vfio fs.
> > > >
> > > > Currently, vfio device is opened via 2 ways:
> > > > 1) via cdev open
> > > >    vfio device is opened with a cdev device with file f_inode and address
> > > >    space associated with a cdev inode;
> > > > 2) via VFIO_GROUP_GET_DEVICE_FD ioctl
> > > >    vfio device is opened via a pseudo path file with file f_inode and
> > > >    address space associated with an inode in anon_inode_fs.
> > > >
> > > > In commit b7c5e64fecfa ("vfio: Create vfio_fs_type with inode per
> > device"),
> > > > an inode in vfio fs is allocated for each vfio device. However, this inode
> > > > in vfio fs is only used to assign its address space to that of a file
> > > > associated with another cdev inode or an inode in anon_inode_fs.
> > > >
> > > > This patch
> > > > - reuses cdev device inode as the vfio device inode when it's opened via
> > > >   cdev way;
> > > > - allocates an inode in vfio fs, associate it to the pseudo path file,
> > > >   and save it as the vfio device inode when the vfio device is opened via
> > > >   VFIO_GROUP_GET_DEVICE_FD ioctl.
> > > >
> > > > File address space will then point automatically to the address space of
> > > > the vfio device inode. Tools like unmap_mapping_range() can then zap all
> > > > vmas associated with the vfio device.
> > > >
> > > > Signed-off-by: Yan Zhao <yan.y.zhao@...el.com>
> > > > ---
> > > >  drivers/vfio/device_cdev.c |  9 ++++---
> > > >  drivers/vfio/group.c       | 21 ++--------------
> > > >  drivers/vfio/vfio.h        |  2 ++
> > > >  drivers/vfio/vfio_main.c   | 49 +++++++++++++++++++++++++++-----------
> > > >  4 files changed, 43 insertions(+), 38 deletions(-)
> > > >
> > > > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > > > index bb1817bd4ff3..a4eec8e88f5c 100644
> > > > --- a/drivers/vfio/device_cdev.c
> > > > +++ b/drivers/vfio/device_cdev.c
> > > > @@ -40,12 +40,11 @@ int vfio_device_fops_cdev_open(struct inode
> > > *inode, struct file *filep)
> > > >  	filep->private_data = df;
> > > >
> > > >  	/*
> > > > -	 * Use the pseudo fs inode on the device to link all mmaps
> > > > -	 * to the same address space, allowing us to unmap all vmas
> > > > -	 * associated to this device using unmap_mapping_range().
> > > > +	 * mmaps are linked to the address space of the inode of device cdev.
> > > > +	 * Save the inode of device cdev in device->inode to allow
> > > > +	 * unmap_mapping_range() to unmap all vmas.
> > > >  	 */
> > > > -	filep->f_mapping = device->inode->i_mapping;
> > > > -
> > > > +	device->inode = inode;
> > >
> > > This doesn't seem right.. There is only one device but multiple file
> > > can be opened on that device.
Maybe we can put this assignment to vfio_df_ioctl_bind_iommufd() after
vfio_df_open() makes sure device->open_count is 1.

e.g.
-long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
+long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df, struct inode *inode,
                                struct vfio_device_bind_iommufd __user *arg)
 {
        struct vfio_device *device = df->device;
@@ -118,6 +111,8 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
                goto out_close_device;

        device->cdev_opened = true;
+
+       device->inode = inode;
        /*
         * Paired with smp_load_acquire() in vfio_device_fops::ioctl/

 @@ -1266,7 +1296,7 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
        int ret;

        if (cmd == VFIO_DEVICE_BIND_IOMMUFD)
-               return vfio_df_ioctl_bind_iommufd(df, uptr);
+               return vfio_df_ioctl_bind_iommufd(df, filep->f_inode, uptr);


> > >
> > > We expect every open file to have a unique inode otherwise the
> > > unmap_mapping_range() will not function properly.
Even with commit b7c5e64fecfa ("vfio: Create vfio_fs_type with inode per
device"), in group path,
vfio_device_open_file() calls anon_inode_getfile(), from which the inode
returned is always anon_inode_inode, which is no unique even across
different devices.

> 
> Can you elaborate the reason of this expectation? If multiple open's
> come from a same process then having them share a single address
> space per device still makes sense. Are you considering a scenario
> where a vfio device is opened by multiple processes? is it allowed?
> 
> btw Yan's patch appears to impose different behaviors between cdev
> and group paths. For cdev all open files share the address space of
> the cdev inode, same effect as sharing that of the vfio-fs inode today.
> But for group open every open file will get a new inode which kind of
> matches your expectation but the patch simply overrides
> vfio_device->inode instead of tracking a list. That sound incomplete.
Yes, looks it's incomplete for group path.
What about still using inode in vfio fs as below in group path?
(I'll post the complete code which have passed my local tests if you think the
direction is right).

BTW, in group path, what's the benefit of allowing multiple open of device?

struct file *vfio_device_get_pseudo_file(struct vfio_device *device)             
{                                                                                
        const struct file_operations *fops = &vfio_device_fops;                  
        struct inode *inode;                                                     
        struct file *filep;                                                      
        int ret;                                                                 
                                                                                 
        if (!fops_get(fops))                                                     
                return ERR_PTR(-ENODEV);                                         
                                                                                 
        mutex_lock(&device->dev_set->lock);                                      
        if (device->open_count == 1) {                                           
                ret = simple_pin_fs(&vfio_fs_type, &vfio.vfs_mount, &vfio.fs_count);
                if (ret)                                                         
                        goto err_pin_fs;                                         
                                                                                 
                inode = alloc_anon_inode(vfio.vfs_mount->mnt_sb);                
                if (IS_ERR(inode)) {                                             
                        ret = PTR_ERR(inode);                                    
                        goto err_inode;                                          
                }                                                                
                device->inode = inode;                                           
        } else {                                                                 
                inode = device->inode;                                           
                ihold(inode);                                                    
        }                                                                        
                                                                                 
        filep = alloc_file_pseudo(inode, vfio.vfs_mount, "[vfio-device]",        
                                  O_RDWR, fops);                                 
                                                                                 
        if (IS_ERR(filep)) {                                                     
                ret = PTR_ERR(filep);                                            
                goto err_file;                                                   
        }                                                                        
        mutex_unlock(&device->dev_set->lock);                                    
        return filep;                                                            
                                                                                 
err_file:                                                                        
        iput(inode);                                                             
err_inode:                                                                       
        if (device->open_count == 1)                                             
                simple_release_fs(&vfio.vfs_mount, &vfio.fs_count);              
err_pin_fs:                                                                      
        mutex_unlock(&device->dev_set->lock);                                    
        fops_put(fops);                                                          
                                                                                 
        return ERR_PTR(ret);                                                     
}              



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ