lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 2 Apr 2021 12:46:55 +0000
From:   "Liu, Yi L" <yi.l.liu@...el.com>
To:     Jason Gunthorpe <jgg@...dia.com>
CC:     "Tian, Kevin" <kevin.tian@...el.com>,
        Jacob Pan <jacob.jun.pan@...ux.intel.com>,
        Jean-Philippe Brucker <jean-philippe@...aro.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Joerg Roedel <joro@...tes.org>,
        Lu Baolu <baolu.lu@...ux.intel.com>,
        David Woodhouse <dwmw2@...radead.org>,
        "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
        "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
        Tejun Heo <tj@...nel.org>, Li Zefan <lizefan@...wei.com>,
        Johannes Weiner <hannes@...xchg.org>,
        "Jean-Philippe Brucker" <jean-philippe@...aro.com>,
        Alex Williamson <alex.williamson@...hat.com>,
        Eric Auger <eric.auger@...hat.com>,
        "Jonathan Corbet" <corbet@....net>,
        "Raj, Ashok" <ashok.raj@...el.com>, "Wu, Hao" <hao.wu@...el.com>,
        "Jiang, Dave" <dave.jiang@...el.com>
Subject: RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation
 APIs

Hi Jason,

> From: Jason Gunthorpe <jgg@...dia.com>
> Sent: Thursday, April 1, 2021 7:54 PM
> 
> On Thu, Apr 01, 2021 at 07:04:01AM +0000, Liu, Yi L wrote:
> 
> > After reading your reply in https://lore.kernel.org/linux-
> iommu/20210331123801.GD1463678@...dia.com/#t
> > So you mean /dev/ioasid FD is per-VM instead of per-ioasid, so above
> skeleton
> > doesn't suit your idea.
> 
> You can do it one PASID per FD or multiple PASID's per FD. Most likely
> we will have high numbers of PASID's in a qemu process so I assume
> that number of FDs will start to be a contraining factor, thus
> multiplexing is reasonable.
> 
> It doesn't really change anything about the basic flow.
> 
> digging deeply into it either seems like a reasonable choice.
> 
> > +-----------------------------+-----------------------------------------------+
> > |      userspace              |               kernel space                    |
> > +-----------------------------+-----------------------------------------------+
> > | ioasid_fd =                 | /dev/ioasid does below:                       |
> > | open("/dev/ioasid", O_RDWR);|   struct ioasid_fd_ctx {                      |
> > |                             |       struct list_head ioasid_list;           |
> > |                             |       ...                                     |
> > |                             |   } ifd_ctx; // ifd_ctx is per ioasid_fd      |
> 
> Sure, possibly an xarray not a list
> 
> > +-----------------------------+-----------------------------------------------+
> > | ioctl(ioasid_fd,            | /dev/ioasid does below:                       |
> > |       ALLOC, &ioasid);      |   struct ioasid_data {                        |
> > |                             |       ioasid_t ioasid;                        |
> > |                             |       struct list_head device_list;           |
> > |                             |       struct list_head next;                  |
> > |                             |       ...                                     |
> > |                             |   } id_data; // id_data is per ioasid         |
> > |                             |                                               |
> > |                             |   list_add(&id_data.next,                     |
> > |                             |            &ifd_ctx.ioasid_list);
> > |
> 
> Yes, this should have a kref in it too
> 
> > +-----------------------------+-----------------------------------------------+
> > | ioctl(device_fd,            | VFIO does below:                              |
> > |       DEVICE_ALLOW_IOASID,  | 1) get ioasid_fd, check if ioasid_fd is valid |
> > |       ioasid_fd,            | 2) check if ioasid is allocated from ioasid_fd|
> > |       ioasid);              | 3) register device/domain info to /dev/ioasid |
> > |                             |    tracked in id_data.device_list             |
> > |                             | 4) record the ioasid in VFIO's per-device     |
> > |                             |    ioasid list for future security check      |
> 
> You would provide a function that does steps 1&2 look at eventfd for
> instance.
> 
> I'm not sure we need to register the device with the ioasid. device
> should incr the kref on the ioasid_data at this point.
> 
> > +-----------------------------+-----------------------------------------------+
> > | ioctl(ioasid_fd,            | /dev/ioasid does below:                       |
> > |       BIND_PGTBL,           | 1) find ioasid's id_data                      |
> > |       pgtbl_data,           | 2) loop the id_data.device_list and tell iommu|
> > |       ioasid);              |    give ioasid access to the devices
> > |
> 
> This seems backwards, DEVICE_ALLOW_IOASID should tell the iommu to
> give the ioasid to the device.
> 
> Here the ioctl should be about assigning a memory map from the the
> current
> mm_struct to the pasid
> 
> > +-----------------------------+-----------------------------------------------+
> > | ioctl(ioasid_fd,            | /dev/ioasid does below:                       |
> > |       UNBIND_PGTBL,         | 1) find ioasid's id_data                      |
> > |       ioasid);              | 2) loop the id_data.device_list and tell iommu|
> > |                             |    clear ioasid access to the devices         |
> 
> Also seems backwards. The ioctl here should be 'destroy ioasid' which
> wipes out the page table, halts DMA access and parks the PASID until
> all users are done.
> 
> > +-----------------------------+-----------------------------------------------+
> > | ioctl(device_fd,            | VFIO does below:                              |
> > |      DEVICE_DISALLOW_IOASID,| 1) check if ioasid is associated in VFIO's    |
> > |       ioasid_fd,            |    device ioasid list.                        |
> > |       ioasid);              | 2) unregister device/domain info from         |
> > |                             |    /dev/ioasid, clear in id_data.device_list  |
> 
> This should disconnect the iommu and kref_put the ioasid_data

thanks for the comments, updated the skeleton a little bit, accepted your Xarray
and kref suggestion.

+-----------------------------+------------------------------------------------+
|      userspace              |               kernel space                     |
+-----------------------------+------------------------------------------------+
| ioasid_fd =                 | /dev/ioasid does below:                        |
| open("/dev/ioasid", O_RDWR);|   struct ioasid_fd_ctx {                       |
|                             |        struct xarray xa;                       |
|                             |       ...                                      |
|                             |   } ifd_ctx; // ifd_ctx is per ioasid_fd       |
+-----------------------------+------------------------------------------------+
| ioctl(ioasid_fd,            | /dev/ioasid does below:                        |
|       ALLOC, &ioasid);      |   struct ioasid_data {                         |
|                             |       ioasid_t ioasid;                         |
|                             |       refcount_t refs;                         |
|                             |       ...                                      |
|                             |   } id_data; // id_data is per ioasid          |
|                             |                                                |
|                             |   refcount_set(&id_data->refs, 1);             |
+-----------------------------+------------------------------------------------+
| ioctl(device_fd,            | VFIO does below:                               |
|       DEVICE_ALLOW_IOASID,  | 1) get ioasid_fd, check if ioasid_fd is valid  |
|       ioasid_fd,            | 2) check if ioasid is allocated from ioasid_fd |
|       ioasid);              | 3) inr refcount on the ioasid                  |
|                             | 4) tell iommu to give the ioasid to the device |
|                             |    by an iommu API. iommu driver needs to      |
|                             |    store the ioasid/device info in a per       |
|                             |    ioasid allow device list                    |
|                             | 5) record the ioasid in VFIO's per-device      |
|                             |    ioasid list for future security check       |
+-----------------------------+------------------------------------------------+
| ioctl(ioasid_fd,            | /dev/ioasid does below:                        |
|       BIND_PGTBL,           | 1) find ioasid's id_data                       |
|       pgtbl_data,           | 2) call into iommu driver with ioasid, pgtbl   |
|       ioasid);              |    data, iommu driver setup the PASID entry[1] |
|                             |    with the ioasid and the pgtbl_data          |
+-----------------------------+------------------------------------------------+
| ioctl(ioasid_fd,            | /dev/ioasid does below:                        |
|       CAHCE_INVLD,          | 1) find ioasid's id_data                       |
|       inv_data,             | 2) call into iommu driver with ioasid, inv     |
|       ioasid);              |    data, iommu driver invalidates cache        |
+-----------------------------+------------------------------------------------+
| ioctl(ioasid_fd,            | /dev/ioasid does below:                        |
|       UNBIND_PGTBL,         | 1) find ioasid's id_data                       |
|       ioasid);              | 2) call into iommu driver with ioasid, iommu   |
|                             |    driver destroy the PASID entry to block DMA |
|                             |    with this ioasid from device                |
+-----------------------------+------------------------------------------------+
| ioctl(device_fd,            | VFIO does below:                               |
|      DEVICE_DISALLOW_IOASID,| 1) check if ioasid is associated in VFIO's     |
|       ioasid_fd,            |    device ioasid list                          |
|       ioasid);              | 2) tell iommu driver to clear the device from  |
|                             |    its per-ioasid device allow list            |
|                             | 3) put refcount on the ioasid                  |
+-----------------------------+------------------------------------------------+
| ioctl(ioasid_fd,            | /dev/ioasid does below:                        |
|       FREE, ioasid);        |  list_del(&id_data.next);                      |
+-----------------------------+------------------------------------------------+

[1] PASID entry is an entry in a per-device PASID table, this is where the
    page table pointer is stored. e.g. guest cr3 page table pointer. Setup
    PASID entry in a device's PASID table means the access is finally grant
    in IOMMU side.

I kept FREE as it seems to be more symmetric since there is an ALLOC
exposed to userspace. But yeah, I'm open with removing it all the same
if it's really unnecessary per your opinion.

Need your help again on an open.
The major purpose of this series is to support vSVA for guest based on
nested translation. And there is another usage case which is also based
on nested translation but don't have an ioasid. And still, it needs the
bind/unbind_pgtbl, cache_invalidation uAPI. It is gIOVA support. In this
usage, page table is a guest IOVA page table, VMM needs to bind this page
table to host and enabled nested translation, also needs to do cache
invalidation when guest IOVA page table has changes. It's very similar
with the page table bind of vSVA. Only difference is there is no ioasid
in the gIOVA case. Instead, gIOVA case requires device information. But
with regards to the uAPI reusing, need to fit gIOVA to /dev/ioasid model.
As of now, I think it may require user space passes a device FD to the
BIND/UNBIND_PGTBL and CAHCE_INVLD ioctl, then iommu driver can bind the
gIOVA page table to a correct device. Not sure if it looks good. Do you
have any suggestion on it?

[...]
> Include a sequence showing how the kvm FD is used to program the
> vPASID to pPASID table that ENQCMD uses.
>
> Show how dynamic authorization works based on requests from the
> guest's vIOMMU

I'd like to see if the updated skeleton suits your idea first, then
draw a more complete flow to show this.

Regards,
Yi Liu

> Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ