lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 29 Oct 2021 11:15:31 +1100
From:   David Gibson <david@...son.dropbear.id.au>
To:     Liu Yi L <yi.l.liu@...el.com>
Cc:     alex.williamson@...hat.com, jgg@...dia.com, hch@....de,
        jasowang@...hat.com, joro@...tes.org, jean-philippe@...aro.org,
        kevin.tian@...el.com, parav@...lanox.com, lkml@...ux.net,
        pbonzini@...hat.com, lushenming@...wei.com, eric.auger@...hat.com,
        corbet@....net, ashok.raj@...el.com, yi.l.liu@...ux.intel.com,
        jun.j.tian@...el.com, hao.wu@...el.com, dave.jiang@...el.com,
        jacob.jun.pan@...ux.intel.com, kwankhede@...dia.com,
        robin.murphy@....com, kvm@...r.kernel.org,
        iommu@...ts.linux-foundation.org, dwmw2@...radead.org,
        linux-kernel@...r.kernel.org, baolu.lu@...ux.intel.com,
        nicolinc@...dia.com
Subject: Re: [RFC 20/20] Doc: Add documentation for /dev/iommu

On Sun, Sep 19, 2021 at 02:38:48PM +0800, Liu Yi L wrote:
> Document the /dev/iommu framework for user.
> 
> Open:
> Do we want to document /dev/iommu in Documentation/userspace-api/iommu.rst?
> Existing iommu.rst is for the vSVA interfaces, honestly, may need to rewrite
> this doc entirely.
> 
> Signed-off-by: Kevin Tian <kevin.tian@...el.com>
> Signed-off-by: Liu Yi L <yi.l.liu@...el.com>
> ---
>  Documentation/userspace-api/index.rst   |   1 +
>  Documentation/userspace-api/iommufd.rst | 183 ++++++++++++++++++++++++
>  2 files changed, 184 insertions(+)
>  create mode 100644 Documentation/userspace-api/iommufd.rst
> 
> diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> index 0b5eefed027e..54df5a278023 100644
> --- a/Documentation/userspace-api/index.rst
> +++ b/Documentation/userspace-api/index.rst
> @@ -25,6 +25,7 @@ place where this information is gathered.
>     ebpf/index
>     ioctl/index
>     iommu
> +   iommufd
>     media/index
>     sysfs-platform_profile
>  
> diff --git a/Documentation/userspace-api/iommufd.rst b/Documentation/userspace-api/iommufd.rst
> new file mode 100644
> index 000000000000..abffbb47dc02
> --- /dev/null
> +++ b/Documentation/userspace-api/iommufd.rst
> @@ -0,0 +1,183 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. iommu:
> +
> +===================
> +IOMMU Userspace API
> +===================
> +
> +Direct device access from userspace has been a crtical feature in
> +high performance computing and virtualization usages. Linux now
> +includes multiple device-passthrough frameworks (e.g. VFIO and vDPA)
> +to manage secure device access from the userspace. One critical
> +task of those frameworks is to put the assigned device in a secure,
> +IOMMU-protected context so the device is prevented from doing harm
> +to the rest of the system.
> +
> +Currently those frameworks implement their own logic for managing
> +I/O page tables to isolate user-initiated DMAs. This doesn't scale
> +to support many new IOMMU features, such as PASID-granular DMA
> +remapping, nested translation, I/O page fault, IOMMU dirty bit, etc.
> +
> +The /dev/iommu framework provides an unified interface for managing
> +I/O page tables for passthrough devices. Existing passthrough
> +frameworks are expected to use this interface instead of continuing
> +their ad-hoc implementations.
> +
> +IOMMUFDs, IOASIDs, Devices and Groups
> +-------------------------------------
> +
> +The core concepts in /dev/iommu are IOMMUFDs and IOASIDs. IOMMUFD (by
> +opening /dev/iommu) is the container holding multiple I/O address
> +spaces for a user, while IOASID is the fd-local software handle
> +representing an I/O address space and associated with a single I/O
> +page table. User manages those address spaces through fd operations,
> +e.g. by using vfio type1v2 mapping semantics to manage respective
> +I/O page tables.
> +
> +IOASID is comparable to the conatiner concept in VFIO. The latter
> +is also associated to a single I/O address space. A main difference
> +between them is that multiple IOASIDs in the same IOMMUFD can be
> +nested together (not supported yet) to allow centralized accounting
> +of locked pages, while multiple containers are disconnected thus
> +duplicated accounting is incurred. Typically one IOMMUFD is
> +sufficient for all intended IOMMU usages for a user.
> +
> +An I/O address space takes effect in the IOMMU only after it is
> +attached by a device. One I/O address space can be attached by
> +multiple devices. One device can be only attached to a single I/O
> +address space at this point (on par with current vfio behavior).
> +
> +Device must be bound to an iommufd before the attach operation can
> +be conducted. The binding operation builds the connection between
> +the devicefd (opened via device-passthrough framework) and IOMMUFD.
> +IOMMU-protected security context is esbliashed when the binding
> +operation is completed.

This can't be quite right.  You can't establish a safe security
context until all devices in the groun are bound, but you can only
bind them one at a time.

>  The passthrough framework must block user
> +access to the assigned device until bind() returns success.
> +
> +The entire /dev/iommu framework adopts a device-centric model w/o
> +carrying any container/group legacy as current vfio does. However
> +the group is the minimum granularity that must be used to ensure
> +secure user access (refer to vfio.rst). This framework relies on
> +the IOMMU core layer to map device-centric model into group-granular
> +isolation.
> +
> +Managing I/O Address Spaces
> +---------------------------
> +
> +When creating an I/O address space (by allocating IOASID), the user
> +must specify the type of underlying I/O page table. Currently only
> +one type (kernel-managed) is supported. In the future other types
> +will be introduced, e.g. to support user-managed I/O page table or
> +a shared I/O page table which is managed by another kernel sub-
> +system (mm, ept, etc.). Kernel-managed I/O page table is currently
> +managed via vfio type1v2 equivalent mapping semantics.
> +
> +The user also needs to specify the format of the I/O page table
> +when allocating an IOASID.

This almost seems redundant with the previous paragraph.  I think
maybe it's making a distinction between "type" and "format", but I
don't think it's very clear what the distinction is.

> The format must be compatible to the
> +attached devices (or more specifically to the IOMMU which serves
> +the DMA from the attached devices). User can query the device IOMMU
> +format via IOMMUFD once a device is successfully bound. Attaching a
> +device to an IOASID with incompatible format is simply rejected.
> +
> +Currently no-snoop DMA is not supported yet. This implies that
> +IOASID must be created in an enforce-snoop format and only devices
> +which can be forced to snoop cache by IOMMU are allowed to be
> +attached to IOASID. The user should check uAPI extension and get
> +device info via IOMMUFD to handle such restriction.
> +
> +Usage Example
> +-------------
> +
> +Assume user wants to access PCI device 0000:06:0d.0, which is
> +exposed under the new /dev/vfio/devices directory by VFIO:
> +
> +	/* Open device-centric interface and /dev/iommu interface */
> +	device_fd = open("/dev/vfio/devices/0000:06:0d.0", O_RDWR);
> +	iommu_fd = open("/dev/iommu", O_RDWR);
> +
> +	/* Bind device to IOMMUFD */
> +	bind_data = { .iommu_fd = iommu_fd, .dev_cookie = cookie };
> +	ioctl(device_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind_data);
> +
> +	/* Query per-device IOMMU capability/format */
> +	info = { .dev_cookie = cookie, };
> +	ioctl(iommu_fd, IOMMU_DEVICE_GET_INFO, &info);
> +
> +	if (!(info.flags & IOMMU_DEVICE_INFO_ENFORCE_SNOOP)) {
> +		if (!ioctl(iommu_fd, IOMMU_CHECK_EXTENSION,
> +				EXT_DMA_NO_SNOOP))
> +			/* No support of no-snoop DMA */
> +	}
> +
> +	if (!ioctl(iommu_fd, IOMMU_CHECK_EXTENSION, EXT_MAP_TYPE1V2))
> +		/* No support of vfio type1v2 mapping semantics */
> +
> +	/* Decides IOASID alloc fields based on info */
> +	alloc_data = { .type = IOMMU_IOASID_TYPE_KERNEL,
> +		       .flags = IOMMU_IOASID_ENFORCE_SNOOP,
> +		       .addr_width = info.addr_width, };
> +
> +	/* Allocate IOASID */
> +	gpa_ioasid = ioctl(iommu_fd, IOMMU_IOASID_ALLOC, &alloc_data);
> +
> +	/* Attach device to an IOASID */
> +	at_data = { .iommu_fd = iommu_fd; .ioasid = gpa_ioasid};
> +	ioctl(device_fd, VFIO_DEVICE_ATTACH_IOASID, &at_data);
> +
> +	/* Setup GPA mapping [0 - 1GB] */
> +	dma_map = {
> +		.ioasid	= gpa_ioasid,
> +		.data {
> +			.flags  = R/W		/* permission */
> +			.iova	= 0,		/* GPA */
> +			.vaddr	= 0x40000000,	/* HVA */
> +			.size	= 1GB,
> +		},
> +	};
> +	ioctl(iommu_fd, IOMMU_MAP_DMA, &dma_map);
> +
> +	/* DMA */
> +
> +	/* Unmap GPA mapping [0 - 1GB] */
> +	dma_unmap = {
> +		.ioasid	= gpa_ioasid,
> +		.data {
> +			.iova	= 0,		/* GPA */
> +			.size	= 1GB,
> +		},
> +	};
> +	ioctl(iommu_fd, IOMMU_UNMAP_DMA, &dma_unmap);
> +
> +	/* Detach device from an IOASID */
> +	dt_data = { .iommu_fd = iommu_fd; .ioasid = gpa_ioasid};
> +	ioctl(device_fd, VFIO_DEVICE_DETACH_IOASID, &dt_data);
> +
> +	/* Free IOASID */
> +	ioctl(iommu_fd, IOMMU_IOASID_FREE, gpa_ioasid);
> +
> +	close(device_fd);
> +	close(iommu_fd);
> +
> +API for device-passthrough frameworks
> +-------------------------------------
> +
> +iommufd binding and IOASID attach/detach are initiated via the device-
> +passthrough framework uAPI.
> +
> +When a binding operation is requested by the user, the passthrough
> +framework should call iommufd_bind_device(). When the device fd is
> +closed by the user, iommufd_unbind_device() should be called
> +automatically::
> +
> +	struct iommufd_device *
> +	iommufd_bind_device(int fd, struct device *dev,
> +			   u64 dev_cookie);
> +	void iommufd_unbind_device(struct iommufd_device *idev);
> +
> +IOASID attach/detach operations are per iommufd_device which is
> +returned by iommufd_bind_device():
> +
> +	int iommufd_device_attach_ioasid(struct iommufd_device *idev,
> +					int ioasid);
> +	void iommufd_device_detach_ioasid(struct iommufd_device *idev,
> +					int ioasid);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ