lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20151001104505-mutt-send-email-mst@redhat.com>
Date:	Thu, 1 Oct 2015 11:33:06 +0300
From:	"Michael S. Tsirkin" <mst@...hat.com>
To:	Stephen Hemminger <stephen@...workplumber.org>
Cc:	hjk@...sjkoch.de, gregkh@...ux-foundation.org,
	linux-kernel@...r.kernel.org, dev@...k.org
Subject: Re: [PATCH 2/2] uio: new driver to support PCI MSI-X

On Wed, Sep 30, 2015 at 03:28:58PM -0700, Stephen Hemminger wrote:
> This driver allows using PCI device with Message Signalled Interrupt
> from userspace. The API is similar to the igb_uio driver used by the DPDK.
> Via ioctl it provides a mechanism to map MSI-X interrupts into event
> file descriptors similar to VFIO.
>
> VFIO is a better choice if IOMMU is available, but often userspace drivers
> have to work in environments where IOMMU support (real or emulated) is
> not available.  All UIO drivers that support DMA are not secure against
> rogue userspace applications programming DMA hardware to access
> private memory; this driver is no less secure than existing code.
> 
> Signed-off-by: Stephen Hemminger <stephen@...workplumber.org>

I don't think copying the igb_uio interface is a good idea.
What DPDK is doing with igb_uio (and indeed uio_pci_generic)
is abusing the sysfs BAR access to provide unlimited
access to hardware.

MSI messages are memory writes so any generic device capable
of MSI is capable of corrupting kernel memory.
This means that a bug in userspace will lead to kernel memory corruption
and crashes.  This is something distributions can't support.

uio_pci_generic is already abused like that, mostly
because when I wrote it, I didn't add enough protections
against using it with DMA capable devices,
and we can't go back and break working userspace.
But at least it does not bind to VFs which all of
them are capable of DMA.

The result of merging this driver will be userspace abusing the
sysfs BAR access with VFs as well, and we do not want that.


Just forwarding events is not enough to make a valid driver.
What is missing is a way to access the device in a safe way.

On a more positive note:

What would be a reasonable interface? One that does the following
in kernel:

1. initializes device rings (can be in pinned userspace memory,
   but can not be writeable by userspace), brings up interface link
2. pins userspace memory (unless using e.g. hugetlbfs)
3. gets request, make sure it's valid and belongs to
   the correct task, put it in the ring
4. in the reverse direction, notify userspace when buffers
   are available in the ring
5. notify userspace about MSI (what this driver does)

What userspace can be allowed to do:

	format requests (e.g. transmit, receive) in userspace
	read ring contents

What userspace can't be allowed to do:

	access BAR
	write rings


This means that the driver can not be a generic one,
and there will be a system call overhead when you
write the ring, but that's the price you have to
pay for ability to run on systems without an IOMMU.




> ---
>  drivers/uio/Kconfig          |   9 ++
>  drivers/uio/Makefile         |   1 +
>  drivers/uio/uio_msi.c        | 378 +++++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/Kbuild    |   1 +
>  include/uapi/linux/uio_msi.h |  22 +++
>  5 files changed, 411 insertions(+)
>  create mode 100644 drivers/uio/uio_msi.c
>  create mode 100644 include/uapi/linux/uio_msi.h
> 
> diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
> index 52c98ce..04adfa0 100644
> --- a/drivers/uio/Kconfig
> +++ b/drivers/uio/Kconfig
> @@ -93,6 +93,15 @@ config UIO_PCI_GENERIC
>  	  primarily, for virtualization scenarios.
>  	  If you compile this as a module, it will be called uio_pci_generic.
>  
> +config UIO_PCI_MSI
> +       tristate "Generic driver supporting MSI-x on PCI Express cards"
> +	depends on PCI
> +	help
> +	  Generic driver that provides Message Signalled IRQ events
> +	  similar to VFIO. If IOMMMU is available please use VFIO
> +	  instead since it provides more security.
> +	  If you compile this as a module, it will be called uio_msi.
> +
>  config UIO_NETX
>  	tristate "Hilscher NetX Card driver"
>  	depends on PCI
> diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
> index 8560dad..62fc44b 100644
> --- a/drivers/uio/Makefile
> +++ b/drivers/uio/Makefile
> @@ -9,3 +9,4 @@ obj-$(CONFIG_UIO_NETX)	+= uio_netx.o
>  obj-$(CONFIG_UIO_PRUSS)         += uio_pruss.o
>  obj-$(CONFIG_UIO_MF624)         += uio_mf624.o
>  obj-$(CONFIG_UIO_FSL_ELBC_GPCM)	+= uio_fsl_elbc_gpcm.o
> +obj-$(CONFIG_UIO_PCI_MSI)	+= uio_msi.o
> diff --git a/drivers/uio/uio_msi.c b/drivers/uio/uio_msi.c
> new file mode 100644
> index 0000000..802b5c4
> --- /dev/null
> +++ b/drivers/uio/uio_msi.c
> @@ -0,0 +1,378 @@
> +/*-
> + *
> + * Copyright (c) 2015 by Brocade Communications Systems, Inc.
> + * Author: Stephen Hemminger <stephen@...workplumber.org>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 only.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/device.h>
> +#include <linux/interrupt.h>
> +#include <linux/eventfd.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/uio_driver.h>
> +#include <linux/msi.h>
> +#include <linux/uio_msi.h>
> +
> +#define DRIVER_VERSION		"0.1.1"
> +#define MAX_MSIX_VECTORS	64
> +
> +/* MSI-X vector information */
> +struct uio_msi_pci_dev {
> +	struct uio_info info;		/* UIO driver info */
> +	struct pci_dev *pdev;		/* PCI device */
> +	struct mutex	mutex;		/* open/release/ioctl mutex */
> +	int		ref_cnt;	/* references to device */
> +	unsigned int	max_vectors;	/* MSI-X slots available */
> +	struct msix_entry *msix;	/* MSI-X vector table */
> +	struct uio_msi_irq_ctx {
> +		struct eventfd_ctx *trigger; /* vector to eventfd */
> +		char *name;		/* name in /proc/interrupts */
> +	} *ctx;
> +};
> +
> +static irqreturn_t uio_intx_irqhandler(int irq, void *arg)
> +{
> +	struct uio_msi_pci_dev *udev = arg;
> +
> +	if (pci_check_and_mask_intx(udev->pdev)) {
> +		eventfd_signal(udev->ctx->trigger, 1);
> +		return IRQ_HANDLED;
> +	}
> +
> +	return IRQ_NONE;
> +}
> +
> +static irqreturn_t uio_msi_irqhandler(int irq, void *arg)
> +{
> +	struct eventfd_ctx *trigger = arg;
> +
> +	eventfd_signal(trigger, 1);
> +	return IRQ_HANDLED;
> +}
> +
> +/* set the mapping between vector # and existing eventfd. */
> +static int set_irq_eventfd(struct uio_msi_pci_dev *udev, u32 vec, int fd)
> +{
> +	struct eventfd_ctx *trigger;
> +	int irq, err;
> +
> +	if (vec >= udev->max_vectors) {
> +		dev_notice(&udev->pdev->dev, "vec %u >= num_vec %u\n",
> +			   vec, udev->max_vectors);
> +		return -ERANGE;
> +	}
> +
> +	irq = udev->msix[vec].vector;
> +	trigger = udev->ctx[vec].trigger;
> +	if (trigger) {
> +		/* Clearup existing irq mapping */
> +		free_irq(irq, trigger);
> +		eventfd_ctx_put(trigger);
> +		udev->ctx[vec].trigger = NULL;
> +	}
> +
> +	/* Passing -1 is used to disable interrupt */
> +	if (fd < 0)
> +		return 0;
> +
> +	trigger = eventfd_ctx_fdget(fd);
> +	if (IS_ERR(trigger)) {
> +		err = PTR_ERR(trigger);
> +		dev_notice(&udev->pdev->dev,
> +			   "eventfd ctx get failed: %d\n", err);
> +		return err;
> +	}
> +
> +	if (udev->msix)
> +		err = request_irq(irq, uio_msi_irqhandler, 0,
> +				  udev->ctx[vec].name, trigger);
> +	else
> +		err = request_irq(irq, uio_intx_irqhandler, IRQF_SHARED,
> +				  udev->ctx[vec].name, udev);
> +
> +	if (err) {
> +		dev_notice(&udev->pdev->dev,
> +			   "request irq failed: %d\n", err);
> +		eventfd_ctx_put(trigger);
> +		return err;
> +	}
> +
> +	udev->ctx[vec].trigger = trigger;
> +	return 0;
> +}
> +
> +static int
> +uio_msi_ioctl(struct uio_info *info, unsigned int cmd, unsigned long arg)
> +{
> +	struct uio_msi_pci_dev *udev
> +		= container_of(info, struct uio_msi_pci_dev, info);
> +	struct uio_msi_irq_set hdr;
> +	int err;
> +
> +	switch (cmd) {
> +	case UIO_MSI_IRQ_SET:
> +		if (copy_from_user(&hdr, (void __user *)arg, sizeof(hdr)))
> +			return -EFAULT;
> +
> +		mutex_lock(&udev->mutex);
> +		err = set_irq_eventfd(udev, hdr.vec, hdr.fd);
> +		mutex_unlock(&udev->mutex);
> +		break;
> +	default:
> +		err = -EOPNOTSUPP;
> +	}
> +	return err;
> +}
> +
> +/* Opening the UIO device for first time enables MSI-X */
> +static int
> +uio_msi_open(struct uio_info *info, struct inode *inode)
> +{
> +	struct uio_msi_pci_dev *udev
> +		= container_of(info, struct uio_msi_pci_dev, info);
> +	int err = 0;
> +
> +	mutex_lock(&udev->mutex);
> +	if (udev->ref_cnt++ == 0) {
> +		if (udev->msix)
> +			err = pci_enable_msix(udev->pdev, udev->msix,
> +					      udev->max_vectors);
> +	}
> +	mutex_unlock(&udev->mutex);
> +
> +	return err;
> +}
> +
> +/* Last close of the UIO device releases/disables all IRQ's */
> +static int
> +uio_msi_release(struct uio_info *info, struct inode *inode)
> +{
> +	struct uio_msi_pci_dev *udev
> +		= container_of(info, struct uio_msi_pci_dev, info);
> +	int i;
> +
> +	mutex_lock(&udev->mutex);
> +	if (--udev->ref_cnt == 0) {
> +		for (i = 0; i < udev->max_vectors; i++) {
> +			int irq = udev->msix[i].vector;
> +			struct eventfd_ctx *trigger = udev->ctx[i].trigger;
> +
> +			if (!trigger)
> +				continue;
> +
> +			free_irq(irq, trigger);
> +			eventfd_ctx_put(trigger);
> +			udev->ctx[i].trigger = NULL;
> +		}
> +
> +		if (udev->msix)
> +			pci_disable_msix(udev->pdev);
> +	}
> +	mutex_unlock(&udev->mutex);
> +
> +	return 0;
> +}
> +
> +/* Unmap previously ioremap'd resources */
> +static void
> +release_iomaps(struct uio_mem *mem)
> +{
> +	int i;
> +
> +	for (i = 0; i < MAX_UIO_MAPS; i++, mem++) {
> +		if (mem->internal_addr)
> +			iounmap(mem->internal_addr);
> +	}
> +}
> +
> +static int
> +setup_maps(struct pci_dev *pdev, struct uio_info *info)
> +{
> +	int i, m = 0, p = 0, err;
> +	static const char * const bar_names[] = {
> +		"BAR0",	"BAR1",	"BAR2",	"BAR3",	"BAR4",	"BAR5",
> +	};
> +
> +	for (i = 0; i < ARRAY_SIZE(bar_names); i++) {
> +		unsigned long start = pci_resource_start(pdev, i);
> +		unsigned long flags = pci_resource_flags(pdev, i);
> +		unsigned long len = pci_resource_len(pdev, i);
> +
> +		if (start == 0 || len == 0)
> +			continue;
> +
> +		if (flags & IORESOURCE_MEM) {
> +			void __iomem *addr;
> +
> +			if (m >= MAX_UIO_MAPS)
> +				continue;
> +
> +			addr = ioremap(start, len);
> +			if (addr == NULL) {
> +				err = -EINVAL;
> +				goto fail;
> +			}
> +
> +			info->mem[m].name = bar_names[i];
> +			info->mem[m].addr = start;
> +			info->mem[m].internal_addr = addr;
> +			info->mem[m].size = len;
> +			info->mem[m].memtype = UIO_MEM_PHYS;
> +			++m;
> +		} else if (flags & IORESOURCE_IO) {
> +			if (p >= MAX_UIO_PORT_REGIONS)
> +				continue;
> +
> +			info->port[p].name = bar_names[i];
> +			info->port[p].start = start;
> +			info->port[p].size = len;
> +			info->port[p].porttype = UIO_PORT_X86;
> +			++p;
> +		}
> +	}
> +
> +	return 0;
> + fail:
> +	for (i = 0; i < m; i++)
> +		iounmap(info->mem[i].internal_addr);
> +	return err;
> +}
> +
> +static int uio_msi_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> +{
> +	struct uio_msi_pci_dev *udev;
> +	int i, err, vectors;
> +
> +	udev = kzalloc(sizeof(struct uio_msi_pci_dev), GFP_KERNEL);
> +	if (!udev)
> +		return -ENOMEM;
> +
> +	err = pci_enable_device(pdev);
> +	if (err != 0) {
> +		dev_err(&pdev->dev, "cannot enable PCI device\n");
> +		goto fail_free;
> +	}
> +
> +	err = pci_request_regions(pdev, "uio_msi");
> +	if (err != 0) {
> +		dev_err(&pdev->dev, "Cannot request regions\n");
> +		goto fail_disable;
> +	}
> +
> +	pci_set_master(pdev);
> +
> +	/* remap resources */
> +	err = setup_maps(pdev, &udev->info);
> +	if (err)
> +		goto fail_release_iomem;
> +
> +	/* fill uio infos */
> +	udev->info.name = "uio_msi";
> +	udev->info.version = DRIVER_VERSION;
> +	udev->info.priv = udev;
> +	udev->pdev = pdev;
> +	udev->info.ioctl = uio_msi_ioctl;
> +	udev->info.open = uio_msi_open;
> +	udev->info.release = uio_msi_release;
> +	udev->info.irq = UIO_IRQ_CUSTOM;
> +	mutex_init(&udev->mutex);
> +
> +	vectors = pci_msix_vec_count(pdev);
> +	if (vectors > 0) {
> +		udev->max_vectors = min_t(u16, vectors, MAX_MSIX_VECTORS);
> +		dev_info(&pdev->dev, "using up to %u MSI-X vectors\n",
> +			udev->max_vectors);
> +
> +		err = -ENOMEM;
> +		udev->msix = kcalloc(udev->max_vectors,
> +				     sizeof(struct msix_entry), GFP_KERNEL);
> +		if (!udev->msix)
> +			goto fail_release_iomem;
> +	} else if (!pci_intx_mask_supported(pdev)) {
> +		dev_err(&pdev->dev,
> +			"device does not support MSI-X or INTX\n");
> +		err = -EINVAL;
> +		goto fail_release_iomem;
> +	} else {
> +		dev_notice(&pdev->dev, "using INTX\n");
> +		udev->info.irq_flags = IRQF_SHARED;
> +		udev->max_vectors = 1;
> +	}
> +
> +	udev->ctx = kcalloc(udev->max_vectors,
> +			    sizeof(struct uio_msi_irq_ctx), GFP_KERNEL);
> +	if (!udev->ctx)
> +		goto fail_free_msix;
> +
> +	for (i = 0; i < udev->max_vectors; i++) {
> +		udev->msix[i].entry = i;
> +
> +		udev->ctx[i].name = kasprintf(GFP_KERNEL,
> +					      KBUILD_MODNAME "[%d](%s)",
> +					      i, pci_name(pdev));
> +		if (!udev->ctx[i].name)
> +			goto fail_free_ctx;
> +	}
> +
> +	/* register uio driver */
> +	err = uio_register_device(&pdev->dev, &udev->info);
> +	if (err != 0)
> +		goto fail_free_ctx;
> +
> +	pci_set_drvdata(pdev, udev);
> +	return 0;
> +
> +fail_free_ctx:
> +	for (i = 0; i < udev->max_vectors; i++)
> +		kfree(udev->ctx[i].name);
> +	kfree(udev->ctx);
> +fail_free_msix:
> +	kfree(udev->msix);
> +fail_release_iomem:
> +	release_iomaps(udev->info.mem);
> +	pci_release_regions(pdev);
> +fail_disable:
> +	pci_disable_device(pdev);
> +fail_free:
> +	kfree(udev);
> +
> +	pr_notice("%s ret %d\n", __func__, err);
> +	return err;
> +}
> +
> +static void uio_msi_remove(struct pci_dev *pdev)
> +{
> +	struct uio_info *info = pci_get_drvdata(pdev);
> +	struct uio_msi_pci_dev *udev
> +		= container_of(info, struct uio_msi_pci_dev, info);
> +	int i;
> +
> +	uio_unregister_device(info);
> +	release_iomaps(info->mem);
> +
> +	pci_release_regions(pdev);
> +	for (i = 0; i < udev->max_vectors; i++)
> +		kfree(udev->ctx[i].name);
> +	kfree(udev->ctx);
> +	kfree(udev->msix);
> +	pci_disable_device(pdev);
> +
> +	pci_set_drvdata(pdev, NULL);
> +	kfree(udev);
> +}
> +
> +static struct pci_driver uio_msi_pci_driver = {
> +	.name = "uio_msi",
> +	.probe = uio_msi_probe,
> +	.remove = uio_msi_remove,
> +};
> +
> +module_pci_driver(uio_msi_pci_driver);
> +MODULE_VERSION(DRIVER_VERSION);
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR("Stephen Hemminger <stephen@...workplumber.org>");
> +MODULE_DESCRIPTION("UIO driver for MSI PCI devices");
> diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
> index f7b2db4..d9497691 100644
> --- a/include/uapi/linux/Kbuild
> +++ b/include/uapi/linux/Kbuild
> @@ -411,6 +411,7 @@ header-y += udp.h
>  header-y += uhid.h
>  header-y += uinput.h
>  header-y += uio.h
> +header-y += uio_msi.h
>  header-y += ultrasound.h
>  header-y += un.h
>  header-y += unistd.h
> diff --git a/include/uapi/linux/uio_msi.h b/include/uapi/linux/uio_msi.h
> new file mode 100644
> index 0000000..297de00
> --- /dev/null
> +++ b/include/uapi/linux/uio_msi.h
> @@ -0,0 +1,22 @@
> +/*
> + * UIO_MSI API definition
> + *
> + * Copyright (c) 2015 by Brocade Communications Systems, Inc.
> + * All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#ifndef _UIO_PCI_MSI_H
> +#define _UIO_PCI_MSI_H
> +
> +struct uio_msi_irq_set {
> +	u32 vec;
> +	int fd;
> +};
> +
> +#define UIO_MSI_BASE	0x86
> +#define UIO_MSI_IRQ_SET	_IOW('I', UIO_MSI_BASE+1, struct uio_msi_irq_set)
> +
> +#endif
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ