lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:	Fri, 1 Feb 2013 10:43:58 +0000
From:	"Pandarathil, Vijaymohan R" <vijaymohan.pandarathil@...com>
To:	Alex Williamson <alex.williamson@...hat.com>
CC:	Gleb Natapov <gleb@...hat.com>,
	Bjorn Helgaas <bhelgaas@...gle.com>,
	Blue Swirl <blauwirbel@...il.com>,
	"Ortiz, Lance E" <lance.oritz@...com>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"qemu-devel@...gnu.org" <qemu-devel@...gnu.org>,
	"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v2 2/3] VFIO-AER: Vfio-pci driver changes for supporting
 AER



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@...hat.com]
> Sent: Tuesday, January 29, 2013 5:25 AM
> To: Pandarathil, Vijaymohan R
> Cc: Gleb Natapov; Bjorn Helgaas; Blue Swirl; Ortiz, Lance E;
> kvm@...r.kernel.org; qemu-devel@...gnu.org; linux-pci@...r.kernel.org;
> linux-kernel@...r.kernel.org
> Subject: Re: [PATCH v2 2/3] VFIO-AER: Vfio-pci driver changes for
> supporting AER
> 
> On Mon, 2013-01-28 at 12:31 -0700, Alex Williamson wrote:
> > On Mon, 2013-01-28 at 09:54 +0000, Pandarathil, Vijaymohan R wrote:
> > >  	- New VFIO_SET_IRQ ioctl option to pass the eventfd that is signalled
> when
> > >           an error occurs in the vfio_pci_device
> > >
> > > 	- Register pci_error_handler for the vfio_pci driver
> > >
> > > 	- When the device encounters an error, the error handler registered
> by
> > >           the vfio_pci driver gets invoked by the AER infrastructure
> > >
> > > 	- In the error handler, signal the eventfd registered for the device.
> > >
> > > 	- This results in the qemu eventfd handler getting invoked and
> > >           appropriate action taken for the guest.
> > >
> > > Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@...com>
> > > ---
> > >  drivers/vfio/pci/vfio_pci.c         | 44
> ++++++++++++++++++++++++++++++++++++-
> > >  drivers/vfio/pci/vfio_pci_intrs.c   | 32 +++++++++++++++++++++++++++
> > >  drivers/vfio/pci/vfio_pci_private.h |  1 +
> > >  include/uapi/linux/vfio.h           |  3 +++
> > >  4 files changed, 79 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > index b28e66c..ff2a078 100644
> > > --- a/drivers/vfio/pci/vfio_pci.c
> > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > @@ -196,7 +196,9 @@ static int vfio_pci_get_irq_count(struct
> vfio_pci_device *vdev, int irq_type)
> > >
> > >  			return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
> > >  		}
> > > -	}
> > > +	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX)
> > > +		if (pci_is_pcie(vdev->pdev))
> > > +			return 1;
> > >
> > >  	return 0;
> > >  }
> > > @@ -223,9 +225,18 @@ static long vfio_pci_ioctl(void *device_data,
> > >  		if (vdev->reset_works)
> > >  			info.flags |= VFIO_DEVICE_FLAGS_RESET;
> > >
> > > +		if (pci_is_pcie(vdev->pdev)) {
> > > +			info.flags |= VFIO_DEVICE_FLAGS_PCI_AER;
> > > +			info.flags |= VFIO_DEVICE_FLAGS_PCI_AER_NOTIFY;
> >
> > Not sure this second flag should be AER specific or if it's even needed,
> > see below for more comments on this.
> >
> > > +		}
> > > +
> > >  		info.num_regions = VFIO_PCI_NUM_REGIONS;
> > >  		info.num_irqs = VFIO_PCI_NUM_IRQS;
> > >
> > > +		/* Expose only implemented IRQs */
> > > +		if (!(info.flags & VFIO_DEVICE_FLAGS_PCI_AER_NOTIFY))
> > > +			info.num_irqs--;
> >
> > I'm having second thoughts on this, see further below.
> >
> > > +
> > >  		return copy_to_user((void __user *)arg, &info, minsz);
> > >
> > >  	} else if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
> > > @@ -302,6 +313,10 @@ static long vfio_pci_ioctl(void *device_data,
> > >  		if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS)
> > >  			return -EINVAL;
> > >
> > > +		if ((info.index == VFIO_PCI_ERR_IRQ_INDEX) &&
> > > +		     !pci_is_pcie(vdev->pdev))
> > > +			return -EINVAL;
> > > +
> >
> > Perhaps we could incorporate the index test above this too?
> >
> > switch (info.index) {
> > case VFIO_PCI_INTX_IRQ_INDEX: ... VFIO_PCI_MSIX_IRQ_INDEX:
> > 	break;
> > case VFIO_PCI_ERR_IRQ_INDEX:
> > 	if (pci_is_pcie(vdev->pdev))
> > 		break;
> > default:
> > 	return -EINVAL;
> > }
> >
> > This is more similar to how I've re-written the same for the proposed
> > VGA/legacy I/O support.
> >
> > >  		info.flags = VFIO_IRQ_INFO_EVENTFD;
> > >
> > >  		info.count = vfio_pci_get_irq_count(vdev, info.index);
> > > @@ -538,11 +553,38 @@ static void vfio_pci_remove(struct pci_dev *pdev)
> > >  	kfree(vdev);
> > >  }
> > >
> > > +static pci_ers_result_t vfio_err_detected(struct pci_dev *pdev,
> > > +				pci_channel_state_t state)
> >
> > This is actually AER specific, right?  So perhaps it should be
> > vfio_pci_aer_err_detected?
> >
> > Also, please follow existing whitespace usage throughout, tabs followed
> > by spaces to align function parameter wrap.
> >
> > > +{
> > > +	struct vfio_pci_device *vpdev;
> > > +	void *vdev;
> >
> > struct vfio_device *vdev;
> >
> > > +
> > > +	vdev = vfio_device_get_from_dev(&pdev->dev);
> > > +	if (vdev == NULL)
> > > +		return PCI_ERS_RESULT_DISCONNECT;
> > > +
> > > +	vpdev = vfio_device_data(vdev);
> > > +	if (vpdev == NULL)
> > > +		return PCI_ERS_RESULT_DISCONNECT;
> > > +
> > > +	if (vpdev->err_trigger)
> > > +		eventfd_signal(vpdev->err_trigger, 1);
> > > +
> > > +	vfio_device_put_vdev(vdev);
> > > +
> > > +	return PCI_ERS_RESULT_CAN_RECOVER;
> > > +}
> > > +
> > > +static const struct pci_error_handlers vfio_err_handlers = {
> > > +	.error_detected = vfio_err_detected,
> > > +};
> > > +
> > >  static struct pci_driver vfio_pci_driver = {
> > >  	.name		= "vfio-pci",
> > >  	.id_table	= NULL, /* only dynamic ids */
> > >  	.probe		= vfio_pci_probe,
> > >  	.remove		= vfio_pci_remove,
> > > +	.err_handler	= &vfio_err_handlers,
> > >  };
> > >
> > >  static void __exit vfio_pci_cleanup(void)
> > > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c
> b/drivers/vfio/pci/vfio_pci_intrs.c
> > > index 3639371..f003e08 100644
> > > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > > @@ -745,6 +745,31 @@ static int vfio_pci_set_msi_trigger(struct
> vfio_pci_device *vdev,
> > >  	return 0;
> > >  }
> > >
> > > +static int vfio_pci_set_err_eventfd(struct vfio_pci_device *vdev,
> > > +				    unsigned index, unsigned start,
> > > +				    unsigned count, uint32_t flags, void *data)
> >
> >
> > Rename to vfio_pci_set_err_trigger?  The other functions mostly only
> > support eventfd too.
> >
> > > +{
> > > +	if ((index == VFIO_PCI_ERR_IRQ_INDEX) &&
> > > +	    (flags & VFIO_IRQ_SET_DATA_EVENTFD)   &&
> > > +	    pci_is_pcie(vdev->pdev)) {
> >
> > It would clean up the indentation to have this be:
> >
> >         if (!supported stuff)
> >         	return -EINVAL;
> >
> >         do stuff
> >
> > Testing the index seems overly paranoid here given the caller.  The
> > caller is also already testing pci_is_pcie.
> >
> > Why not support DATA_NONE and DATA_BOOL?  It would be useful loopback
> > testing for userspace to be able to trigger an AER notification.
> >
> > > +
> > > +		int32_t fd = *(int32_t *)data;
> > > +
> > > +		if (fd == -1) {
> > > +			if (vdev->err_trigger)
> > > +				eventfd_ctx_put(vdev->err_trigger);
> > > +			vdev->err_trigger = NULL;
> > > +			return 0;
> > > +		} else if (fd >= 0) {
> > > +			vdev->err_trigger = eventfd_ctx_fdget(fd);
> > > +			if (IS_ERR(vdev->err_trigger))
> > > +				return PTR_ERR(vdev->err_trigger);
> > > +			return 0;
> > > +		} else
> > > +			return -EINVAL;
> > > +	}
> > > +	return -EINVAL;
> > > +}
> > >  int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t
> flags,
> > >  			    unsigned index, unsigned start, unsigned count,
> > >  			    void *data)
> > > @@ -779,6 +804,13 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device
> *vdev, uint32_t flags,
> > >  			break;
> > >  		}
> > >  		break;
> > > +	case VFIO_PCI_ERR_IRQ_INDEX:
> > > +		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> > > +		case VFIO_IRQ_SET_ACTION_TRIGGER:
> > > +			if (pci_is_pcie(vdev->pdev))
> > > +				func = vfio_pci_set_err_eventfd;
> > > +			break;
> > > +		}
> > >  	}
> > >
> > >  	if (!func)
> > > diff --git a/drivers/vfio/pci/vfio_pci_private.h
> b/drivers/vfio/pci/vfio_pci_private.h
> > > index 611827c..daee62f 100644
> > > --- a/drivers/vfio/pci/vfio_pci_private.h
> > > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > > @@ -55,6 +55,7 @@ struct vfio_pci_device {
> > >  	bool			bardirty;
> > >  	struct pci_saved_state	*pci_saved_state;
> > >  	atomic_t		refcnt;
> > > +	struct eventfd_ctx	*err_trigger;
> > >  };
> > >
> > >  #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
> > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > index 4758d1b..e81eb4d 100644
> > > --- a/include/uapi/linux/vfio.h
> > > +++ b/include/uapi/linux/vfio.h
> > > @@ -147,6 +147,8 @@ struct vfio_device_info {
> > >  	__u32	flags;
> > >  #define VFIO_DEVICE_FLAGS_RESET	(1 << 0)	/* Device supports
> reset */
> > >  #define VFIO_DEVICE_FLAGS_PCI	(1 << 1)	/* vfio-pci device */
> > > +#define VFIO_DEVICE_FLAGS_PCI_AER	(1 << 2) /* AER capable */
> > > +#define VFIO_DEVICE_FLAGS_PCI_AER_NOTIFY (1 << 3) /* Supports aer
> notify */
> > >
> > >  	__u32	num_regions;	/* Max region index + 1 */
> > >  	__u32	num_irqs;	/* Max IRQ index + 1 */
> >                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > This part of the vfio spec has been causing me trouble.  We discussed it
> > a bit offline, but I'd appreciate any feedback from you or the broader
> > community.  num_foo indicates a count, however the comment clearly
> > states this is a max index.  With the VGA patches I've been playing with
> > using them as a count where userspace would probe indexes until it finds
> > num_foo implemented (ie. INFO_IRQ/REGION returns >=0).  I guess the
> > original intention was was to probe up to num_foo-1 and unavailable
> > indexes return <0.  Looking at it again, this seems like the more
> > deterministic approach.  For instance, if we add LEGACY_IOPORT and
> > LEGACY_MMIO regions at index 8 & 9, then ERR region at index 10, it's
> > easier for userspace to know to stop searching at index 10 instead of
> > probing indexes it may not understand trying to find the full count.
> > Does that sound right?
> >
> > So Vijay, please don't shot me for changing my mind, but I'm inclined to
> > think you were probably right that DEVICE_INFO should just return
> > num_irqs = VFIO_PCI_NUM_IRQS regardless of whether ERR_IRQ_INDEX is
> > supported.  However IRQ_INFO should still return <0 if the device is not
> > pcie.  It seems like this also means that we don't need flags indicating
> > which indexes are present as that duplicates simply looking for them.
> >
> > Which flags do we actually need then?  Should the AER flag be a device
> > flag or is supporting AER error reporting a feature of
> > VFIO_PCI_ERR_IRQ_INDEX and should therefore be reported as a flag there?
> > So far REGION_INFO and IRQ_INFO flags only report capabilities of the
> > item and not it's purpose, but so far all the regions and irqs have very
> > fixed purposes.
> >
> > I'm inclined to think that a LEGACY_IOPORT region reporting that it
> > supports VGA IOPORT space 3b0 and 3c0 makes more sense than a general
> > device VGA flag and inferring 3b0 & 3c0 based on the existence of a
> > LEGACY_IOPORT region.
> >
> > If we put flags on INFO_REGION and INFO_IRQ, where do they go?  We've
> > got a u32 on each for flags.  We could split that and define bits >=16
> > are for type specific flags and <16 are generic.  We could also define a
> > generic flag indicating a type specific flag field is present.
> > region_info already has a reserved u32 for alignment that could fill
> > this roll, irq_info would need to add a field.  Perhaps something like:
> >
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -176,8 +176,9 @@ struct vfio_region_info {
> >  #define VFIO_REGION_INFO_FLAG_READ     (1 << 0) /* Region supports read
> */
> >  #define VFIO_REGION_INFO_FLAG_WRITE    (1 << 1) /* Region supports write
> */
> >  #define VFIO_REGION_INFO_FLAG_MMAP     (1 << 2) /* Region supports mmap
> */
> > +#define VFIO_REGION_INFO_TYPE_FLAGS    (1 << 3)
> >         __u32   index;          /* Region index */
> > -       __u32   resv;           /* Reserved for alignment */
> > +       __u32   type_flags;     /* Type specific feature flags */
> >         __u64   size;           /* Region size (bytes) */
> >         __u64   offset;         /* Region offset from start of device fd
> */
> >  };
> > @@ -222,8 +223,10 @@ struct vfio_irq_info {
> >  #define VFIO_IRQ_INFO_MASKABLE         (1 << 1)
> >  #define VFIO_IRQ_INFO_AUTOMASKED       (1 << 2)
> >  #define VFIO_IRQ_INFO_NORESIZE         (1 << 3)
> > +#define VFIO_IRQ_INFO_TYPE_FLAGS       (1 << 4)
> >         __u32   index;          /* IRQ index */
> >         __u32   count;          /* Number of IRQs within this index */
> > +       __u32   type_flags;     /* Type specific feature flags */
> >  };
> >  #define VFIO_DEVICE_GET_IRQ_INFO       _IO(VFIO_TYPE, VFIO_BASE + 9)
> >
> > We'd then have something like
> >
> > #define VFIO_PCI_ERR_IRQ_TYPE_AER	(1 << 0)
> >
> > and
> >
> > #define VFIO_PCI_LEGACY_IOPORT_REGION_TYPE_VGA_3b0	(1 << 0)
> > #define VFIO_PCI_LEGACY_IOPORT_REGION_TYPE_VGA_3c0	(1 << 1)
> > #define VFIO_PCI_LEGACY_MMIO_REGION_TYPE_VGA_a0000	(1 << 0)
> >
> > #define VFIO_PCI_ERR_REGION_TYPE_AER	(1 << 0)
> 
> Hmm, maybe IRQs don't need a type_flag.  For VGA we're exposing a region
> and describing implemented sections within the region.  For IRQs, we
> have practically a limitless space.  Perhaps that means this IRQ should
> just be PCI_AER_EVENT_IRQ and if ever we need a different error
> reporting IRQ we'll add a new index.  This way we don't have to add a
> new field to irq_info and try to unnecessarily generalize a single
> index.

Makes things simpler. I will be taking out the VFIO_DEVICE_FLAGS_PCI_AER* flags altogether and just have the VFIO_PCI_ERR_IRQ_INDEX. When we want to add error reporting we can add a VFIO_PCI_ERR_REGION_INDEX.

Vijay

> 
> I'll need to think about VGA though, we have the same (practically)
> limitless index range there as well, but in the kernel side
> implementation we used fixed sized segments into the device fd to know
> which region is being accessed.  This isn't visible to userspace, so
> we're free to change it.  If we did that, we might expose the specific
> regions as separate indexes rather than try to generalize them into
> reusable "legacy" ranges.  Again, comments welcome.  Thanks,
> 
> Alex
> 
> >
> > What do you think?  It would be useful to prototype these with both AER
> > and VGA before committing.  Thanks,
> >
> > Alex
> >
> > >  };
> > > @@ -310,6 +312,7 @@ enum {
> > >  	VFIO_PCI_INTX_IRQ_INDEX,
> > >  	VFIO_PCI_MSI_IRQ_INDEX,
> > >  	VFIO_PCI_MSIX_IRQ_INDEX,
> > > +	VFIO_PCI_ERR_IRQ_INDEX,
> > >  	VFIO_PCI_NUM_IRQS
> > >  };
> > >
> >
> >
> 
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ