[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <F9E001219150CB45BEDC82A650F360C9014AAB56@G9W0717.americas.hpqcorp.net>
Date: Fri, 1 Feb 2013 10:43:58 +0000
From: "Pandarathil, Vijaymohan R" <vijaymohan.pandarathil@...com>
To: Alex Williamson <alex.williamson@...hat.com>
CC: Gleb Natapov <gleb@...hat.com>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Blue Swirl <blauwirbel@...il.com>,
"Ortiz, Lance E" <lance.oritz@...com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"qemu-devel@...gnu.org" <qemu-devel@...gnu.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v2 2/3] VFIO-AER: Vfio-pci driver changes for supporting
AER
> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@...hat.com]
> Sent: Tuesday, January 29, 2013 5:25 AM
> To: Pandarathil, Vijaymohan R
> Cc: Gleb Natapov; Bjorn Helgaas; Blue Swirl; Ortiz, Lance E;
> kvm@...r.kernel.org; qemu-devel@...gnu.org; linux-pci@...r.kernel.org;
> linux-kernel@...r.kernel.org
> Subject: Re: [PATCH v2 2/3] VFIO-AER: Vfio-pci driver changes for
> supporting AER
>
> On Mon, 2013-01-28 at 12:31 -0700, Alex Williamson wrote:
> > On Mon, 2013-01-28 at 09:54 +0000, Pandarathil, Vijaymohan R wrote:
> > > - New VFIO_SET_IRQ ioctl option to pass the eventfd that is signalled
> when
> > > an error occurs in the vfio_pci_device
> > >
> > > - Register pci_error_handler for the vfio_pci driver
> > >
> > > - When the device encounters an error, the error handler registered
> by
> > > the vfio_pci driver gets invoked by the AER infrastructure
> > >
> > > - In the error handler, signal the eventfd registered for the device.
> > >
> > > - This results in the qemu eventfd handler getting invoked and
> > > appropriate action taken for the guest.
> > >
> > > Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@...com>
> > > ---
> > > drivers/vfio/pci/vfio_pci.c | 44
> ++++++++++++++++++++++++++++++++++++-
> > > drivers/vfio/pci/vfio_pci_intrs.c | 32 +++++++++++++++++++++++++++
> > > drivers/vfio/pci/vfio_pci_private.h | 1 +
> > > include/uapi/linux/vfio.h | 3 +++
> > > 4 files changed, 79 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > index b28e66c..ff2a078 100644
> > > --- a/drivers/vfio/pci/vfio_pci.c
> > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > @@ -196,7 +196,9 @@ static int vfio_pci_get_irq_count(struct
> vfio_pci_device *vdev, int irq_type)
> > >
> > > return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
> > > }
> > > - }
> > > + } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX)
> > > + if (pci_is_pcie(vdev->pdev))
> > > + return 1;
> > >
> > > return 0;
> > > }
> > > @@ -223,9 +225,18 @@ static long vfio_pci_ioctl(void *device_data,
> > > if (vdev->reset_works)
> > > info.flags |= VFIO_DEVICE_FLAGS_RESET;
> > >
> > > + if (pci_is_pcie(vdev->pdev)) {
> > > + info.flags |= VFIO_DEVICE_FLAGS_PCI_AER;
> > > + info.flags |= VFIO_DEVICE_FLAGS_PCI_AER_NOTIFY;
> >
> > Not sure this second flag should be AER specific or if it's even needed,
> > see below for more comments on this.
> >
> > > + }
> > > +
> > > info.num_regions = VFIO_PCI_NUM_REGIONS;
> > > info.num_irqs = VFIO_PCI_NUM_IRQS;
> > >
> > > + /* Expose only implemented IRQs */
> > > + if (!(info.flags & VFIO_DEVICE_FLAGS_PCI_AER_NOTIFY))
> > > + info.num_irqs--;
> >
> > I'm having second thoughts on this, see further below.
> >
> > > +
> > > return copy_to_user((void __user *)arg, &info, minsz);
> > >
> > > } else if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
> > > @@ -302,6 +313,10 @@ static long vfio_pci_ioctl(void *device_data,
> > > if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS)
> > > return -EINVAL;
> > >
> > > + if ((info.index == VFIO_PCI_ERR_IRQ_INDEX) &&
> > > + !pci_is_pcie(vdev->pdev))
> > > + return -EINVAL;
> > > +
> >
> > Perhaps we could incorporate the index test above this too?
> >
> > switch (info.index) {
> > case VFIO_PCI_INTX_IRQ_INDEX: ... VFIO_PCI_MSIX_IRQ_INDEX:
> > break;
> > case VFIO_PCI_ERR_IRQ_INDEX:
> > if (pci_is_pcie(vdev->pdev))
> > break;
> > default:
> > return -EINVAL;
> > }
> >
> > This is more similar to how I've re-written the same for the proposed
> > VGA/legacy I/O support.
> >
> > > info.flags = VFIO_IRQ_INFO_EVENTFD;
> > >
> > > info.count = vfio_pci_get_irq_count(vdev, info.index);
> > > @@ -538,11 +553,38 @@ static void vfio_pci_remove(struct pci_dev *pdev)
> > > kfree(vdev);
> > > }
> > >
> > > +static pci_ers_result_t vfio_err_detected(struct pci_dev *pdev,
> > > + pci_channel_state_t state)
> >
> > This is actually AER specific, right? So perhaps it should be
> > vfio_pci_aer_err_detected?
> >
> > Also, please follow existing whitespace usage throughout, tabs followed
> > by spaces to align function parameter wrap.
> >
> > > +{
> > > + struct vfio_pci_device *vpdev;
> > > + void *vdev;
> >
> > struct vfio_device *vdev;
> >
> > > +
> > > + vdev = vfio_device_get_from_dev(&pdev->dev);
> > > + if (vdev == NULL)
> > > + return PCI_ERS_RESULT_DISCONNECT;
> > > +
> > > + vpdev = vfio_device_data(vdev);
> > > + if (vpdev == NULL)
> > > + return PCI_ERS_RESULT_DISCONNECT;
> > > +
> > > + if (vpdev->err_trigger)
> > > + eventfd_signal(vpdev->err_trigger, 1);
> > > +
> > > + vfio_device_put_vdev(vdev);
> > > +
> > > + return PCI_ERS_RESULT_CAN_RECOVER;
> > > +}
> > > +
> > > +static const struct pci_error_handlers vfio_err_handlers = {
> > > + .error_detected = vfio_err_detected,
> > > +};
> > > +
> > > static struct pci_driver vfio_pci_driver = {
> > > .name = "vfio-pci",
> > > .id_table = NULL, /* only dynamic ids */
> > > .probe = vfio_pci_probe,
> > > .remove = vfio_pci_remove,
> > > + .err_handler = &vfio_err_handlers,
> > > };
> > >
> > > static void __exit vfio_pci_cleanup(void)
> > > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c
> b/drivers/vfio/pci/vfio_pci_intrs.c
> > > index 3639371..f003e08 100644
> > > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > > @@ -745,6 +745,31 @@ static int vfio_pci_set_msi_trigger(struct
> vfio_pci_device *vdev,
> > > return 0;
> > > }
> > >
> > > +static int vfio_pci_set_err_eventfd(struct vfio_pci_device *vdev,
> > > + unsigned index, unsigned start,
> > > + unsigned count, uint32_t flags, void *data)
> >
> >
> > Rename to vfio_pci_set_err_trigger? The other functions mostly only
> > support eventfd too.
> >
> > > +{
> > > + if ((index == VFIO_PCI_ERR_IRQ_INDEX) &&
> > > + (flags & VFIO_IRQ_SET_DATA_EVENTFD) &&
> > > + pci_is_pcie(vdev->pdev)) {
> >
> > It would clean up the indentation to have this be:
> >
> > if (!supported stuff)
> > return -EINVAL;
> >
> > do stuff
> >
> > Testing the index seems overly paranoid here given the caller. The
> > caller is also already testing pci_is_pcie.
> >
> > Why not support DATA_NONE and DATA_BOOL? It would be useful loopback
> > testing for userspace to be able to trigger an AER notification.
> >
> > > +
> > > + int32_t fd = *(int32_t *)data;
> > > +
> > > + if (fd == -1) {
> > > + if (vdev->err_trigger)
> > > + eventfd_ctx_put(vdev->err_trigger);
> > > + vdev->err_trigger = NULL;
> > > + return 0;
> > > + } else if (fd >= 0) {
> > > + vdev->err_trigger = eventfd_ctx_fdget(fd);
> > > + if (IS_ERR(vdev->err_trigger))
> > > + return PTR_ERR(vdev->err_trigger);
> > > + return 0;
> > > + } else
> > > + return -EINVAL;
> > > + }
> > > + return -EINVAL;
> > > +}
> > > int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t
> flags,
> > > unsigned index, unsigned start, unsigned count,
> > > void *data)
> > > @@ -779,6 +804,13 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device
> *vdev, uint32_t flags,
> > > break;
> > > }
> > > break;
> > > + case VFIO_PCI_ERR_IRQ_INDEX:
> > > + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> > > + case VFIO_IRQ_SET_ACTION_TRIGGER:
> > > + if (pci_is_pcie(vdev->pdev))
> > > + func = vfio_pci_set_err_eventfd;
> > > + break;
> > > + }
> > > }
> > >
> > > if (!func)
> > > diff --git a/drivers/vfio/pci/vfio_pci_private.h
> b/drivers/vfio/pci/vfio_pci_private.h
> > > index 611827c..daee62f 100644
> > > --- a/drivers/vfio/pci/vfio_pci_private.h
> > > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > > @@ -55,6 +55,7 @@ struct vfio_pci_device {
> > > bool bardirty;
> > > struct pci_saved_state *pci_saved_state;
> > > atomic_t refcnt;
> > > + struct eventfd_ctx *err_trigger;
> > > };
> > >
> > > #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
> > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > index 4758d1b..e81eb4d 100644
> > > --- a/include/uapi/linux/vfio.h
> > > +++ b/include/uapi/linux/vfio.h
> > > @@ -147,6 +147,8 @@ struct vfio_device_info {
> > > __u32 flags;
> > > #define VFIO_DEVICE_FLAGS_RESET (1 << 0) /* Device supports
> reset */
> > > #define VFIO_DEVICE_FLAGS_PCI (1 << 1) /* vfio-pci device */
> > > +#define VFIO_DEVICE_FLAGS_PCI_AER (1 << 2) /* AER capable */
> > > +#define VFIO_DEVICE_FLAGS_PCI_AER_NOTIFY (1 << 3) /* Supports aer
> notify */
> > >
> > > __u32 num_regions; /* Max region index + 1 */
> > > __u32 num_irqs; /* Max IRQ index + 1 */
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > This part of the vfio spec has been causing me trouble. We discussed it
> > a bit offline, but I'd appreciate any feedback from you or the broader
> > community. num_foo indicates a count, however the comment clearly
> > states this is a max index. With the VGA patches I've been playing with
> > using them as a count where userspace would probe indexes until it finds
> > num_foo implemented (ie. INFO_IRQ/REGION returns >=0). I guess the
> > original intention was was to probe up to num_foo-1 and unavailable
> > indexes return <0. Looking at it again, this seems like the more
> > deterministic approach. For instance, if we add LEGACY_IOPORT and
> > LEGACY_MMIO regions at index 8 & 9, then ERR region at index 10, it's
> > easier for userspace to know to stop searching at index 10 instead of
> > probing indexes it may not understand trying to find the full count.
> > Does that sound right?
> >
> > So Vijay, please don't shot me for changing my mind, but I'm inclined to
> > think you were probably right that DEVICE_INFO should just return
> > num_irqs = VFIO_PCI_NUM_IRQS regardless of whether ERR_IRQ_INDEX is
> > supported. However IRQ_INFO should still return <0 if the device is not
> > pcie. It seems like this also means that we don't need flags indicating
> > which indexes are present as that duplicates simply looking for them.
> >
> > Which flags do we actually need then? Should the AER flag be a device
> > flag or is supporting AER error reporting a feature of
> > VFIO_PCI_ERR_IRQ_INDEX and should therefore be reported as a flag there?
> > So far REGION_INFO and IRQ_INFO flags only report capabilities of the
> > item and not it's purpose, but so far all the regions and irqs have very
> > fixed purposes.
> >
> > I'm inclined to think that a LEGACY_IOPORT region reporting that it
> > supports VGA IOPORT space 3b0 and 3c0 makes more sense than a general
> > device VGA flag and inferring 3b0 & 3c0 based on the existence of a
> > LEGACY_IOPORT region.
> >
> > If we put flags on INFO_REGION and INFO_IRQ, where do they go? We've
> > got a u32 on each for flags. We could split that and define bits >=16
> > are for type specific flags and <16 are generic. We could also define a
> > generic flag indicating a type specific flag field is present.
> > region_info already has a reserved u32 for alignment that could fill
> > this roll, irq_info would need to add a field. Perhaps something like:
> >
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -176,8 +176,9 @@ struct vfio_region_info {
> > #define VFIO_REGION_INFO_FLAG_READ (1 << 0) /* Region supports read
> */
> > #define VFIO_REGION_INFO_FLAG_WRITE (1 << 1) /* Region supports write
> */
> > #define VFIO_REGION_INFO_FLAG_MMAP (1 << 2) /* Region supports mmap
> */
> > +#define VFIO_REGION_INFO_TYPE_FLAGS (1 << 3)
> > __u32 index; /* Region index */
> > - __u32 resv; /* Reserved for alignment */
> > + __u32 type_flags; /* Type specific feature flags */
> > __u64 size; /* Region size (bytes) */
> > __u64 offset; /* Region offset from start of device fd
> */
> > };
> > @@ -222,8 +223,10 @@ struct vfio_irq_info {
> > #define VFIO_IRQ_INFO_MASKABLE (1 << 1)
> > #define VFIO_IRQ_INFO_AUTOMASKED (1 << 2)
> > #define VFIO_IRQ_INFO_NORESIZE (1 << 3)
> > +#define VFIO_IRQ_INFO_TYPE_FLAGS (1 << 4)
> > __u32 index; /* IRQ index */
> > __u32 count; /* Number of IRQs within this index */
> > + __u32 type_flags; /* Type specific feature flags */
> > };
> > #define VFIO_DEVICE_GET_IRQ_INFO _IO(VFIO_TYPE, VFIO_BASE + 9)
> >
> > We'd then have something like
> >
> > #define VFIO_PCI_ERR_IRQ_TYPE_AER (1 << 0)
> >
> > and
> >
> > #define VFIO_PCI_LEGACY_IOPORT_REGION_TYPE_VGA_3b0 (1 << 0)
> > #define VFIO_PCI_LEGACY_IOPORT_REGION_TYPE_VGA_3c0 (1 << 1)
> > #define VFIO_PCI_LEGACY_MMIO_REGION_TYPE_VGA_a0000 (1 << 0)
> >
> > #define VFIO_PCI_ERR_REGION_TYPE_AER (1 << 0)
>
> Hmm, maybe IRQs don't need a type_flag. For VGA we're exposing a region
> and describing implemented sections within the region. For IRQs, we
> have practically a limitless space. Perhaps that means this IRQ should
> just be PCI_AER_EVENT_IRQ and if ever we need a different error
> reporting IRQ we'll add a new index. This way we don't have to add a
> new field to irq_info and try to unnecessarily generalize a single
> index.
Makes things simpler. I will be taking out the VFIO_DEVICE_FLAGS_PCI_AER* flags altogether and just have the VFIO_PCI_ERR_IRQ_INDEX. When we want to add error reporting we can add a VFIO_PCI_ERR_REGION_INDEX.
Vijay
>
> I'll need to think about VGA though, we have the same (practically)
> limitless index range there as well, but in the kernel side
> implementation we used fixed sized segments into the device fd to know
> which region is being accessed. This isn't visible to userspace, so
> we're free to change it. If we did that, we might expose the specific
> regions as separate indexes rather than try to generalize them into
> reusable "legacy" ranges. Again, comments welcome. Thanks,
>
> Alex
>
> >
> > What do you think? It would be useful to prototype these with both AER
> > and VGA before committing. Thanks,
> >
> > Alex
> >
> > > };
> > > @@ -310,6 +312,7 @@ enum {
> > > VFIO_PCI_INTX_IRQ_INDEX,
> > > VFIO_PCI_MSI_IRQ_INDEX,
> > > VFIO_PCI_MSIX_IRQ_INDEX,
> > > + VFIO_PCI_ERR_IRQ_INDEX,
> > > VFIO_PCI_NUM_IRQS
> > > };
> > >
> >
> >
>
>
Powered by blists - more mailing lists