lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 13 Aug 2021 16:09:07 -0600
From:   Alex Williamson <alex.williamson@...hat.com>
To:     Russ Weight <russell.h.weight@...el.com>
Cc:     Cornelia Huck <cohuck@...hat.com>,
        "Adler, Michael" <michael.adler@...el.com>,
        "Whisonant, Tim" <tim.whisonant@...el.com>, <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Tom Rix <trix@...hat.com>
Subject: Re: BUG REPORT: vfio_pci driver

On Fri, 13 Aug 2021 11:34:51 -0700
Russ Weight <russell.h.weight@...el.com> wrote:

> Bug Description:
> 
> A bug in the vfio_pci driver was reported in junction with work on FPGA

This looks like the documented behavior of an IRQ index reporting the
VFIO_IRQ_INFO_NORESIZE flag.  We can certainly work towards trying to
remove the flag from this index, but it seems the userspace driver is
currently ignoring the flag and expecting exactly the behavior the flag
indicates is not available.  Thanks,

Alex

> cards. We were able to reproduce and root-cause the bug using system-tap.
> The original bug description is below. An understanding of the referenced
> dfl and opae tools is not required - it is the sequence of IOCTL calls and
> IRQ vectors that matters:
> 
> > I’m trying to get an example AFU working that uses 2 IRQs, active at the same 
> > time. I’m hitting what looks to be a dfl_pci driver bug.
> >
> > The code tries to allocate two IRQ vectors: 0 and 1. I see opaevfio.c doing the 
> > right thing, picking the MSIX index. Allocating either IRQ 0 or IRQ 1 works fine 
> > and I confirm that the VFIO_DEVICE_SET_IRQS looks reasonable, choosing MSIX and 
> > either start of 0 or 1 and count 1.
> >
> > Note that opaevfio.c always passes count 1, so it will make separate calls for 
> > each IRQ vector.
> >
> > When I try to allocate both, I see the following:
> >
> >   * If the VFIO_DEVICE_SET_IRQS ioctl is called first with start 0 and then
> >     start 1 (always count 1), the start 1 (second) ioctl trap returns EINVAL.
> >   * If I set up the vectors in decreasing order, so start 1 followed by start 0,
> >     the program works!
> >   * I ruled out OPAE SDK user space problems by setting up my program to
> >     allocate in increasing order, which would normally fail. I changed only the
> >     ioctl call in user space opaevfio.c, inverting bit 0 of start so that the
> >     driver is called in decreasing index order. Of course this binds the wrong
> >     vectors to the fds, but I don’t care about that for now. This works! From
> >     this, I conclude that it can’t be a user space problem since the difference
> >     between working and failing is solely the order in which IRQ vectors are
> >     bound in ioctl calls.  
> 
> The EINVAL is coming from vfio_msi_set_block() here:
> https://github.com/torvalds/linux/blob/master/drivers/vfio/pci/vfio_pci_intrs.c#L373
> 
> vfio_msi_set_block() is being called from vfio_pci_set_msi_trigger() here on
> the second IRQ request:
> https://github.com/torvalds/linux/blob/master/drivers/vfio/pci/vfio_pci_intrs.c#L530
> 
> We believe the bug is in vfio_pci_set_msi_trigger(), in the 2nd parameter to the call
> to vfio_msi_enable() here:
> https://github.com/torvalds/linux/blob/master/drivers/vfio/pci/vfio_pci_intrs.c#L533
> 
> In both the passing and failing cases, the first IRQ request results in a call
> to vfio_msi_enable() at line 533 and the second IRQ request results in the
> call to vfio_msi_set_block() at line 530. It is during the first IRQ request
> that vfio_msi_enable() sets vdev->num_ctx based on the 2nd parameter (nvec).
> vdev->num_ctx is part of the conditional that results in the EINVAL for the
> failing case.
> 
> In the passing case, vdev->num_ctx is 2. In the failing case, it is 1.
> 
> I am attaching two text files containing trace information from systemtap: one for
> the failing case and one for the passing case. They contain a lot more information
> than is needed, but if you search for vfio_pci_set_msi_trigger and vfio_msi_set_block,
> you will see values for some of the call parameters.
> 
> - Russ
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ