linux-kernel - RE: [PATCH v1 9/9] smaples: add vfio-mdev-pci driver

Message-ID: <A2975661238FB949B60364EF0F2C257439F1FF4E@SHSMSX104.ccr.corp.intel.com>
Date:   Thu, 4 Jul 2019 09:11:02 +0000
From:   "Liu, Yi L" <yi.l.liu@...el.com>
To:     Alex Williamson <alex.williamson@...hat.com>
CC:     "kwankhede@...dia.com" <kwankhede@...dia.com>,
        "Tian, Kevin" <kevin.tian@...el.com>,
        "baolu.lu@...ux.intel.com" <baolu.lu@...ux.intel.com>,
        "Sun, Yi Y" <yi.y.sun@...el.com>,
        "joro@...tes.org" <joro@...tes.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        Masahiro Yamada <yamada.masahiro@...ionext.com>
Subject: RE: [PATCH v1 9/9] smaples: add vfio-mdev-pci driver

Hi Alex,

> From: Alex Williamson [mailto:alex.williamson@...hat.com]
> Sent: Thursday, July 4, 2019 1:22 AM
> To: Liu, Yi L <yi.l.liu@...el.com>
> Subject: Re: [PATCH v1 9/9] smaples: add vfio-mdev-pci driver
> 
> On Wed, 3 Jul 2019 08:25:25 +0000
> "Liu, Yi L" <yi.l.liu@...el.com> wrote:
> 
> > Hi Alex,
> >
> > Thanks for the comments. I have four inline responses below, and one
> > of them needs your further help. :-)

[...]

> > > > >
> > > > > > > > > used iommu_attach_device() rather than iommu_attach_group()
> > > > > > > > > for non-aux mdev iommu_device.  Is there a requirement that
> > > > > > > > > the mdev parent device is in a singleton iommu group?
> > > > > > > >
> > > > > > > > > I don't think there should be such a limitation. Per my
> > > > > > > > > understanding, vfio-mdev-pci should also be able to bind to
> > > > > > > > > devices which share an iommu group with other devices. vfio-pci
> > > > > > > > > works well for such devices.
> > > > > > > > > And since the two drivers share most of the code, I think
> > > > > > > > > vfio-mdev-pci should naturally support it as well.
> > > > > > >
> > > > > > > Yes, the difference though is that vfio.c knows when devices are
> > > > > > > in the same group, which mdev vfio.c only knows about the
> > > > > > > non-iommu backed group, not the group that is actually used for
> > > > > > > the iommu backing.  So we either need to enlighten vfio.c or
> > > > > > > further abstract those details in vfio_iommu_type1.c.
> > > > > >
> > > > > > Not sure if it is necessary to introduce more changes to vfio.c or
> > > > > > > vfio_iommu_type1.c. If it's only for the scenario in which two
> > > > > > > devices share an iommu_group, I guess it could be supported by
> > > > > > using __iommu_attach_device() which has no device counting for the
> > > > > > group. But maybe I missed something here. It would be great if you
> > > > > > can elaborate a bit for it. :-)
> > > > >
> > > > > We need to use the group semantics, there's a reason
> > > > > __iommu_attach_device() is not exposed, it's an internal helper.  I
> > > > > think there's no way around that we need to somewhere track the
> > > > > actual group we're attaching to and have the smarts to re-use it for
> > > > > other devices in the same group.
> > > >
> > > > Hmmm, exposing __iommu_attach_device() is not good, let's forget it.
> > > > :-)
> > > >
> > > > > > > > > If this is a simplification, then vfio-mdev-pci should not
> > > > > > > > > bind to devices where this is violated since there's no way
> > > > > > > > > to use the device.  Can we support it though?
> > > > > > > >
> > > > > > > > yeah, I think we need to support it.
> >
> > I've already made the vfio-mdev-pci driver work for non-singleton iommu
> > groups. E.g. for devices in a single iommu group, I can bind the devices
> > to either vfio-pci or vfio-mdev-pci and then pass them through to a VM. And
> > it will fail if the user tries to pass through a vfio-mdev-pci device in the
> > vfio-pci manner, "-device vfio-pci,host=01:00.1". In other words, a
> > vfio-mdev-pci device can only be passed through via
> > "-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/UUID". This is what
> > we expect.
> >
> > However, I encountered a problem when trying to prevent the user from
> > passing these devices through to different VMs. I've tried it on my side,
> > and I can pass a vfio-pci device and a vfio-mdev-pci device through to
> > different VMs, but this operation should actually fail. If all the devices
> > are bound to vfio-pci, Qemu will open the iommu backed group. So
> > Qemu can check in vfio_get_group() whether a given group has already
> > been used by an AddressSpace (a.k.a. VM), and thus prevent the
> > user from passing these devices through to different VMs if the devices
> > are in the same iommu backed group. However, a
> > vfio-mdev-pci device has a new group and group ID, so Qemu
> > will not be able to detect whether the other devices (which share an iommu
> > group with the vfio-mdev-pci device) are passed through to existing VMs.
> > This is the major problem for vfio-mdev-pci to support non-singleton groups
> > on my side now. Even if all devices are bound to the vfio-mdev-pci driver,
> > Qemu is still unable to check, since all the vfio-mdev-pci devices
> > have a separate mdev group.
> >
> > To fix it, Qemu may need to do more. E.g. if it tries to use a
> > non-singleton iommu backed group, it needs to check whether any mdev
> > group is created and used by an existing VM. It also needs to check whether
> > the iommu backed group is passed through to an existing VM when trying to
> > use an mdev group. For a singleton iommu backed group and an
> > aux-domain enabled physical device, passing mdev groups through to
> > different VMs is still allowed. To achieve these checks, Qemu may need
> > to know whether a group is iommu backed and singleton
> > or not. Do you think it is good to expose such info to userspace? Or
> > any other idea? :-)
> 
> QEMU is never responsible for isolating a group, QEMU is just a
> userspace driver, it's vfio's responsibility to prevent the user from
> splitting groups in ways that are not allowed.  QEMU does not know the

yep, also my concern.

> true group association, it only knows the "virtual" group of the mdev
> device.  QEMU will create a container and add the mdev virtual group to
> the container.  In the kernel, the type1 backend should actually do an
> iommu_attach_group(), attaching the iommu_device group to the domain.
> When QEMU processes the next device, it will have a different group,
> but (assuming no vIOMMU) it will try to attach it to the same
> container, which should work because the iommu_device group backing the
> mdev virtual group is already attached to this domain.
> If we had two separate QEMU processes, each with an mdev device from a
> common group at the iommu_device level, the type1 backend should fail
> to attach the group to the container for the later caller.  I'd think
> this should fail at the iommu_attach_group() call since the group we're
> trying to attach is already attached to another domain.
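
For illustration only, the attach semantics described above can be modeled in a few lines of userspace C. This is a rough sketch, not kernel code: the toy_* types and functions are hypothetical names invented here, and a "domain" simply stands in for one container/QEMU process. It only shows the rule that a backing group already attached to a domain can be re-used by that same domain but is refused for a different one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an IOMMU domain (one per container/VM). */
struct toy_domain {
	int id;
};

/* Hypothetical stand-in for the iommu_device group backing some mdevs. */
struct toy_backing_group {
	struct toy_domain *domain;	/* NULL while the group is unattached */
	int users;			/* mdevs sharing the current attachment */
};

/*
 * Attach the backing group to a domain.
 * First attach claims the group; a repeat attach to the same domain is
 * counted and re-used; an attach to any other domain fails with -EBUSY.
 */
int toy_attach_backing_group(struct toy_backing_group *g, struct toy_domain *d)
{
	if (g->domain == NULL) {
		g->domain = d;
		g->users = 1;
		return 0;
	}
	if (g->domain == d) {
		g->users++;
		return 0;
	}
	return -EBUSY;
}

/* Drop one user; the group becomes free again when the last user leaves. */
void toy_detach_backing_group(struct toy_backing_group *g)
{
	assert(g->users > 0);
	if (--g->users == 0)
		g->domain = NULL;
}
```

With two mdevs from one backing group, both attaches from the first process succeed, while an attach on behalf of a second process returns -EBUSY until the first has fully detached.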

Agree with you. At first, I wanted to fail it in a similar way to vfio-pci devices.
For vfio-pci devices from a common group, vfio will fail the operation around the
/dev/vfio/group_id open if the user tries to assign the vfio-pci devices from a common
group to multiple QEMU processes. Meanwhile, QEMU avoids opening a
/dev/vfio/group_id multiple times, so current vfio/QEMU works well for
non-singleton groups (no vIOMMU). Unfortunately, it looks like we have no way
to fail vfio-mdev-pci devices with a similar mechanism, as each mdev has a separate
group. So yes, I agree with you that we may fail it around the group attach
phase. Below is my draft idea:

In vfio_iommu_type1_attach_group(), we need to do the following checks.

if (mdev_group) {
	if (iommu_device group enabled aux-domain) {
		/*
		 * An aux-domain enabled iommu_group means the iommu_devices
		 * in this group are aux-domain enabled, e.g. SIOV capable devices.
		 * Also, I think an aux-domain enabled group is essentially a
		 * singleton group, as SIOV capable devices are required to be
		 * in a singleton group.
		 */
		iommu_aux_attach_device();
	} else {
		/*
		 * Need to check group->opened in vfio.c, just like what
		 * vfio_group_fopen() does. A new VFIO interface may be required
		 * here since group->opened is private to vfio.c.
		 * vfio_iommu_device_group_opened_inc() will inc group->opened, so
		 * that another VM will fail when trying to open the group. Another
		 * VFIO interface is also required to dec group->opened when the
		 * VM goes down.
		 */
		if (vfio_iommu_device_group_opened_inc(iommu_device_group))
			return -EBUSY;
		iommu_attach_group(iommu_device_group);
	}
}

The concern here is the two new VFIO interfaces. Any thoughts on this proposal? :-)
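
To make the draft check above a bit more concrete, here is a rough userspace model in C. It is only a sketch under assumptions: the toy_* names are hypothetical and are not the proposed VFIO interfaces, the "opened" field merely mimics group->opened, and every caller is treated as a distinct VM (re-use by the same container is not modeled).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the iommu_device group behind an mdev. */
struct toy_iommu_group {
	int opened;		/* mimics group->opened in vfio.c */
	bool aux_domain;	/* devices in the group support aux-domain */
	bool attached;		/* group is attached to a domain */
};

/* Hypothetical "inc" interface: claim the group, or report it is taken. */
int toy_group_opened_inc(struct toy_iommu_group *g)
{
	if (g->opened)
		return -EBUSY;	/* already claimed by another VM */
	g->opened++;
	return 0;
}

/* Hypothetical "dec" interface: release the claim when the VM goes down. */
void toy_group_opened_dec(struct toy_iommu_group *g)
{
	assert(g->opened > 0);
	g->opened--;
	g->attached = false;
}

/* Models the mdev branch of the draft attach check. */
int toy_attach_mdev_backing_group(struct toy_iommu_group *g)
{
	int ret;

	/* aux-domain path: no exclusive claim on the group is taken */
	if (g->aux_domain)
		return 0;

	/* non-aux path: claim the backing group before attaching it */
	ret = toy_group_opened_inc(g);
	if (ret)
		return ret;
	g->attached = true;
	return 0;
}
```

In this model a second attach of a non-aux backing group returns -EBUSY until toy_group_opened_dec() has been called, while an aux-domain group can be attached repeatedly.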

> It's really unfortunate that we don't have the mdev inheriting the
> iommu group of the iommu_device so that userspace can really understand
> this relationship.  A separate group makes sense for the aux-domain
> case, and is (I guess) not a significant issue in the case of a
> singleton iommu_device group, but it's pretty awkward here.  Perhaps
> this is something we should correct in design of iommu backed mdevs.

Yeah, for the aux-domain case it is not a significant issue, as aux-domain essentially
means a singleton iommu_device group. And early on, when designing the support
for wrapping a PCI device as an mdev, we also considered letting vfio-mdev-pci reuse the
iommu_device group. But this results in an iommu backed group that includes both mdev and
physical devices, which might also be strange. Do you think it is worth reconsidering
it?

> Thanks,
> 
> Alex

Thanks,
Yi Liu