lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <BN9PR11MB54338D6126FE1D241F76E1948C029@BN9PR11MB5433.namprd11.prod.outlook.com>
Date:   Tue, 29 Jun 2021 00:43:08 +0000
From:   "Tian, Kevin" <kevin.tian@...el.com>
To:     Alex Williamson <alex.williamson@...hat.com>
CC:     Jason Gunthorpe <jgg@...dia.com>, Joerg Roedel <joro@...tes.org>,
        Jean-Philippe Brucker <jean-philippe@...aro.org>,
        David Gibson <david@...son.dropbear.id.au>,
        Jason Wang <jasowang@...hat.com>,
        "parav@...lanox.com" <parav@...lanox.com>,
        "Enrico Weigelt, metux IT consult" <lkml@...ux.net>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Shenming Lu <lushenming@...wei.com>,
        Eric Auger <eric.auger@...hat.com>,
        Jonathan Corbet <corbet@....net>,
        "Raj, Ashok" <ashok.raj@...el.com>,
        "Liu, Yi L" <yi.l.liu@...el.com>, "Wu, Hao" <hao.wu@...el.com>,
        "Jiang, Dave" <dave.jiang@...el.com>,
        Jacob Pan <jacob.jun.pan@...ux.intel.com>,
        "Kirti Wankhede" <kwankhede@...dia.com>,
        Robin Murphy <robin.murphy@....com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
        "David Woodhouse" <dwmw2@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        "Lu Baolu" <baolu.lu@...ux.intel.com>
Subject: RE: Plan for /dev/ioasid RFC v2

> From: Alex Williamson <alex.williamson@...hat.com>
> Sent: Tuesday, June 29, 2021 6:32 AM
> 
> On Mon, 28 Jun 2021 01:09:18 +0000
> "Tian, Kevin" <kevin.tian@...el.com> wrote:
> 
> > > From: Jason Gunthorpe <jgg@...dia.com>
> > > Sent: Friday, June 25, 2021 10:36 PM
> > >
> > > On Fri, Jun 25, 2021 at 10:27:18AM +0000, Tian, Kevin wrote:
> > >
> > > > -   When receiving the binding call for the 1st device in a group,
> iommu_fd
> > > >     calls iommu_group_set_block_dma(group, dev->driver) which does
> > > >     several things:
> > >
> > > The whole problem here is trying to match this new world where we want
> > > devices to be in charge of their own IOMMU configuration and the old
> > > world where groups are in charge.
> > >
> > > Inserting the group fd and then calling a device-centric
> > > VFIO_GROUP_GET_DEVICE_FD_NEW doesn't solve this conflict, and isn't
> > > necessary.
> >
> > No, this was not what I meant. There is no group fd required when
> > calling this device-centric interface. I was actually talking about:
> >
> > 	iommu_group_set_block_dma(dev->group, dev->driver)
> >
> > just because current iommu layer API is group-centric. Whether this
> > should be improved could be next-level thing. Sorry for not making
> > it clear in the first place.
> >
> > > We can always get the group back from the device at any
> > > point in the sequence do to a group wide operation.
> >
> > yes.
> >
> > >
> > > What I saw as the appeal of the sort of idea was to just completely
> > > leave all the difficult multi-device-group scenarios behind on the old
> > > group centric API and then we don't have to deal with them at all, or
> > > least not right away.
> >
> > yes, this is the staged approach that we discussed earlier. and
> > the reason why I refined this proposal about multi-devices group
> > here is because you want to see some confidence along this
> > direction. Thus I expanded your idea and hope to achieve consensus
> > with Alex/Joerg who obviously have not been convinced yet.
> >
> > >
> > > I'd see some progression where iommu_fd only works with 1:1 groups at
> > > the start. Other scenarios continue with the old API.
> >
> > One uAPI open after completing this new sketch. v1 proposed to
> > conduct binding (VFIO_BIND_IOMMU_FD) after device_fd is acquired.
> > With this sketch we need a new VFIO_GROUP_GET_DEVICE_FD_NEW
> > to complete both in one step. I want to get Alex's confirmation whether
> > it sounds good to him, since it's better to unify the uAPI between 1:1
> > group and 1:N group even if we don't support 1:N in the start.
> 
> I don't like it.  It doesn't make sense to me.  You have the
> group-centric world, which must continue to exist and cannot change
> because we cannot break the vfio uapi.  We can make extensions, we can
> define a new parallel uapi, we can deprecate the uapi, but in the short
> term, it can't change.
> 
> AIUI, the new device-centric model starts with vfio device files that
> can be opened directly.  So what then is the purpose of a *GROUP* get
> device fd?  Why is a vfio uapi involved in setting a device cookie for
> another subsystem?
> 
> I'd expect that /dev/iommu will be used by multiple subsystems.  All
> will want to bind devices to address spaces, so shouldn't binding a
> device to an iommufd be an ioctl on the iommufd, ie.
> IOMMU_BIND_VFIO_DEVICE_FD.  Maybe we don't even need "VFIO" in there
> and
> the iommufd code can figure it out internally.
> 
> You're essentially trying to reduce vfio to the device interface.  That
> necessarily implies that ioctls on the container, group, or passed
> through the container to the iommu no longer exist.  From my
> perspective, there should ideally be no new vfio ioctls.  The user gets
> a limited access vfio device fd and enables full access to the device
> by registering it to the iommufd subsystem (100% this needs to be
> enforced until close() to avoid revoke issues).  The user interacts
> exclusively with vfio via the device fd and performs all DMA address
> space related operations through the iommufd.
> 
> > > Then maybe groups where all devices use the same IOASID.
> > >
> > > Then 1:N groups if the source device is reliably identifiable, this
> > > requires iommu subystem work to attach domains to sub-group objects -
> > > not sure it is worthwhile.
> > >
> > > But at least we can talk about each step with well thought out patches
> > >
> > > The only thing that needs to be done to get the 1:1 step is to broadly
> > > define how the other two cases will work so we don't get into trouble
> > > and set some way to exclude the problematic cases from even getting to
> > > iommu_fd in the first place.
> > >
> > > For instance if we go ahead and create /dev/vfio/device nodes we could
> > > do this only if the group was 1:1, otherwise the group cdev has to be
> > > used, along with its API.
> >
> > I feel for VFIO possibly we don't need significant change to its uAPI
> > sequence, since it anyway needs to support existing semantics for
> > backward compatibility. With this sketch we can keep vfio container/
> > group by introducing an external iommu type which implies a different
> > GET_DEVICE_FD semantics. /dev/iommu can report a fd-wide capability
> > for whether 1:N group is supported to vfio user.
> 
> Ideally vfio would also at least be able to register a type1 IOMMU
> backend through the existing uapi, backed by this iommu code, ie. we'd
> create a new "iommufd" (but without the user visible fd), bind all the
> group devices to it, generating our own device cookies, create a single
> ioasid and attach all the devices to it (all internal).  When using the
> compatibility mode, userspace doesn't get device cookies, doesn't get
> an iommufd, they do mappings through the container, where vfio owns the
> cookies and ioasid.
> 
> > For new subsystems they can directly create device nodes and rely on
> > iommu fd to manage group isolation, without introducing any group
> > semantics in its uAPI.
> 
> Create device nodes, bind them to iommufd, associate cookies, attach
> ioasids, etc.  That should be the same for all subsystems, including
> vfio, it's just the magic internal handshake between the device
> subsystem and the iommufd subsystem that changes.
> 
> > > >         a) Check group viability. A group is viable only when all devices in
> > > >             the group are in one of below states:
> > > >
> > > >                 * driver-less
> > > >                 * bound to a driver which is same as dev->driver (vfio in this
> case)
> > > >                 * bound to an otherwise allowed driver (same list as in vfio)
> > >
> > > This really shouldn't use hardwired driver checks. Attached drivers
> > > should generically indicate to the iommu layer that they are safe for
> > > iommu_fd usage by calling some function around probe()
> >
> > good idea.
> >
> > >
> > > Thus a group must contain only iommu_fd safe drivers, or drivers-less
> > > devices before any of it can be used. It is the more general
> > > refactoring of what VFIO is doing.
> > >
> > > >         c) The iommu layer also verifies group viability on BUS_NOTIFY_
> > > >             BOUND_DRIVER event. BUG_ON if viability is broken while
> > > block_dma
> > > >             is set.
> > >
> > > And with this concept of iommu_fd safety being first-class maybe we
> > > can somehow eliminate this gross BUG_ON (and the 100's of lines of
> > > code that are used to create it) by denying probe to non-iommu-safe
> > > drivers, somehow.
> >
> > yes.
> >
> > >
> > > > -   Binding other devices in the group to iommu_fd just succeeds since
> > > >     the group is already in block_dma.
> > >
> > > I think the rest of this more or less describes the device centric
> > > logic for multi-device groups we've already talked about. I don't
> > > think it benifits from having the group fd
> > >
> >
> > sure. All of this new sketch doesn't have group fd in any iommu fd
> > API. Just try to elaborate a full sketch to sync the base.
> >
> > Alex/Joerg, look forward to your thoughts now. 😊
> 
> Some provided.  Thanks,
> 

Thanks a lot Alex! We'll try to focus on the new device-centric flow 
w/o touching existing container/group uAPI. As you said, we need
a brand-new mechanism for all subsystems anyway.

With that I will resume v2 progress based on device-centric concept.
It will be still based on a few new VFIO uAPIs to handle device binding/
attaching, though you prefer to not adding any new VFIO uAPI.  This is 
relatively a smaller open compared to device-centric vs. group-centric 
issue. We can have it continuously discussed in parallel with v2 review. 
and I hope v2 can be helpful for closing this open with a clearer 
explanation about PASID virtualization. 😊

Thanks
Kevin

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ