Message-ID: <20250529043125.30478-1-lizhe.67@bytedance.com>
Date: Thu, 29 May 2025 12:31:25 +0800
From: lizhe.67@...edance.com
To: alex.williamson@...hat.com,
jgg@...pe.ca
Cc: david@...hat.com,
kvm@...r.kernel.org,
linux-kernel@...r.kernel.org,
lizhe.67@...edance.com,
muchun.song@...ux.dev
Subject: Re: [PATCH v3] vfio/type1: optimize vfio_pin_pages_remote() for huge folio
On Wed, 28 May 2025 14:09:41 -0600, alex.williamson@...hat.com wrote:
> On Tue, 27 May 2025 20:46:27 -0300
> Jason Gunthorpe <jgg@...pe.ca> wrote:
>
> > On Tue, May 27, 2025 at 01:52:52PM -0600, Alex Williamson wrote:
> >
> > > > Lots of CSPs are running iommufd now. There is a commonly used OOT
> > > > patch to add the insecure P2P support like VFIO. I know lots of folks
> > > > have backported iommufd.. No idea about libvirt, but you can run it in
> > > > compatibility mode and then you don't need to change libvirt.
> > >
> > > For distributions that don't have an upstream first policy, sure, they
> > > can patch whatever they like. I can't recommend that solution though.
> >
> > I appreciate that, and we are working on it.. The first round of
> > patches for DMA API improvements that Christoph asked for were sent as
> > a PR yesterday.
> >
> > > Otherwise the problem with compatibility mode is that it's a compile
> > > time choice.
> >
> > The compile time choice is not the compatibility mode.
> >
> > Any iommufd, even if opened from /dev/iommu, is usable as a VFIO
> > container in the classic group based ioctls.
> >
> > The group path in VFIO calls vfio_group_ioctl_set_container() ->
> > iommufd_ctx_from_file() which works with iommufd from any source.
> >
> > The type 1 emulation ioctls are also always available on any iommufd.
> > After set container VFIO does iommufd_vfio_compat_ioas_create() to
> > set up the default compatibility stuff.
> >
> > All the compile time option does is replace /dev/vfio/vfio with
> > /dev/iommu, but they have *exactly* the same fops:
> >
> > static struct miscdevice iommu_misc_dev = {
> > 	.minor = MISC_DYNAMIC_MINOR,
> > 	.name = "iommu",
> > 	.fops = &iommufd_fops,
> >
> > static struct miscdevice vfio_misc_dev = {
> > 	.minor = VFIO_MINOR,
> > 	.name = "vfio",
> > 	.fops = &iommufd_fops,
> >
> > So you can have libvirt open /dev/iommu, or you can have the admin
> > symlink /dev/iommu to /dev/vfio/vfio and opt in on a case by case
> > basis.
>
> Yes, I'd forgotten we added this. It's QEMU opening /dev/vfio/vfio and
> QEMU already has native iommufd support, so a per VM hack could be done
> via qemu:args or a QEMU wrapper script to instantiate an iommufd object
> in the VM xml, or as noted, a system-wide change could be done
> transparently via 'ln -sf /dev/iommu /dev/vfio/vfio'.
>
> To be fair to libvirt, we'd really like libvirt to make use of iommufd
> whenever it's available, but without feature parity this would break
> users. And without feature parity, it's not clear how libvirt should
> probe for feature parity. Things get a lot easier for libvirt if we
> can switch the default at a point where we expect no regressions.
>
> > The compile time choice is really just a way to make testing easier
> > and down the road if a distro decides they don't want to support both
> > code bases then they can choose to disable the type 1 code entirely and
> > still be uAPI compatible, but I think that is down the road a ways
> > still.
>
> Yep.
>
> > > A single kernel binary cannot interchangeably provide
> > > either P2P DMA with legacy vfio or better IOMMUFD improvements without
> > > P2P DMA.
> >
> > See above, it can, and it was deliberately made easy to do without
> > having to change any applications.
> >
> > The idea was you can sort of incrementally decide which things to move
> > over. For instance you can keep all the type 1 code and vfio
> > group/container stuff unchanged but use a combination of
> > IOMMU_VFIO_IOAS_GET and then IOMMUFD_CMD_IOAS_MAP_FILE to map a memfd.
>
> Right, sorry, it'd slipped my mind that we'd created the "soft"
> compatibility mode too.
>
> Zhe, so if you have no dependencies on P2P DMA within your device
> assignment VMs, the options above may be useful or at least a data
> point for comparison of type1 vs IOMMUFD performance. Thanks,
>
> Alex
Hi Alex, Jason, thanks a lot for your suggestions. I will try to explore
the use of iommufd in our scenario. I also hope that the P2P DMA patches
will be merged into mainline soon.
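
For anyone trying the system-wide option quoted above
('ln -sf /dev/iommu /dev/vfio/vfio'), here is a minimal userspace sketch
for checking that both paths now resolve to the same character device.
The helper name same_char_device() is hypothetical and only for
illustration. It compares device numbers after symlink resolution, so it
detects the symlink case only. It will not report a match for the compile
time option, where /dev/vfio/vfio is a distinct node that shares
iommufd_fops.

```c
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of any kernel or libvirt API.
 *
 * Returns 1 if both paths resolve (following symlinks) to the same
 * character device, 0 if they resolve to different character devices,
 * and -1 if either path cannot be stat()ed or is not a character device.
 */
int same_char_device(const char *path_a, const char *path_b)
{
	struct stat a, b;

	/* stat() follows symlinks, so a symlinked node reports its target */
	if (stat(path_a, &a) != 0 || stat(path_b, &b) != 0)
		return -1;

	if (!S_ISCHR(a.st_mode) || !S_ISCHR(b.st_mode))
		return -1;

	/* st_rdev carries the major/minor pair of the device node */
	return a.st_rdev == b.st_rdev;
}
```

Calling same_char_device("/dev/vfio/vfio", "/dev/iommu") should return 1
once the symlink is in place, and -1 on a kernel or system where one of
the two nodes does not exist.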
Thanks,
Zhe