[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aKxpyyKvYcd84Ayi@google.com>
Date: Mon, 25 Aug 2025 13:48:59 +0000
From: Pranjal Shrivastava <praan@...gle.com>
To: Mostafa Saleh <smostafa@...gle.com>
Cc: Eric Auger <eric.auger@...hat.com>,
Alex Williamson <alex.williamson@...hat.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, clg@...hat.com
Subject: Re: [PATCH 2/2] vfio/platform: Mark for removal
On Wed, Aug 20, 2025 at 08:25:47PM +0000, Mostafa Saleh wrote:
> Hi Eric,
>
> On Wed, Aug 20, 2025 at 06:29:27PM +0200, Eric Auger wrote:
> > Hi Mostafa,
> >
> > On 8/20/25 5:20 PM, Mostafa Saleh wrote:
> > > Hi Eric,
> > >
> > > On Tue, Aug 19, 2025 at 11:58:32AM +0200, Eric Auger wrote:
> > >> Hi Mostafa,
> > >>
> > >> On 8/18/25 7:33 PM, Mostafa Saleh wrote:
> > >>> On Mon, Aug 18, 2025 at 10:52:42AM -0600, Alex Williamson wrote:
> > >>>> On Fri, 15 Aug 2025 16:59:37 +0000
> > >>>> Mostafa Saleh <smostafa@...gle.com> wrote:
> > >>>>
> > >>>>> Hi Alex,
> > >>>>>
> > >>>>> On Wed, Aug 06, 2025 at 11:03:12AM -0600, Alex Williamson wrote:
> > >>>>>> vfio-platform hasn't had a meaningful contribution in years. In-tree
> > >>>>>> hardware support is predominantly only for devices which are long since
> > >>>>>> e-waste. QEMU support for platform devices is slated for removal in
> > >>>>>> QEMU-10.2. Eric Auger presented on the future of the vfio-platform
> > >>>>>> driver and difficulties supporting new devices at KVM Forum 2024,
> > >>>>>> gaining some support for removal, some disagreement, but garnering no
> > >>>>>> new hardware support, leaving the driver in a state where it cannot
> > >>>>>> be tested.
> > >>>>>>
> > >>>>>> Mark as obsolete and subject to removal.
> > >>>>> Recently(this year) in Android, we enabled VFIO-platform for protected KVM,
> > >>>>> and it’s supported in our VMM (CrosVM) [1].
> > >>>>> CrosVM support is different from Qemu, as it doesn't require any device
> > >>>>> specific logic in the VMM, however, it relies on loading a device tree
> > >>>>> template in runtime (with “compatiable” string...) and it will just
> > >>>>> override regs, irqs.. So it doesn’t need device knowledge (at least for now)
> > >>>>> Similarly, the kernel doesn’t need reset drivers as the hypervisor handles that.
> > >>>> I think what we attempt to achieve in vfio is repeatability and data
> > >>>> integrity independent of the hypervisor. IOW, if we 'kill -9' the
> > >>>> hypervisor process, the kernel can bring the device back to a default
> > >>>> state where the device isn't wedged or leaking information through the
> > >>>> device to the next use case. If the hypervisor wants to support
> > >>>> enhanced resets on top of that, that's great, but I think it becomes
> > >>>> difficult to argue that vfio-platform itself holds up its end of the
> > >>>> bargain if we're really trusting the hypervisor to handle these aspects.
> > >>> Sorry I was not clear, we only use that in Android for ARM64 and pKVM,
> > >>> where the hypervisor in this context means the code running in EL2 which
> > >>> is more privileged than the kernel, so it should be trusted.
> > >>> However, as I mentioned that code is not upstream yet, so it's a valid
> > >>> concern that the kernel still needs a reset driver.
> > >>>
> > >>>>> Unfortunately, there is no upstream support at the moment, we are making
> > >>>>> some -slow- progress on that [2][3]
> > >>>>>
> > >>>>> If it helps, I have access to HW that can run that and I can review/test
> > >>>>> changes, until upstream support lands; if you are open to keeping VFIO-platform.
> > >>>>> Or I can look into adding support for existing upstream HW(with platforms I am
> > >>>>> familiar with as Pixel-6)
> > >>>> Ultimately I'll lean on Eric to make the call. I know he's concerned
> > >>>> about testing, but he raised that and various other concerns whether
> > >>>> platform device really have a future with vfio nearly a year ago and
> > >>>> nothing has changed. Currently it requires a module option opt-in to
> > >>>> enable devices that the kernel doesn't know how to reset. Is that
> > >>>> sufficient or should use of such a device taint the kernel? If any
> > >>>> device beyond the few e-waste devices that we know how to reset taint
> > >>>> the kernel, should this support really even be in the kernel? Thanks,
> > >>> I think with the way it’s supported at the moment we need the kernel
> > >>> to ensure that reset happens.
> > >> Effectively my main concern is I cannot test vfio-platform anymore. We
> > >> had some CVEs also impacting the vfio platform code base and it is a
> > >> major issue not being able to test. That's why I was obliged, last year,
> > >> to resume the integration of a new device (the tegra234 mgbe), nobody
> > >> seemed to be really interested in and this work could not be upstreamed
> > >> due to lack of traction and its hacky nature.
> > >>
> > >> You did not really comment on which kind of devices were currently
> > >> integrated. Are they within the original scope of vfio (with DMA
> > >> capabilities and protected by an IOMMU)? Last discussion we had in
> > >> https://lore.kernel.org/all/ZvvLpLUZnj-Z_tEs@google.com/ led to the
> > >> conclusion that maybe VFIO was not the best suited framework.
> > > At the moment, Android device assignement only supports DMA capable
> > > devices which are behind an IOMMU, and we use VFIO-platform for that,
> > > most of our use cases are accelerators.
> > >
> > > In that thread, I was looking into adding support for simpler devices
> > > (such as sensors) but as discussed that won’t be done through
> > > VFIO-platform.
> > >
> > > Ignoring Android, as I mentioned, I can work on adding support for
> > > existing upstream platforms (preferably ARM64, that I can get access to)
> > > such as Pixel-6, which should make it easier to test.
> > >
> > > Also, we have some interest on adding new features such as run-time
> > > power management.
> >
> > OK fair enough. If Alex agrees then we can wait for those efforts. Also
> > I think it would make sense to formalize the way you reset the devices
> > (I understand the hyp does that under the hood).
>
> I think currently - with some help from the platform bus- we can rely on
> the existing shutdown method, instead of specific hooks.
> As the hypervisor logic will only be for ARM64 (at least for now), I can
> look more into this.
>
> But I think the top priority would be to establish a decent platform to
> test with, I will start looking into Pixel-6 (although that would need
> to land IOMMU support for it upstream first). I also have a morello
> board with SMMUv3, but I think it's all PCI.
>
> > >
> > >> In case we keep the driver in, I think we need to get a garantee that
> > >> you or someone else at Google commits to review and test potential
> > >> changes with a perspective to take over its maintenance.
> > > I can’t make guarantees on behalf of Google, but I can contribute in
> > > reviewing/testing/maintenance of the driver as far as I am able to.
> > > If you want, you can add me as reviewer to the driver.
> >
> > I understand. I think the usual way then is for you to send a patch to
> > update the Maintainers file.
>
> I see, I will send one shortly.
>
I could contribute time to help with the maintenance effort here, if
needed. Please let me know if you'd like that.
> Thanks,
> Mostafa
>
Thanks,
Praan
> >
> > Thanks
> >
> > Eric
> > >
> > > Thanks,
> > > Mostafa
> > >
> > >
> > >> Thanks
> > >>
> > >> Eric
> > >>
> > >>> But maybe instead of having that specific reset handler for VFIO, we
> > >>> can rely on the “shutdown” method already existing in "platform_driver"?
> > >>> I believe that should put the device in a state where it can be re-probed
> > >>> safely. Although not all devices implement that but it seems more generic
> > >>> and scalable.
> > >>>
> > >>> Thanks,
> > >>> Mostafa
> > >>>
> > >>>> Alex
> > >>>>
> >
>
Powered by blists - more mailing lists