Message-ID: <e3034e4d-eabf-b8cc-b0be-916d1355edce@linux.intel.com>
Date: Wed, 31 May 2023 11:17:08 +0800
From: Baolu Lu <baolu.lu@...ux.intel.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: baolu.lu@...ux.intel.com, Kevin Tian <kevin.tian@...el.com>,
Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
Robin Murphy <robin.murphy@....com>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
Nicolin Chen <nicolinc@...dia.com>,
Yi Liu <yi.l.liu@...el.com>,
Jacob Pan <jacob.jun.pan@...ux.intel.com>,
iommu@...ts.linux.dev, linux-kselftest@...r.kernel.org,
virtualization@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCHES 00/17] IOMMUFD: Deliver IO page faults to user space
On 5/31/23 8:33 AM, Jason Gunthorpe wrote:
> On Tue, May 30, 2023 at 01:37:07PM +0800, Lu Baolu wrote:
>> Hi folks,
>>
>> This series implements the functionality of delivering IO page faults to
>> user space through the IOMMUFD framework. The use case is nested
>> translation, where modern IOMMU hardware supports two-stage translation
>> tables. The second-stage translation table is managed by the host VMM
>> while the first-stage translation table is owned by the user space.
>> Hence, any IO page fault that occurs on the first-stage page table
>> should be delivered to the user space and handled there. The user space
>> should then deliver the page fault handling result back down to the
>> device through the IOMMUFD response uAPI.
>>
>> User space indicates its capability of handling IO page faults by setting
>> a user HWPT allocation flag IOMMU_HWPT_ALLOC_FLAGS_IOPF_CAPABLE. IOMMUFD
>> will then set up its infrastructure for page fault delivery. Together
>> with the iopf-capable flag, user space should also provide an eventfd
>> on which it will listen for bottom-up page fault messages.
>>
>> On successful allocation of an iopf-capable HWPT, a fault fd will be
>> returned. User space can read fault messages from it once the eventfd
>> is signaled.
>
> This is a performance path, so we really need to think about it more;
> polling on an eventfd and then reading a different fd is not a good
> design.
>
> What I would like is to have a design from the start that fits into
> io_uring, so we can have pre-posted 'recvs' in io_uring that just get
> completed at high speed when PRIs come in.
>
> This suggests that PRIs should be delivered via read() on a single FD,
> with pollability on that single FD and no eventfd at all.
Good suggestion. I will head in this direction.
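
Just to make the direction concrete, below is a rough user-space sketch
of that model with liburing: pre-posted reads on a single pollable fault
fd, each completing when a PRI arrives. The fault_fd, the fixed message
size, and the handler are placeholders for illustration, not a settled
uAPI.

#include <liburing.h>

#define QUEUE_DEPTH	64
#define MSG_SIZE	64	/* assumed fixed-size fault message */

static char bufs[QUEUE_DEPTH][MSG_SIZE];

static int fault_loop(int fault_fd)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int i, ret;

	ret = io_uring_queue_init(QUEUE_DEPTH, &ring, 0);
	if (ret < 0)
		return ret;

	/* Pre-post one read per buffer. */
	for (i = 0; i < QUEUE_DEPTH; i++) {
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_read(sqe, fault_fd, bufs[i], MSG_SIZE, 0);
		io_uring_sqe_set_data(sqe, bufs[i]);
	}
	io_uring_submit(&ring);

	for (;;) {
		ret = io_uring_wait_cqe(&ring, &cqe);
		if (ret < 0)
			break;

		char *buf = io_uring_cqe_get_data(cqe);
		/* handle_fault(buf, cqe->res); then respond via the
		 * response uAPI. */
		io_uring_cqe_seen(&ring, cqe);

		/* Re-post the read to keep the queue full. */
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_read(sqe, fault_fd, buf, MSG_SIZE, 0);
		io_uring_sqe_set_data(sqe, buf);
		io_uring_submit(&ring);
	}

	io_uring_queue_exit(&ring);
	return ret;
}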
>> Besides the overall design, I'd like to hear comments on the following
>> design points:
>>
>> - The IOMMUFD fault message format. It is very similar to the one in
>>   uapi/linux/iommu.h, which has been discussed before and is partially
>>   used by the IOMMU SVA implementation. I'd like to get more comments
>>   on the format when it comes to IOMMUFD.
>
> We have to have the same discussion as always: does a generic fault
> message format make any sense here?
>
> PRI seems more likely that it would, but it needs a careful
> cross-vendor check.
Yeah, good point.
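
For reference, the existing page request message in uapi/linux/iommu.h
looks roughly like below (simplified from struct iommu_fault_page_request,
shown here only as a starting point for the discussion, not a proposal):

struct iommu_fault_page_request {
#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID	(1 << 0)
#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
	__u32	flags;
	__u32	pasid;
	__u32	grpid;			/* page request group index */
	__u32	perm;			/* IOMMU_FAULT_PERM_* access flags */
	__u64	addr;			/* faulting page address */
	__u64	private_data[2];	/* e.g. the VT-d data noted below */
};

The private_data fields are exactly the kind of vendor escape hatch that
makes the cross-vendor question tricky.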
As far as I can see, there are at least three types of IOPF hardware
implementations:

- PCI/PRI: Vendors might have their own additions. For example, VT-d 3.0
  allows root-complex integrated endpoints to carry device-specific
  private data in their page requests. This has been removed from the
  spec as of v4.0.
- DMA stalls.
- Device-specific (non-PRI, not going through the IOMMU).
Does IOMMUFD want to support the last case?
Best regards,
baolu