Message-ID: <e3034e4d-eabf-b8cc-b0be-916d1355edce@linux.intel.com>
Date: Wed, 31 May 2023 11:17:08 +0800
From: Baolu Lu <baolu.lu@...ux.intel.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: baolu.lu@...ux.intel.com, Kevin Tian <kevin.tian@...el.com>,
Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
Robin Murphy <robin.murphy@....com>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
Nicolin Chen <nicolinc@...dia.com>,
Yi Liu <yi.l.liu@...el.com>,
Jacob Pan <jacob.jun.pan@...ux.intel.com>,
iommu@...ts.linux.dev, linux-kselftest@...r.kernel.org,
virtualization@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCHES 00/17] IOMMUFD: Deliver IO page faults to user space
On 5/31/23 8:33 AM, Jason Gunthorpe wrote:
> On Tue, May 30, 2023 at 01:37:07PM +0800, Lu Baolu wrote:
>> Hi folks,
>>
>> This series implements the functionality of delivering IO page faults to
>> user space through the IOMMUFD framework. The use case is nested
>> translation, where modern IOMMU hardware supports two-stage translation
>> tables. The second-stage translation table is managed by the host VMM
>> while the first-stage translation table is owned by the user space.
>> Hence, any IO page fault that occurs on the first-stage page table
>> should be delivered to the user space and handled there. The user space
>> should then deliver the page fault handling result back down to the
>> device through the IOMMUFD response uAPI.
>>
>> User space indicates its capability of handling IO page faults by setting
>> a user HWPT allocation flag IOMMU_HWPT_ALLOC_FLAGS_IOPF_CAPABLE. IOMMUFD
>> will then set up its infrastructure for page fault delivery. Together
>> with the iopf-capable flag, user space should also provide an eventfd
>> on which it will listen for bottom-up page fault messages.
>>
>> On successful allocation of an iopf-capable HWPT, a fault fd will be
>> returned. User space can read fault messages from it once the eventfd
>> is signaled.
>
> This is a performance path, so we really need to think about it more;
> polling on an eventfd and then reading a different fd is not a good
> design.
>
> What I would like is to have a design from the start that fits into
> io_uring, so we can have pre-posted 'recvs' in io_uring that just get
> completed at high speed when PRIs come in.
>
> This suggests that PRIs should be delivered via read() on a single FD,
> with pollability on that single FD and no eventfd at all.
Good suggestion. I will head in this direction.
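
Just to make the direction concrete, below is a rough user-space sketch
of that model with liburing: pre-posted reads on a single pollable fault
fd, each completing when a PRI arrives. The fault_fd, the fixed message
size, and the handler are placeholders for illustration, not a settled
uAPI.

#include <liburing.h>

#define QUEUE_DEPTH	64
#define MSG_SIZE	64	/* assumed fixed-size fault message */

static char bufs[QUEUE_DEPTH][MSG_SIZE];

static int fault_loop(int fault_fd)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int i, ret;

	ret = io_uring_queue_init(QUEUE_DEPTH, &ring, 0);
	if (ret < 0)
		return ret;

	/* Pre-post one read per buffer. */
	for (i = 0; i < QUEUE_DEPTH; i++) {
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_read(sqe, fault_fd, bufs[i], MSG_SIZE, 0);
		io_uring_sqe_set_data(sqe, bufs[i]);
	}
	io_uring_submit(&ring);

	for (;;) {
		ret = io_uring_wait_cqe(&ring, &cqe);
		if (ret < 0)
			break;

		char *buf = io_uring_cqe_get_data(cqe);
		/* handle_fault(buf, cqe->res); then respond via the
		 * response uAPI. */
		io_uring_cqe_seen(&ring, cqe);

		/* Re-post the read to keep the queue full. */
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_read(sqe, fault_fd, buf, MSG_SIZE, 0);
		io_uring_sqe_set_data(sqe, buf);
		io_uring_submit(&ring);
	}

	io_uring_queue_exit(&ring);
	return ret;
}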
>> Besides the overall design, I'd like to hear comments on the following
>> design points:
>>
>> - The IOMMUFD fault message format. It is very similar to the one in
>>   uapi/linux/iommu.h, which has been discussed before and is partially
>>   used by the IOMMU SVA implementation. I'd like to get more comments
>>   on the format when it comes to IOMMUFD.
>
> We have to have the same discussion as always: does a generic fault
> message format make any sense here?
>
> PRI seems more likely that it would, but it needs a careful
> cross-vendor check.
Yeah, good point.
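
For reference, the existing page request message in uapi/linux/iommu.h
looks roughly like below (simplified from struct iommu_fault_page_request,
shown here only as a starting point for the discussion, not a proposal):

struct iommu_fault_page_request {
#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID	(1 << 0)
#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
	__u32	flags;
	__u32	pasid;
	__u32	grpid;			/* page request group index */
	__u32	perm;			/* IOMMU_FAULT_PERM_* access flags */
	__u64	addr;			/* faulting page address */
	__u64	private_data[2];	/* e.g. the VT-d data noted below */
};

The private_data fields are exactly the kind of vendor escape hatch that
makes the cross-vendor question tricky.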
As far as I can see, there are at least three types of IOPF hardware
implementations:

- PCI/PRI: Vendors might have their own additions. For example, VT-d 3.0
  allows root-complex integrated endpoints to carry device-specific
  private data in their page requests. This has been removed from the
  spec as of v4.0.
- DMA stalls.
- Device-specific (non-PRI, not going through the IOMMU).
Does IOMMUFD want to support the last case?
Best regards,
baolu