linux-kernel - Re: [PATCH v7 12/12] iommu: Improve iopf_queue_flush

Message-ID: <a0ef3a4f-88fc-40fe-9891-495d1b6b365b@linux.intel.com>
Date:   Sun, 3 Dec 2023 16:53:08 +0800
From:   Baolu Lu <baolu.lu@...ux.intel.com>
To:     Jason Gunthorpe <jgg@...pe.ca>
Cc:     baolu.lu@...ux.intel.com, Joerg Roedel <joro@...tes.org>,
        Will Deacon <will@...nel.org>,
        Robin Murphy <robin.murphy@....com>,
        Kevin Tian <kevin.tian@...el.com>,
        Jean-Philippe Brucker <jean-philippe@...aro.org>,
        Nicolin Chen <nicolinc@...dia.com>,
        Yi Liu <yi.l.liu@...el.com>,
        Jacob Pan <jacob.jun.pan@...ux.intel.com>,
        Yan Zhao <yan.y.zhao@...el.com>, iommu@...ts.linux.dev,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()

On 12/2/23 4:35 AM, Jason Gunthorpe wrote:
> On Wed, Nov 15, 2023 at 11:02:26AM +0800, Lu Baolu wrote:
>> The iopf_queue_flush_dev() is called by the iommu driver before releasing
>> a PASID. It ensures that all pending faults for this PASID have been
>> handled or cancelled, and won't hit the address space that reuses this
>> PASID. The driver must make sure that no new fault is added to the queue.
> This needs more explanation; why should anyone care?
> 
> More importantly, why is *discarding* the right thing to do?
> Especially why would we discard a partial page request group?
> 
> After we change a translation we may have PRI requests in a
> queue. They need to be acknowledged, not discarded. The DMA in the
> device should be restarted and the device should observe the new
> translation - if it is blocking then it should take a DMA error.
> 
> More broadly, we should just let things run their normal course. The
> domain to deliver the fault to should be determined very early. If we
> get a fault and there is no fault domain currently assigned then just
> restart it.
> 
> The main reason to fence would be to allow the domain to become freed
> as the faults should be holding pointers to it. But I feel there are
> simpler options for that than this.
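
A rough user-space model of the alternative suggested in the review above: the target domain is resolved once, when the fault arrives; each queued fault pins that domain with a reference, so the domain can be freed only after every fault has been acknowledged; and a fault that arrives with no domain attached is acknowledged immediately instead of being discarded. Every type and function below (struct domain, fault_arrive(), domain_can_free(), and so on) is hypothetical, not a kernel API. Answering an unroutable fault with a success code, so that the device retries and observes whatever translation is now installed, is likewise an assumption about the intended policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; none of these names are kernel APIs. */
enum resp_code { RESP_SUCCESS, RESP_INVALID };

struct domain {
	int refcount;		/* faults currently pinning this domain */
};

struct fault {
	unsigned int pasid;
	struct domain *domain;	/* fixed at arrival, never looked up again */
};

#define NR_PASIDS 4
static struct domain *pasid_table[NR_PASIDS];	/* NULL: nothing attached */
static int nr_responses;
static enum resp_code last_code;

static void send_response(unsigned int pasid, enum resp_code code)
{
	(void)pasid;
	nr_responses++;
	last_code = code;
}

/* Returns true if the fault was queued for a handler. */
static bool fault_arrive(struct fault *f, unsigned int pasid)
{
	assert(pasid < NR_PASIDS);
	f->pasid = pasid;
	f->domain = pasid_table[pasid];
	if (!f->domain) {
		/* No fault domain: acknowledge at once so the DMA restarts. */
		send_response(pasid, RESP_SUCCESS);
		return false;
	}
	f->domain->refcount++;	/* the pending fault keeps the domain alive */
	return true;
}

/* The handler finished: acknowledge, then unpin the domain. */
static void fault_complete(struct fault *f, enum resp_code code)
{
	send_response(f->pasid, code);
	assert(f->domain->refcount > 0);
	f->domain->refcount--;
	f->domain = NULL;
}

/* A domain may be freed only when no fault still points at it. */
static bool domain_can_free(const struct domain *d)
{
	return d->refcount == 0;
}
```

In this model, detaching a domain while a fault is still pending leaves domain_can_free() false until that fault is completed, while a fault arriving after the detach is answered immediately and never touches the old domain.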

In the iommu_detach_device_pasid() path, the domain is about to be
removed from the device's pasid. The IOMMU driver performs the
following steps in order:

1. Clears the pasid translation entry, so all subsequent DMA
    transactions (translation requests, translated requests or page
    requests) targeting the iommu domain are blocked.

2. Waits until all pending page requests for the device's PASID have
    been reported to upper layers via iommu_report_device_fault().
    However, this does not guarantee that all page requests have been
    responded to.

3. Frees all partial page requests for this pasid, since a page request
    response is only needed for a complete request group. No action is
    required for page requests that are not the last in a request
    group.

4. Iterates through the list of pending page requests and identifies
    those originating from the device's PASID. For each identified
    request, the driver responds to the hardware with the
    IOMMU_PAGE_RESP_INVALID code, indicating that the request cannot be
    handled and retries should not be attempted. This response code
    corresponds to the "Invalid Request" status defined in the PCI PRI
    specification.

5. Follows the IOMMU hardware requirements (for example, VT-d spec,
    section 7.10, Software Steps to Drain Page Requests & Responses) to
    drain in-flight page requests and page group responses between the
    remapping hardware queues and the endpoint device.

Once the above steps are done in iommu_detach_device_pasid(), the pasid
can be reused for any other address space.

The iopf_queue_discard_dev_pasid() helper performs steps 3 and 4.
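
The split between steps 3 and 4 can be illustrated with a small user-space model: requests belonging to partial groups are freed silently, while requests that complete a group are answered with an "Invalid Request" response before being freed. This is a sketch under stated assumptions, not the actual iopf_queue_discard_dev_pasid() implementation; struct page_req, the separate partial and pending lists, and all function names are hypothetical, and step 5 (the hardware drain) is outside the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of steps 3 and 4; not the kernel's iopf code. */
enum resp_code { RESP_SUCCESS, RESP_INVALID };

struct page_req {
	unsigned int pasid;
	unsigned int grpid;	/* page request group index */
	struct page_req *next;
};

static int nr_invalid_resps;

static void respond(const struct page_req *req, enum resp_code code)
{
	(void)req;		/* a real response would carry pasid and grpid */
	if (code == RESP_INVALID)
		nr_invalid_resps++;
}

static struct page_req *push(struct page_req **list, unsigned int pasid,
			     unsigned int grpid)
{
	struct page_req *req = malloc(sizeof(*req));

	if (!req)
		abort();
	req->pasid = pasid;
	req->grpid = grpid;
	req->next = *list;
	*list = req;
	return req;
}

/* Unlink and free every request for @pasid, optionally answering each. */
static int drop_pasid(struct page_req **list, unsigned int pasid,
		      bool respond_invalid)
{
	struct page_req **pp = list;
	int freed = 0;

	while (*pp) {
		struct page_req *req = *pp;

		if (req->pasid != pasid) {
			pp = &req->next;
			continue;
		}
		if (respond_invalid)
			respond(req, RESP_INVALID);
		*pp = req->next;
		free(req);
		freed++;
	}
	return freed;
}

static int discard_dev_pasid(struct page_req **partial,
			     struct page_req **pending, unsigned int pasid)
{
	/* Step 3: partial groups need no response; just free them. */
	int freed = drop_pasid(partial, pasid, false);

	/* Step 4: complete groups get an "Invalid Request" response. */
	freed += drop_pasid(pending, pasid, true);
	return freed;
}
```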

> 
>> The SMMUv3 driver doesn't use it because it only implements the
>> Arm-specific stall fault model where DMA transactions are held in the SMMU
>> while waiting for the OS to handle iopf's. Since a device driver must
> >> complete all DMA transactions before detaching the domain, there are no
>> pending iopf's with the stall model. PRI support requires adding a call to
>> iopf_queue_flush_dev() after flushing the hardware page fault queue.
> This explanation doesn't make much sense, from a device driver
> perspective both PRI and stall cause the device to not complete DMAs.
> 
> The difference between stall and PRI is fairly small: stall causes an
> internal bus to lock up while PRI does not.
> 
>> -int iopf_queue_flush_dev(struct device *dev)
>> +int iopf_queue_discard_dev_pasid(struct device *dev, ioasid_t pasid)
>>   {
>>   	struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);
>> +	const struct iommu_ops *ops = dev_iommu_ops(dev);
>> +	struct iommu_page_response resp;
>> +	struct iopf_fault *iopf, *next;
>> +	int ret = 0;
>>   
>>   	if (!iopf_param)
>>   		return -ENODEV;
>>   
>>   	flush_workqueue(iopf_param->queue->wq);
>> +
> A naked flush_workqueue like this is really suspicious, it needs a
> comment explaining why the queue can't get more work queued at this
> point.
> 
> I suppose the driver is expected to stop calling
> iommu_report_device_fault() before calling this function, but that
> doesn't seem like it is going to be possible. Drivers should be
> implementing atomic replace for the PASID updates and in that case
> there is no moment when it can say the HW will stop generating PRI.
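
One way to back the "no more work can be queued" precondition with code, rather than relying on a naked flush, is to fence the queue under the same lock that enqueuers take and only then drain it. The sketch below models this in user space with a pthread mutex; struct fault_queue, fq_enqueue() and fq_close_and_flush() are hypothetical names, not kernel APIs. It does not address the atomic-replace concern raised above: with atomic replace the hardware never stops generating page requests for the PASID, and a fence like this would also reject faults meant for the new domain.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical model of a fault queue that is fenced before it is flushed.
 * In the kernel, the work would sit on a workqueue and be drained with
 * flush_workqueue(); here the "handling" is done inline for simplicity.
 */
#define QCAP 8

struct fault_queue {
	pthread_mutex_t lock;
	bool closed;		/* set once the PASID is being torn down */
	int items[QCAP];
	int head, tail;		/* ring indices; tail - head = queued items */
	int handled;
};

static bool fq_enqueue(struct fault_queue *q, int fault)
{
	bool queued = false;

	pthread_mutex_lock(&q->lock);
	/* The closed check and the insert happen under one lock. */
	if (!q->closed && q->tail - q->head < QCAP) {
		q->items[q->tail++ % QCAP] = fault;
		queued = true;
	}
	pthread_mutex_unlock(&q->lock);
	return queued;
}

/* Fence new work first, then drain whatever was already queued. */
static int fq_close_and_flush(struct fault_queue *q)
{
	int drained = 0;

	pthread_mutex_lock(&q->lock);
	q->closed = true;
	while (q->head != q->tail) {
		(void)q->items[q->head++ % QCAP];	/* "handle" the fault */
		q->handled++;
		drained++;
	}
	pthread_mutex_unlock(&q->lock);
	return drained;
}
```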

Atomic domain replacement for a PASID is not currently implemented in
the core or driver. Even if atomic replacement were to be implemented,
it would be necessary to ensure that all translation requests,
translated requests, page requests and responses for the old domain are
drained before switching to the new domain. I am not sure whether the
existing iommu hardware architecture supports this functionality.

Best regards,
baolu