[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <93a57e63-352c-407c-ac3f-4b91c11d925d@linux.intel.com>
Date: Mon, 4 Dec 2023 11:46:30 +0800
From: Baolu Lu <baolu.lu@...ux.intel.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: baolu.lu@...ux.intel.com, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>,
Robin Murphy <robin.murphy@....com>,
Kevin Tian <kevin.tian@...el.com>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
Nicolin Chen <nicolinc@...dia.com>,
Yi Liu <yi.l.liu@...el.com>,
Jacob Pan <jacob.jun.pan@...ux.intel.com>,
Yan Zhao <yan.y.zhao@...el.com>, iommu@...ts.linux.dev,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()
On 12/2/23 4:35 AM, Jason Gunthorpe wrote:
> I'm looking at this code after these patches are applied and it still
> seems quite bonkers to me 🙁
>
> Why do we allocate two copies of the memory on all fault paths?
>
> Why do we have fault->type still that only has one value?
>
> What is serializing iommu_get_domain_for_dev_pasid() in the fault
> path? It looks sort of like the plan is to use iopf_param->lock and
> ensure domain removal grabs that lock at least after the xarray is
> changed - but does that actually happen?
>
> I would suggest, broadly, a flow for iommu_report_device_fault() sort
> of:
>
> 1) Allocate memory for the evt. Every path except errors needs this,
> so just do it
> 2) iopf_get_dev_fault_param() should not have locks in it! This is
> fast path now. Use a refcount, atomic compare exchange to allocate,
> and RCU free.
> 3) Everything runs under the fault_param->lock
> 4) Check if !IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE, set it aside and then
> exit! This logic is really tortured and confusing
> 5) Allocate memory and assemble the group
> 6) Obtain the domain for this group and incr a per-domain counter that a
> fault is pending on that domain
> 7) Put the*group* into the WQ. Put the*group* on a list in fault_param
> instead of the individual faults
> 8) Don't linear search a linked list in iommu_page_response()! Pass
> the group in that we got from the WQ that we*know* is still
> active. Ack that passed group.
>
> When freeing a domain wait for the per-domain counter to go to
> zero. This ensures that the WQ is flushed out and all the outside
> domain references are gone.
>
> When wanting to turn off PRI make sure a non-PRI domain is
> attached to everything. Fence against the HW's event queue. No new
> iommu_report_device_fault() is possible.
>
> Lock the fault_param->lock and go through every pending group and
> respond it. Mark the group memory as invalid so iommu_page_response()
> NOP's it. Unlock, fence the HW against queued responses, and turn off
> PRI.
>
> An*optimization* would be to lightly flush the domain when changing
> the translation. Lock the fault_param->lock and look for groups in the
> list with old_domain. Do the same as for PRI-off: respond to the
> group, mark it as NOP. The WQ may still be chewing on something so the
> domain free still has to check and wait.
Very appreciated for all the ideas. I looked through the items and felt
that all these are good optimizations.
I am wondering whether we can take patch 1/12 ~ 10/12 of this series as
a first step, a refactoring effort to support delivering iopf to
userspace? I will follow up with one or multiple series to add the
optimizations.
Does this work for you? Or, you want to take any of above as the
requirement for iommufd use case?
Best regards,
baolu
Powered by blists - more mailing lists