[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231211152456.GB1489931@ziepe.ca>
Date: Mon, 11 Dec 2023 11:24:56 -0400
From: Jason Gunthorpe <jgg@...pe.ca>
To: Lu Baolu <baolu.lu@...ux.intel.com>
Cc: Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
Robin Murphy <robin.murphy@....com>,
Kevin Tian <kevin.tian@...el.com>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
Nicolin Chen <nicolinc@...dia.com>,
Yi Liu <yi.l.liu@...el.com>,
Jacob Pan <jacob.jun.pan@...ux.intel.com>,
Longfang Liu <liulongfang@...wei.com>,
Yan Zhao <yan.y.zhao@...el.com>, iommu@...ts.linux.dev,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v8 12/12] iommu: Use refcount for fault data access
On Thu, Dec 07, 2023 at 02:43:08PM +0800, Lu Baolu wrote:
> @@ -217,12 +250,9 @@ int iommu_page_response(struct device *dev,
> if (!ops->page_response)
> return -ENODEV;
>
> - mutex_lock(¶m->lock);
> - fault_param = param->fault_param;
> - if (!fault_param) {
> - mutex_unlock(¶m->lock);
> + fault_param = iopf_get_dev_fault_param(dev);
> + if (!fault_param)
> return -EINVAL;
> - }
The refcounting should work by passing around the fault_param object,
not re-obtaining it from the dev from a work.
The work should be locked to the iommu_fault_param that was active
when the work was launched.
When we get to iommu_page_response it does this:
/* Only send response if there is a fault report pending */
mutex_lock(&fault_param->lock);
if (list_empty(&fault_param->faults)) {
dev_warn_ratelimited(dev, "no pending PRQ, drop response\n");
goto done_unlock;
}
Which determines that the iommu_fault_param is stale and pending
free..
Also iopf_queue_remove_device() is messed up - it returns an error
code but nothing ever does anything with it :( Remove functions like
this should never fail.
Removal should be like I explained earlier:
- Disable new PRI reception
- Ack all outstanding PRQ to the device
- Disable PRI on the device
- Tear down the iopf infrastructure
So under this model if the iopf_queue_remove_device() has been called
it should be sort of a 'disassociate' action where fault_param is
still floating out there but iommu_page_response() does nothing.
IOW pass the refcount from the iommu_report_device_fault() down into
the fault handler, into the work and then into iommu_page_response()
which will ultimately put it back.
> @@ -282,22 +313,15 @@ EXPORT_SYMBOL_GPL(iommu_page_response);
> */
> int iopf_queue_flush_dev(struct device *dev)
> {
> - int ret = 0;
> - struct iommu_fault_param *iopf_param;
> - struct dev_iommu *param = dev->iommu;
> + struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);
>
> - if (!param)
> + if (!iopf_param)
> return -ENODEV;
And this also seems unnecessary, it is a bug to call this after
iopf_queue_remove_device() right? Just
rcu_derefernce(param->fault_param, true) and WARN_ON NULL.
Jason
Powered by blists - more mailing lists