[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240806124943.GF676757@ziepe.ca>
Date: Tue, 6 Aug 2024 09:49:43 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: Pranjal Shrivastava <praan@...gle.com>
Cc: Will Deacon <will@...nel.org>, Kunkun Jiang <jiangkunkun@...wei.com>,
Baolu Lu <baolu.lu@...ux.intel.com>,
Robin Murphy <robin.murphy@....com>, Joerg Roedel <joro@...tes.org>,
Nicolin Chen <nicolinc@...dia.com>,
Michael Shavit <mshavit@...gle.com>,
Mostafa Saleh <smostafa@...gle.com>,
"moderated list:ARM SMMU DRIVERS" <linux-arm-kernel@...ts.infradead.org>,
iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
wanghaibin.wang@...wei.com, yuzenghui@...wei.com,
tangnianyao@...wei.com
Subject: Re: [bug report] iommu/arm-smmu-v3: Event cannot be printed in some
scenarios
On Mon, Aug 05, 2024 at 03:32:50PM +0000, Pranjal Shrivastava wrote:
> Here's the updated diff:
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index a31460f9f3d4..ed2b106e02dd 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1777,7 +1777,7 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
> goto out_unlock;
> }
>
> - iommu_report_device_fault(master->dev, &fault_evt);
> + ret = iommu_report_device_fault(master->dev, &fault_evt);
> out_unlock:
> mutex_unlock(&smmu->streams_mutex);
> return ret;
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 0e3a9b38bef2..7684e7562584 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -532,6 +532,9 @@ void intel_svm_page_response(struct device *dev, struct iopf_fault *evt,
> bool last_page;
> u16 sid;
>
> + if (!evt)
> + return;
> +
I'm not sure this make sense??
The point of this path is for the driver to retire the fault with a
failure. This prevents that from happing on Intel and we are back to
loosing track of a fault.
All calls to iommu_report_device_fault() must result in
page_response() properly retiring whatever the event was.
> +static void iopf_error_response(struct device *dev, struct iommu_fault *fault)
> +{
> + const struct iommu_ops *ops = dev_iommu_ops(dev);
> + struct iommu_page_response resp = {
> + .pasid = fault->prm.pasid,
> + .grpid = fault->prm.grpid,
> + .code = IOMMU_PAGE_RESP_INVALID
> + };
> +
> + ops->page_response(dev, NULL, &resp);
> +}
The issue originates here, why is this NULL?
void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
{
The caller has an evt? I think we should pass it down.
Looking at the abort_group path that is effectively what we do, but
the evt is copied to the group's evt first.
I also noticed we have another similar issue with the
report_partial_fault() loosing the fault if memory allocation
fails.. A goto for your new err label after report_partial_fault()
would be appropriate too
Jason
Powered by blists - more mailing lists