[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZrJIM8-pS31grIVR@google.com>
Date: Tue, 6 Aug 2024 15:58:43 +0000
From: Pranjal Shrivastava <praan@...gle.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Will Deacon <will@...nel.org>, Kunkun Jiang <jiangkunkun@...wei.com>,
Baolu Lu <baolu.lu@...ux.intel.com>,
Robin Murphy <robin.murphy@....com>, Joerg Roedel <joro@...tes.org>,
Nicolin Chen <nicolinc@...dia.com>,
Michael Shavit <mshavit@...gle.com>,
Mostafa Saleh <smostafa@...gle.com>,
"moderated list:ARM SMMU DRIVERS" <linux-arm-kernel@...ts.infradead.org>,
iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
wanghaibin.wang@...wei.com, yuzenghui@...wei.com,
tangnianyao@...wei.com
Subject: Re: [bug report] iommu/arm-smmu-v3: Event cannot be printed in some
scenarios
On Tue, Aug 06, 2024 at 09:49:43AM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 05, 2024 at 03:32:50PM +0000, Pranjal Shrivastava wrote:
> > Here's the updated diff:
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index a31460f9f3d4..ed2b106e02dd 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -1777,7 +1777,7 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
> > goto out_unlock;
> > }
> >
> > - iommu_report_device_fault(master->dev, &fault_evt);
> > + ret = iommu_report_device_fault(master->dev, &fault_evt);
> > out_unlock:
> > mutex_unlock(&smmu->streams_mutex);
> > return ret;
> > diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> > index 0e3a9b38bef2..7684e7562584 100644
> > --- a/drivers/iommu/intel/svm.c
> > +++ b/drivers/iommu/intel/svm.c
> > @@ -532,6 +532,9 @@ void intel_svm_page_response(struct device *dev, struct iopf_fault *evt,
> > bool last_page;
> > u16 sid;
> >
> > + if (!evt)
> > + return;
> > +
>
> I'm not sure this make sense??
>
> The point of this path is for the driver to retire the fault with a
> failure. This prevents that from happing on Intel and we are back to
> loosing track of a fault.
>
> All calls to iommu_report_device_fault() must result in
> page_response() properly retiring whatever the event was.
>
> > +static void iopf_error_response(struct device *dev, struct iommu_fault *fault)
> > +{
> > + const struct iommu_ops *ops = dev_iommu_ops(dev);
> > + struct iommu_page_response resp = {
> > + .pasid = fault->prm.pasid,
> > + .grpid = fault->prm.grpid,
> > + .code = IOMMU_PAGE_RESP_INVALID
> > + };
> > +
> > + ops->page_response(dev, NULL, &resp);
> > +}
>
> The issue originates here, why is this NULL?
>
> void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
> {
>
> The caller has an evt? I think we should pass it down.
Hmm, I agree, I don't see `iommu_report_device_fault` be called anywhere
with a NULL evt. Hence, it does make sense to pass the evt down and
ensure we don't lose track of the event.
I'm assuming that we retired the if (!evt) check from intel->page
response since we didn't have any callers of intel->page_response
with a NULL evt. (Atleast, for now, I don't see that happen).
Lu, Will -- Any additional comments/suggestions for this?
>
> Looking at the abort_group path that is effectively what we do, but
> the evt is copied to the group's evt first.
>
> I also noticed we have another similar issue with the
> report_partial_fault() loosing the fault if memory allocation
> fails.. A goto for your new err label after report_partial_fault()
> would be appropriate too
Ahh, yes! I'll add that too in the follow up.
>
> Jason
Thanks,
Pranjal
Powered by blists - more mailing lists