Message-ID: <fipxf3vf3nrbiqgwtu7z4vqcyt52dludehdvqc2cnfbal6poyv@uj4hxrlhnqeg>
Date: Wed, 8 Jan 2025 00:43:29 +0200
From: Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
To: Rob Clark <robdclark@...il.com>
Cc: Will Deacon <will@...nel.org>, iommu@...ts.linux.dev,
linux-arm-msm@...r.kernel.org, freedreno@...ts.freedesktop.org,
Robin Murphy <robin.murphy@....com>, Rob Clark <robdclark@...omium.org>,
Joerg Roedel <joro@...tes.org>,
"moderated list:ARM SMMU DRIVERS" <linux-arm-kernel@...ts.infradead.org>, open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] iommu/arm-smmu-qcom: Only enable stall on smmu-v2

On Tue, Jan 07, 2025 at 07:26:44AM -0800, Rob Clark wrote:
> On Tue, Jan 7, 2025 at 4:57 AM Will Deacon <will@...nel.org> wrote:
> >
> > On Thu, Jan 02, 2025 at 10:32:31AM -0800, Rob Clark wrote:
> > > From: Rob Clark <robdclark@...omium.org>
> > >
> > > On mmu-500, stall-on-fault seems to stall all context banks, causing the
> > > GMU to misbehave. So limit this feature to smmu-v2 for now.
> > >
> > > This fixes an issue with an older mesa bug taking out the system
> > > because of GMU going off into the weeds.
> > >
> > > What we _think_ is happening is that, if the GPU generates 1000's of
> > > faults at ~once (which is something that GPUs can be good at), it can
> > > result in a sufficient number of stalled translations preventing other
> > > transactions from entering the same TBU.
> >
> > MMU-500 is an implementation of the SMMUv2 architecture, so this feels
> > upside-down to me. That is, it should always be valid to probe with
> > the less specific "SMMUv2" compatible string (modulo hardware errata)
> > and be limited to the architectural behaviour.
>
> I should have been more specific and referred to qcom,smmu-v2
>
> > So what is it about MMU-500 that means stalling doesn't work when compared
> > to any other SMMUv2 implementation?
>
> Well, I have a limited # of data points, in the sense that there
> aren't too many a6xx devices prior to the switch to qcom,smmu-500..
> but I have access to crash metrics for a lot of sc7180 devices
> (qcom,smmu-v2), and I've been unable to find any signs of this sort of
> stall related issue.
>
> So maybe I can't 100% say this is qcom,smmu-500 vs qcom,smmu-v2, vs
> some other change in later gens that used qcom,smmu-500 or some other
> factor, but I'm not sure what other conclusion to draw.

Might it be that v2 was actual hardware, but mmu-500 is somehow
virtualized? If so, these stalls might be exposing some kind of FW bug
in the hypervisor.

>
> BR,
> -R
--
With best wishes
Dmitry