linux-kernel - Re: [PATCH v2] iommu/arm-smmu-qcom: Only enable stall on smmu-v2

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAF6AEGtNBoWnLA_dBBC-4U7DrWLO+hM_-9iraXgc45Aj885nCQ@mail.gmail.com>
Date: Mon, 6 Jan 2025 13:00:51 -0800
From: Rob Clark <robdclark@...il.com>
To: Akhil P Oommen <quic_akhilpo@...cinc.com>
Cc: iommu@...ts.linux.dev, linux-arm-msm@...r.kernel.org, 
	freedreno@...ts.freedesktop.org, Robin Murphy <robin.murphy@....com>, 
	Will Deacon <will@...nel.org>, Rob Clark <robdclark@...omium.org>, Joerg Roedel <joro@...tes.org>, 
	"moderated list:ARM SMMU DRIVERS" <linux-arm-kernel@...ts.infradead.org>, 
	open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] iommu/arm-smmu-qcom: Only enable stall on smmu-v2

On Mon, Jan 6, 2025 at 12:11 PM Akhil P Oommen <quic_akhilpo@...cinc.com> wrote:
>
> On 1/3/2025 1:00 AM, Akhil P Oommen wrote:
> > On 1/3/2025 12:02 AM, Rob Clark wrote:
> >> From: Rob Clark <robdclark@...omium.org>
> >>
> >> On mmu-500, stall-on-fault seems to stall all context banks, causing the
> >> GMU to misbehave.  So limit this feature to smmu-v2 for now.
> >>
> >> This fixes an issue with an older mesa bug taking outo the system
> >> because of GMU going off into the weeds.
> >>
> >> What we _think_ is happening is that, if the GPU generates 1000's of
> >> faults at ~once (which is something that GPUs can be good at), it can
> >> result in a sufficient number of stalled translations preventing other
> >> transactions from entering the same TBU.
> >>
> >> Signed-off-by: Rob Clark <robdclark@...omium.org>
> >
> > Reviewed-by: Akhil P Oommen <quic_akhilpo@...cinc.com>
> >
>
> Btw, if stall is not enabled, I think there is no point in capturing
> coredump from adreno pagefault handler. By the time we start coredump,
> gpu might have switched context.

Hmm, we do at least capture ttbr0 both in fault info and from the
current submit, so it would at least be possible to tell if you are
looking at the wrong context.

BR,
-R